Update: metadata overview simplified
Here is an update to my earlier post, in which I introduced subcategories for the different types of metadata. In the many discussions about becoming data-driven, I have noticed that this overview helps focus the conversation and makes it easier to identify requirements.
All data starts with business metadata: the information we need to actually build a dataset. Someone in the business approved the collection and processing of the data in the first place. That person also provides the requirements and descriptions of what they need. The challenge is that this information is often not maintained over time, so the quality of business metadata declines.
Once we know what the business wants, we can design it and implement it in physical form, which is described by technical metadata. We can now build the actual application, or buy it off the shelf, and map it to the business metadata.
Now that we know what data we need and what it means, and have a place to store and process it, we can start doing business. This generates operational metadata, which is very valuable for monitoring our data processes. It gives us insight into what data is processed, how often and how fast. This is useful input for analysing the performance of our IT landscape and seeing where improvements can be made. We also monitor access to systems and data. Taking it a step further, we can analyse access patterns and spot unusual behaviour that may signal a threat to our data.
Finally, we can draw on social metadata. This is where the actual value of your data becomes tangible. If value is defined as the benefit users believe they gain, then the way they use the data is an indicator of its value. So if we measure which data is used often and by many users, we can conclude that this data is likely important and valuable, and invest in improving its quality to increase the value it creates. Behaviour is also worth measuring: how much time is spent on content, and which content is skipped quickly. Content that is skipped quickly apparently does not match what the user is looking for.
Business metadata
Governance metadata
All metadata required to control the data correctly, such as retention, purpose, classifications and responsibilities.
– Data ownership & responsibilities
– Data retention
– Data sensitivity classifications
– Purpose limitations
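Governance metadata is easiest to enforce when it is stored in a structured, machine-readable form. The Python sketch below assumes a hypothetical `GovernanceMetadata` record per dataset, holding an owner, a retention period in days, a sensitivity classification and a set of allowed purposes. The field names, the classification values and the example dataset are illustrative only, not a standard.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Set


@dataclass
class GovernanceMetadata:
    # Hypothetical governance record for a single dataset.
    dataset: str
    owner: str            # accountable data owner
    retention_days: int   # how long the data may be kept
    sensitivity: str      # e.g. "public", "internal", "confidential" (assumed levels)
    purposes: Set[str] = field(default_factory=set)  # allowed purposes

    def retention_expired(self, created: date, today: date) -> bool:
        """True if data created on `created` is past its retention period."""
        return today > created + timedelta(days=self.retention_days)

    def purpose_allowed(self, purpose: str) -> bool:
        """Purpose limitation: only explicitly listed purposes are permitted."""
        return purpose in self.purposes


# Example with a made-up customer dataset.
customers = GovernanceMetadata(
    dataset="crm.customers",
    owner="Head of Sales",
    retention_days=365,
    sensitivity="confidential",
    purposes={"billing", "customer_service"},
)
print(customers.retention_expired(date(2023, 1, 1), date(2024, 6, 1)))  # True
print(customers.purpose_allowed("marketing"))  # False
```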
Descriptive metadata
All metadata that helps users find, understand and use the data.
– Business terms, data descriptions, definitions and business tags
– Data quality and descriptions of (incidental) events affecting the data
– Business data models & business lineage
Administrative metadata
All metadata that makes it possible to track authorisations on the data.
– Metadata versioning & creation
– Access requests, approval & permissions
Technical metadata
Structural metadata
All metadata that relates to the structure of the data itself and is required to process it properly.
– Data types
– Schemas
– Data models
– Design lineage
Preservation metadata
All metadata required to assure the storage & integrity of the data.
– Data storage characteristics
– Technical environment
Connectivity metadata
All metadata that is necessary for exchanging data, such as APIs and topics.
– Configurations & system names
– Data scheduling
Operational metadata
Execution metadata
All metadata generated and captured in execution of data processes.
– Data process statistics (record counts, start & end times, error logs, functions applied)
– Runtime lineage & ETL/actions on data
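Execution metadata is typically captured by the data process itself while it runs. A minimal sketch, assuming a hypothetical `run_step` helper that applies a transformation to each record and stores record counts, start and end times and per-record errors in an `ExecutionMetadata` object (both names are made up for this example):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class ExecutionMetadata:
    # Hypothetical record describing one run of a data process.
    process: str
    started_at: datetime
    ended_at: Optional[datetime] = None
    records_in: int = 0
    records_out: int = 0
    errors: List[str] = field(default_factory=list)


def run_step(
    process: str, transform: Callable[[dict], dict], records: Iterable[dict]
) -> Tuple[List[dict], ExecutionMetadata]:
    """Apply `transform` to every record and capture execution metadata.

    A record that raises an exception is skipped; its error is logged in the
    metadata instead of aborting the whole run.
    """
    meta = ExecutionMetadata(process=process, started_at=datetime.now(timezone.utc))
    output = []
    for record in records:
        meta.records_in += 1
        try:
            output.append(transform(record))
            meta.records_out += 1
        except Exception as exc:
            meta.errors.append(f"record {meta.records_in}: {exc!r}")
    meta.ended_at = datetime.now(timezone.utc)
    return output, meta


# Example: the second record cannot be parsed and ends up in the error log.
rows = [{"amount": "10"}, {"amount": "x"}, {"amount": "5"}]
result, meta = run_step("parse_amounts", lambda r: {"amount": int(r["amount"])}, rows)
print(meta.records_in, meta.records_out, len(meta.errors))  # 3 2 1
```

Logging the failed record instead of stopping the run is a design choice in this sketch. It keeps the statistics complete, but some pipelines will rather fail fast.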
Monitoring metadata
All metadata that keeps track of the data processing performance & reliability.
– Data processing runtime, performance & exceptions
– Storage usage
Controlling (logging) metadata
All metadata required for security monitoring & proof of operational compliance.
– Data access & frequency, audit logs
– Irregular access patterns
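Irregular access patterns can be detected in many ways. As an illustration only, the sketch below uses a very simple statistical rule: flag a user when today's number of accesses, taken from the audit logs, exceeds their historical daily mean by more than `k` standard deviations. The function name `is_irregular`, the threshold `k = 3` and the sample counts are assumptions. Real security monitoring would combine richer signals.

```python
from statistics import mean, stdev
from typing import List


def is_irregular(history: List[int], today: int, k: float = 3.0) -> bool:
    """Return True if today's access count is unusually high for this user.

    `history` holds the user's daily access counts taken from audit logs.
    A count is flagged when it exceeds mean + k * standard deviation.
    """
    if len(history) < 2:
        return False  # stdev needs at least two data points
    threshold = mean(history) + k * stdev(history)
    return today > threshold


# Example: a user who normally reads a table 11 to 16 times a day.
daily_reads = [12, 15, 11, 14, 13, 12, 16]
print(is_irregular(daily_reads, 14))   # False
print(is_irregular(daily_reads, 250))  # True
```

Returning False when there is too little history avoids false alarms for new users. Other setups may prefer to flag such users for manual review.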
Social metadata
User metadata
All metadata generated by users of the data, such as:
– User provided content
– User tags (groups)
– Ratings & reviews
Behaviour metadata
All metadata that can be derived from observing how users interact with the data, such as:
– Time spent viewing content
– Number of users, views, likes and shares
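The introduction proposes using social metadata as an indicator of value. One way to do that is to aggregate view events per dataset. The sketch assumes a hypothetical `ViewEvent` record (user, dataset, seconds viewed) and ranks datasets first by the number of distinct users and then by views. The 10-second "often skipped" threshold is an arbitrary assumption, not a recommended value.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ViewEvent:
    # Hypothetical behaviour event: one user viewing one dataset's content.
    user: str
    dataset: str
    seconds: float  # time spent viewing


def usage_summary(events: List[ViewEvent], skip_threshold: float = 10.0) -> List[Dict]:
    """Summarise behaviour metadata per dataset, most widely used first.

    Datasets are ranked by distinct users, then by total views. A dataset whose
    average viewing time is below `skip_threshold` seconds is marked as
    'often skipped' (an assumed heuristic for content that misses the mark).
    """
    users = defaultdict(set)
    views = defaultdict(int)
    seconds = defaultdict(float)
    for e in events:
        users[e.dataset].add(e.user)
        views[e.dataset] += 1
        seconds[e.dataset] += e.seconds

    summary = [
        {
            "dataset": ds,
            "distinct_users": len(users[ds]),
            "views": views[ds],
            "avg_seconds": seconds[ds] / views[ds],
            "often_skipped": seconds[ds] / views[ds] < skip_threshold,
        }
        for ds in views
    ]
    summary.sort(key=lambda s: (s["distinct_users"], s["views"]), reverse=True)
    return summary


# Example with made-up events.
events = [
    ViewEvent("ann", "sales", 120), ViewEvent("bob", "sales", 90),
    ViewEvent("cem", "sales", 60), ViewEvent("ann", "hr", 4),
    ViewEvent("ann", "hr", 3),
]
for row in usage_summary(events):
    print(row["dataset"], row["distinct_users"], row["often_skipped"])
# sales 3 False
# hr 1 True
```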