
5 steps to begin collecting the value of your data


It’s nothing new that data is called ‘the new oil’. Organisations that are able to collect, organise and combine data effectively are in a good position to create new value. The question is when you can collect the value of data, and how to get there.


The world’s most valuable resource is no longer oil, but data

The Economist

If you are seeking to leverage the value of data for your business, you’ll need to start managing your metadata. Metadata is the key enabler that helps you optimise your processes, gain and stay in control of risks, and enable new value creation. This blog gives a quick introduction to why you need metadata and where to start.

The value of your data is unlocked through metadata

  • Enable discovery and sharing of data, shortening search times
  • Protect your investment in data against staff turnover and enable reuse
  • Improve understanding and decision making with high-quality data
  • Mitigate your risks and limit liability
  • Control what your data is used for and where it goes
  • Increase effectiveness & efficiency in collaboration
  • Reduce costs and create new value through faster development & innovation

These examples of value all sound nice, but they aren’t easily achieved. Let’s therefore go into more detail on what needs to be organised to build the foundation for this value.


5 steps for managing your metadata

Working with large organisations with complex IT landscapes and data exchange, we’ve found there are a couple of generic steps to take. When you achieve these in a sustainable manner, you’ll be able to collect value from your data more rapidly:

1. Find the data your organisation needs
2. Understand that data
3. Know who is responsible for that data
4. Be able to trace that data to its sources and its end users
5. Trust that data, so people can use it without hesitation

Metadata provides insights and is a key enabler of data value

So here is a way to approach these steps in practice:

1. Make data findable
As a start you’ll need to set priorities. Determine which data sources are part of your core operational & information processes and focus on indexing these first. This can be done at a high level and doesn’t necessarily mean you need to index all attributes of all data objects. Just start by adding descriptions for data sets and add business tags to make them easier to find. Furthermore, people looking for data need to know they have a complete view, or at least know through which sources they are browsing.
2. Add descriptions & definitions
The next step is that people need to understand what they are looking at. For the most important data objects and key attributes you’ll need to add descriptions and definitions.
3. Make transparent who to contact
When people have found data that they think they can use, they’ll first need to get in touch with the owner to get access to it. Governance metadata is therefore an essential part of creating value from data.
4. Document data logistics to gain control
When people start using data, they become dependent on changes in the data production flows. The impact of changes needs to be managed, so it is necessary to know where data comes from and who is consuming it.
5. Analyse data quality to make data usable
Finally this all adds up to trusting the data. Data quality measurements will allow people to assess whether they can use this data for their processes.
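
To make this concrete, below is a minimal sketch of what a single catalog entry could look like once it covers all five steps. The structure and field names are illustrative assumptions, not a reference to any particular catalog tool:

```python
# A minimal, hypothetical catalog entry covering the five steps:
# findable (name, tags), understood (descriptions), owned (contact),
# traceable (lineage) and trusted (quality measurement).
catalog_entry = {
    "dataset": "customer_master",
    "description": "Golden record of active customers, refreshed nightly.",
    "tags": ["customer", "CRM", "master data"],                # step 1: findable
    "definitions": {
        "customer_id": "Unique identifier issued at onboarding.",
        "status": "ACTIVE, DORMANT or CLOSED.",
    },                                                         # step 2: understood
    "owner": "sales-operations@example.com",                   # step 3: who to contact
    "lineage": {
        "sources": ["crm.accounts", "billing.contracts"],
        "consumers": ["marketing_dashboard", "churn_model"],
    },                                                         # step 4: data logistics
    "quality": {"completeness": 0.97, "last_measured": "2023-06-01"},  # step 5: trust
}

print(catalog_entry["owner"])
```

Even a plain list of such entries, maintained in a shared location, already covers the essentials before any catalog tooling is introduced.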

What metadata to collect

Now, to make this work, we’ll need to collect metadata that supports each of these steps. It helps the discussion when you can categorise the different metadata elements into groups. Lean Data suggests the following 4 categories:

  • Technical Metadata
  • Business Metadata
  • Operational Metadata
  • Social Metadata

The technical metadata is the foundation. You’ll need to gather and organise this before you can make sense of it (see below for examples). The next step is adding the descriptions and governance metadata as business metadata, for understanding and control. Then you can go to work and improve the data processing by capturing and monitoring the operational metadata. Now we have reached the stage where we can really start building value from data: up to this point we have only enabled our ability to monitor and control day-to-day business operations. The real value of data can be measured and improved when you start collecting the social metadata. Social metadata tells you about the actual use, and taking control of that data will help you develop your data as a business asset.


The real value of data can be measured through its use by its consumers

Lean Data

The actual use of data is a clear indication of whether the data is valuable. Measuring this social metadata helps you put focus on that data. Furthermore, other valuable data sets that are not used effectively can be given attention to improve their use.

Metadata types within the 4 metadata categories

To get started, below is an overview of metadata types with examples that you could start organising. Start with a (data) process analysis and use your business objectives to determine your metadata needs:

Technical Metadata types

Connectivity metadata
– Source application name, location
Technical metadata
– Technical table & field name
– Data format (e.g. text, SPSS, Stata, Excel, tiff, mpeg, 3D, Java, FITS, CIF)
– Compression or encoding algorithms
– Encryption and decryption keys
– Software (including release number) used to create or update the data
– Hardware on which the data were created
– Operating systems in which the data were created
– Application software in which the data were created
Structural metadata
– File relationships (e.g. child, parent & dataset grouping)
Preservation metadata
– File format (e.g. .txt, .pdf, .doc, .rtf, .xls, .xml, .spv, .jpg, .fits)
– Significant properties
– Technical environment
– Fixity information
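
Much of this technical metadata can be harvested automatically from the sources themselves. As a sketch, assuming a relational source reachable through SQLAlchemy (the connection string below is a placeholder), the inspector API can collect table names, field names and data types:

```python
# Sketch: harvest basic technical metadata (tables, fields, data types)
# from a relational source with SQLAlchemy's inspector.
from sqlalchemy import create_engine, inspect

engine = create_engine("sqlite:///example.db")  # placeholder connection string
inspector = inspect(engine)

technical_metadata = []
for table_name in inspector.get_table_names():
    for column in inspector.get_columns(table_name):
        technical_metadata.append({
            "table": table_name,
            "field": column["name"],
            "data_type": str(column["type"]),
            "nullable": column["nullable"],
        })

for record in technical_metadata:
    print(record)
```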

Business Metadata types

Business initiative metadata
– Business case (reference, contacts)
– Request purpose
Governance metadata
– Owner of the data
– Data purpose limitations
– Business rules & data retention
– Data classification (AIC & Privacy)
Descriptive metadata
– Name of creator of data set
– Name of author of the data
– Title of document/ data
– Data (as)set name & description
– Object name, description & definition
– Attribute functional name, definition & description
– Location of data
– Size of data
Administrative metadata
– Information about data creation
– Information about subsequent updates, transformation, versioning, summarization
– Descriptions of migration and replication
– Information about other events that have affected the files
– Access rights metadata
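
One light-weight way to make governance metadata actionable is to check its completeness before a data set is published in the catalog. A minimal sketch, in which the set of required fields is an assumption to be adapted to your own governance model:

```python
# Sketch: check whether the governance metadata of a data set is complete
# before publishing it. The required fields are an example set.
REQUIRED_GOVERNANCE_FIELDS = ["owner", "purpose_limitation", "retention_period", "classification"]

def missing_governance_fields(metadata):
    """Return the governance fields that are absent or empty."""
    return [field for field in REQUIRED_GOVERNANCE_FIELDS if not metadata.get(field)]

dataset_metadata = {
    "owner": "finance-data-office@example.com",
    "classification": "Confidential / PII",
    "retention_period": None,  # not yet filled in
}

gaps = missing_governance_fields(dataset_metadata)
if gaps:
    print(f"Do not publish yet, missing governance metadata: {gaps}")
```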

Operational Metadata types

Execution metadata
– Whether the process run failed or had warnings
– Which database tables or files were read from, written to, or referenced
– How many rows were read, written to, or referenced
– When the process started and finished
– Which stages and links were used
– The application that executed the process
– Any runtime parameters that were used by the process
– The events that occurred during the run of the process, including the number of rows written and read on the links of the process.
– The invocation ID of the job
– Any notes about running the process
Monitoring metadata
– Actual status of a data processing job (in progress, error, paused)
– Current runtime & estimated end time
– Completeness flag & percentage
– Data Disposal verification
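
Execution metadata is best captured as a by-product of the processing itself. Below is a minimal sketch of a wrapper that records status, timing and row counts for a processing step; the wrapper and the record structure are illustrative and not tied to any specific scheduler or ETL tool:

```python
# Sketch: capture execution metadata (status, timing, row counts) around a
# processing step. The wrapper and record structure are illustrative only.
from datetime import datetime, timezone

def run_step(step_name, step_fn):
    """Run a processing step and return its execution metadata."""
    record = {"step": step_name, "started": datetime.now(timezone.utc).isoformat()}
    try:
        rows_read, rows_written = step_fn()
        record.update(status="succeeded", rows_read=rows_read, rows_written=rows_written)
    except Exception as error:
        record.update(status="failed", error=str(error))
    record["finished"] = datetime.now(timezone.utc).isoformat()
    return record

def load_customers():
    # Placeholder for a real extract/transform/load step.
    return 1500, 1498  # rows read, rows written

print(run_step("load_customers", load_customers))
```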

Social Metadata types

Use metadata
– Circulation records
– Physical and digital exhibition records
– Content reuse and multiversioning information
– Search logs & parameters
– Data search results, filters and clicks
– Use and user tracking
– Data tags
– Excerpt / summary
– URL
– Number of users & viewing time
– User review & ranking of data
Controlling metadata
– Data access users & time
– Frequency of data access
– Time between data access attempts
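
Much of this use metadata can be derived from ordinary access logs. A small sketch, assuming each log record carries a data set name, a user and a timestamp (the log format itself is an assumption):

```python
# Sketch: derive simple use metadata (access frequency, distinct users)
# per data set from access log records. The log format is an assumption.
from collections import defaultdict

access_log = [
    {"dataset": "customer_master", "user": "alice", "timestamp": "2023-05-02T09:14:00"},
    {"dataset": "customer_master", "user": "bob",   "timestamp": "2023-05-02T10:02:00"},
    {"dataset": "supplier_list",   "user": "alice", "timestamp": "2023-05-03T11:45:00"},
]

usage = defaultdict(lambda: {"accesses": 0, "users": set()})
for event in access_log:
    usage[event["dataset"]]["accesses"] += 1
    usage[event["dataset"]]["users"].add(event["user"])

for dataset, stats in usage.items():
    print(dataset, "- accesses:", stats["accesses"], "- distinct users:", len(stats["users"]))
```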

Thank you for reading! If you would like support for your organisation, feel free to get in touch!

In another blog I’ll address what solutions and capabilities you’ll need in your architecture.

Here is how to start with data quality


Why is data quality relevant

Did you know:

  • 25% of critical data in large companies is incorrect or incomplete? (Dataflux/SAS)
  • 2-3% of customer data becomes inaccurate after aging 1 month? (Butlergroup)
  • 20% of turnover is lost due to cost of rework in operational processes and corrections in information reporting? (Gartner)

With years of data analysis experience, and now as an enterprise data architect, I have found that building a case for data quality management is challenging. When I speak with a data owner, he is convinced his data is in perfect shape. And he is right to be proud of his work. He is, however, unable to prove the quality of his data, because the processes and tools that would support him are missing.

When I finally get my hands on an extract and show him the data profile, he immediately starts making corrections. And that is exactly how to get started with data quality:

Start with the data

 

What is data quality

Data needs to meet the quality requirements of the processes it is used in to make them operate efficiently. However, there are many exception flows in the business processes that go around the usual happy flows where data is created. This results in gaps in data quality further down the process chain.

Monitoring these data quality gaps is essential for stable processes.

So first we need to understand what data quality is. Lean Data defines data quality as:

The degree to which data meets the requirements of the processes it is used in

Then the Data Management Body of Knowledge (DMBOK) gives us a hand and defines 6 dimensions that represent data quality. This sounds easy, and you may feel you can start right away, but you’ll quickly run into the follow-up question of how to prioritise which data sets and attributes are important. The answer is to focus on those deviations in your data that affect your organisation. Lean Data uses 4 categories to determine whether a deviation in one of the 6 dimensions is worth spending time and effort on.

When a deviation in your data affects one of these impact categories, make it a priority to document the business rules involved. For those business rules you can then define data quality rules and implement them as mitigations. These mitigations should be implemented at the source where the data is created, so that you prevent such deviations from affecting your organisation in the future.
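
Once a deviation has been documented as a business rule, it can usually be expressed as a small, testable check. Below is a sketch of two such checks with pandas, covering the completeness and validity dimensions; the column names and rules are examples only:

```python
# Sketch: two data quality rules expressed as checks on a pandas DataFrame.
# The columns and thresholds are examples, not a prescribed standard.
import pandas as pd

customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "email": ["a@example.com", None, "not-an-email", "d@example.com"],
    "country": ["NL", "NL", "DE", None],
})

# Completeness: country must be filled for every customer.
completeness = customers["country"].notna().mean()

# Validity: email must contain an '@' when present.
validity = customers["email"].dropna().str.contains("@").mean()

print(f"country completeness: {completeness:.0%}")
print(f"email validity: {validity:.0%}")
```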

How to maintain data quality

There are many software vendors that promise you the best data quality analysis & reporting solutions. At a notable price of course.

The real cost is the time and effort it takes to identify and define the rules that need to be applied to the data to meet these requirements. This requires close interaction between the process expert and the data analyst, and it is a lengthy process before data quality measures can be built into the data quality reporting environment.

What you need first is an easy-to-use tool

When you purchase one of the top magic-quadrant solutions, you’ll notice how difficult it is to get it set up and connected to a data set, and how many technical skills are required to process the data. Furthermore, due to IT restrictions you’re not allowed to just install any new software. Here’s a screenshot with a question on the usability of one of the major data quality solutions:

In my experience, this kind of complexity and these limitations result in data owners not using the data quality tooling. That is why Lean Data divides data quality management into 2 phases:

  1. Setup through defining and designing data quality measurements
  2. Embedding through running and maintaining data quality measurements

Although the major vendors have great solutions for the second phase, the first phase is a precondition for building the foundation of data quality management. If you lose support there, you won’t get the data quality processes embedded.

So which data should you begin with?

To start, you can collect important data objects such as your customers, suppliers and products from the source and begin profiling their attributes. Obtain the ‘raw’ data from the source where it is created, to ensure no adjustments were made in the ETL that could affect the validity of your analysis. With the dimensions and categories above, you will quickly notice that you are able to define business rules.
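
A first profile does not require specialised tooling. As a sketch, assuming the raw extract has been loaded into a pandas DataFrame (the inline data below stands in for the real extract from the source system):

```python
# Sketch: a first profile of a raw customer extract with pandas.
# Replace the inline frame with the raw data from the source system.
import pandas as pd

raw = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "name": ["Acme BV", "Beta GmbH", "Beta GmbH", None],
    "created": ["2021-01-03", "2021-02-10", "2021-02-10", "2021-13-01"],
})

profile = pd.DataFrame({
    "non_null": raw.notna().sum(),
    "nulls": raw.isna().sum(),
    "distinct": raw.nunique(),
})
print(profile)
print("duplicate rows:", raw.duplicated().sum())
```

The profile makes the first deviations visible (a duplicate row, a missing name), which is exactly the conversation starter you need with the data owner.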

Need a hand? Get in touch!
