Lean-Data

Here is how to start with data quality

December 7, 2018

Why is data quality relevant

Did you know:

  • 25% of critical data in large companies is incorrect or incomplete? (Dataflux/SAS)
  • 2-3% of customer data becomes inaccurate after aging 1 month? (Butlergroup)
  • 20% of turnover is lost due to cost of rework in operational processes and corrections in information reporting? (Gartner)

With years of data analysis experience, and now as an enterprise data architect, I have found that building a case for data quality management is challenging. When I speak with a data owner, he is convinced his data is in perfect shape. And he is right to be proud of his work. However, he cannot prove the quality of his data, because the processes and tools that would let him do so are missing.

When I finally get my hands on an extract and show him the data profile, he immediately starts making corrections. And this is how to get started with data quality:

Start with the data

 

What is data quality

Data needs to meet the quality requirements of the processes that use it, so that those processes can operate efficiently. In practice, business processes have many exception flows that bypass the usual happy flows in which data is created. These exceptions leave data quality gaps further down the process chain.

Monitoring these data quality gaps is essential for stable processes.

So first we need to understand what data quality is. Lean Data defines data quality as:

The degree to which data meets the requirements of the processes it is used in
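One way to make this definition measurable, as a minimal sketch: express each process requirement as a check on a record and report the share of records that pass all checks. The `quality_degree` function, the two invoicing requirements and the sample records are hypothetical and only serve as illustration; they are not part of the Lean Data definition.

```python
from typing import Callable

Record = dict[str, str]
Requirement = Callable[[Record], bool]


def quality_degree(records: list[Record], requirements: list[Requirement]) -> float:
    """Share of records (0.0 to 1.0) that meet every process requirement."""
    if not records:
        return 1.0  # assumption: an empty dataset contains no deviations
    passing = sum(1 for r in records if all(req(r) for req in requirements))
    return passing / len(records)


# Hypothetical requirements of an invoicing process.
def has_postal_code(record: Record) -> bool:
    return bool(record.get("postal_code", "").strip())


def has_vat_number(record: Record) -> bool:
    return bool(record.get("vat_number", "").strip())


# Made-up sample data, for illustration only.
customers = [
    {"name": "Acme", "postal_code": "1011 AB", "vat_number": "NL123"},
    {"name": "Globex", "postal_code": "", "vat_number": "NL456"},
    {"name": "Initech", "postal_code": "3511 CD", "vat_number": ""},
    {"name": "Umbrella", "postal_code": "2511 EF", "vat_number": "NL789"},
]

degree = quality_degree(customers, [has_postal_code, has_vat_number])
print(f"{degree:.0%} of customer records meet the invoicing requirements")
```

The same dataset can score differently for another process with other requirements, which is the point of tying quality to the process that uses the data.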

The data management body of knowledge (DMBOK) then gives us a hand by defining 6 dimensions that represent data quality. This sounds easy, and you may feel you can start right away. You’ll quickly run into the follow-up question of how to prioritize: which datasets and attributes are important? The answer is to focus on those deviations in your data that affect your organization. Lean Data uses 4 impact categories to determine whether a deviation in one of the 6 dimensions is worth spending time and effort on.

When a deviation in your data affects one of these impact categories, make it a priority to document the business rules involved. From those business rules you can derive data quality rules and implement them as mitigations. Implement the mitigations at the source, where the data is created, so that the deviations no longer affect your organization in the future.
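As a rough sketch of what a mitigation at the source could look like: a business rule is turned into a data quality rule, and a record that breaks the rule is refused at the moment it is created. The business rule, the `check_customer` and `create_customer` functions and the simplified email pattern are assumptions for illustration, not a prescribed implementation.

```python
import re


class DataQualityError(ValueError):
    """Raised when a new record violates a data quality rule."""


# Hypothetical business rule: every customer must be reachable by email.
# Derived data quality rule: the email field is filled and looks like an address.
# The pattern is deliberately simple and is not a full email validator.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_customer(record: dict) -> list[str]:
    """Return the list of rule violations for one customer record."""
    violations = []
    if not record.get("name", "").strip():
        violations.append("name is missing")
    if not EMAIL_PATTERN.match(record.get("email", "")):
        violations.append("email is missing or malformed")
    return violations


def create_customer(store: list, record: dict) -> None:
    """Mitigation at the source: refuse to store a record that breaks a rule."""
    violations = check_customer(record)
    if violations:
        raise DataQualityError("; ".join(violations))
    store.append(record)


customers = []
create_customer(customers, {"name": "Acme", "email": "info@acme.example"})
try:
    create_customer(customers, {"name": "Globex", "email": "unknown"})
except DataQualityError as err:
    print(f"Rejected at the source: {err}")
```

Keeping the check in a separate function means the same rule can also be run over existing data to measure how many records already deviate.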

How to maintain data quality

There are many software vendors that promise you the best data quality analysis & reporting solutions. At a notable price of course.

The real cost is the time and effort it takes to identify and define the rules that need to be applied to the data to meet these requirements. This requires close interaction between the process expert and the data analyst, and it is a lengthy process before data quality measures can be built into the data quality reporting environment.

What you need first is an easy-to-use tool

When you purchase one of the top magic quadrant solutions, you’ll notice how difficult it is to set up and connect to a dataset, and how much technical skill is required to process the data. Furthermore, IT restrictions often mean you’re not allowed to just install new software. Here’s a screenshot with a question on the usability of one of the major data quality solutions:

In my experience, this kind of complexity and these limitations have resulted in data owners not using the data quality tooling. That is why Lean Data divides data quality management into 2 phases:

  1. Setup through defining and designing data quality measurements
  2. Embedding through running and maintaining data quality measurements

Although the major vendors have great solutions for the second phase, the first phase is a precondition for building the foundation of data quality management. If you lose support there, you won’t get data quality processes embedded.

So which data should you begin with?

To start, collect important data objects such as your customers, suppliers and products from the source and start profiling their attributes. Obtain the ‘raw’ data from the source where it is created, to ensure that no adjustments made in the ETL affect the validity of your analysis. With the dimensions and categories above, you will quickly notice that you are able to define business rules.
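Profiling the attributes does not require heavy tooling to begin with. The following is a minimal sketch using only the Python standard library. The `profile` function, the column names and the inline sample extract are hypothetical; in practice the rows would come from a raw extract file of the source system.

```python
import csv
import io
from collections import Counter


def profile(rows: list[dict]) -> dict:
    """Per attribute: filled count, missing count, distinct values, top values."""
    result = {}
    columns = rows[0].keys() if rows else []
    for column in columns:
        values = [(row.get(column) or "").strip() for row in rows]
        filled = [v for v in values if v]
        counts = Counter(filled)
        result[column] = {
            "filled": len(filled),
            "missing": len(values) - len(filled),
            "distinct": len(counts),
            "top": counts.most_common(3),
        }
    return result


# Stand-in for a raw extract (made-up data, for illustration only).
raw_extract = io.StringIO(
    "customer_id,country,email\n"
    "1,NL,a@example.com\n"
    "2,nl,\n"
    "3,NL,c@example.com\n"
    "3,Netherlands,c@example.com\n"
)
rows = list(csv.DictReader(raw_extract))
report = profile(rows)
for column, stats in report.items():
    print(column, stats)
```

Even on this tiny sample the profile surfaces candidates for business rules: a customer_id that is not unique, a country that is coded in three different ways, and a missing email.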

Need a hand? Get in touch!
