Data quality: Defining an achievable standard
The 'top down' approach to quality
In this discussion, the development of the approach to Data Quality follows a 'top down' sequence, which
reflects the sequence of events that the project itself needs to follow:
- Developing a Data Quality Strategy
- Setting the Data Quality Standard
- Defining Data Quality
- Measuring Data Quality
- Definitions and examples
- Application of the Standard
- Supporting and Enhancing Data Quality
Developing a Data Quality Strategy
The Data Quality Strategy is the starting point for all work in connection with Data Quality.
The strategy is important because:
- It provides the first outward and visible sign of the approach to be adopted
- It is an important input to the design stage
- It gives the business a vehicle within which the issues surrounding data quality (or the lack of it) can be aired and decisions made
- It has an impact on the overall cost-benefit analysis
- It is the basis for all further work in relation to Data Quality.
As such, it must be developed and agreed at the earliest possible stage in the project. An effective strategy will
reflect the needs of the business while remaining pragmatic and achievable.
Purpose
The purpose of the Data Quality Strategy is to provide the framework in which the business requirements for
data quality can be realised in the most cost-effective manner. The strategy must provide the basis for data
quality in all areas of the project and at all stages; that is, through the full project life cycle:
- System functional design (e.g. to enable quality-related metadata to be acted upon)
- Data design (e.g. to allow quality-related metadata to be held and accessed)
- Data collection design (to ensure that the needs for a measured quality of data are designed into the collection procedures and tools)
- Data conversion design (to ensure that converted data meets the quality hurdle)
- Data maintenance (to ensure that data maintenance procedures address the need to maintain and improve data quality).
In addition, the quality of data that transfers between systems must be considered. Few systems exist in
isolation, and each interface to another system (electronic or clerical) raises its own Data Quality issues. The
strategy must address the allocation of responsibility between communicating systems.
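The quality-related metadata mentioned under system functional design and data design can be illustrated with a minimal sketch. It assumes a hypothetical structure in which each data value carries a tag recording its source, a confidence score and the date it was last verified. All class names, field names and example values here are illustrative assumptions, not part of any particular system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class QualityTag:
    """Hypothetical quality metadata held alongside a data value."""
    source: str                  # where the value came from, e.g. "site survey"
    confidence: float            # assumed score between 0.0 and 1.0
    verified_on: Optional[date]  # last check against a primary source, if any


@dataclass
class AssetField:
    """A single data value together with its quality tag."""
    name: str
    value: str
    quality: QualityTag


def usable(field: AssetField, minimum_confidence: float) -> bool:
    """Act on the metadata: accept the value only if it clears the hurdle."""
    return field.quality.confidence >= minimum_confidence


# Illustrative values only.
surveyed = AssetField("voltage_rating", "11kV",
                      QualityTag("site survey", 0.95, date(2024, 1, 15)))
legacy = AssetField("voltage_rating", "11kV",
                    QualityTag("legacy paper record", 0.60, None))

print(usable(surveyed, 0.9))  # True
print(usable(legacy, 0.9))    # False
```

Holding the tag next to the value is one possible design choice; it lets the same value be accepted or rejected by different consumers, depending on the quality hurdle each one applies.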
Principles
When developing a strategy, it is necessary to establish an agreed set of principles that underpin it.
Although some of the principles may be considered truisms, they will in themselves raise questions that
need to be addressed. In this respect they aid clarity of thought. Examples of such principles might be:
- It is preferable to have no data than unreliable data
- It is uneconomic to check all data before use
- All data checks should be made against the site or another primary source
- All potential sources should be quality checked before selection
- The best source of data is the equipment itself, i.e. the site
- All data cleansing should take place external to the target system.
Such principles may have to be violated under specific and difficult circumstances. However, the existence of
the principles means that the issues will get properly aired and debated. Formal Change Control should then
come into play when it is found necessary to alter these fundamental assumptions.
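Two of the example principles above (that it is uneconomic to check all data before use, and that checks should be made against a primary source) together suggest checking a sample rather than the whole data set. The following is a minimal sketch under that assumption. The function name, the record layout and the `is_correct` callback (which stands in for a comparison with the site or other primary source) are all hypothetical.

```python
import random


def estimate_error_rate(records, is_correct, sample_size, seed=0):
    """Check a random sample and return the observed fraction of bad records.

    is_correct is a caller-supplied function standing in for a check
    against the site or another primary source.
    """
    if not records:
        raise ValueError("no records to sample")
    rng = random.Random(seed)  # fixed seed so a check can be repeated
    sample = rng.sample(records, min(sample_size, len(records)))
    errors = sum(1 for record in sample if not is_correct(record))
    return errors / len(sample)


# Illustrative data: the "primary source" is a simple lookup table.
primary = {i: "ok" for i in range(10)}
records = [{"id": i, "value": "ok"} for i in range(10)]
records[3]["value"] = "wrong"
records[7]["value"] = "wrong"


def matches_primary(record):
    return primary[record["id"]] == record["value"]


# With a sample as large as the data set, the estimate is the exact rate.
print(estimate_error_rate(records, matches_primary, sample_size=10))  # 0.2
```

A smaller `sample_size` lowers the cost of checking but gives only an estimate of the true error rate, which is the trade-off the principle points to.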
Scope and Objectives
It will be easier to develop and agree the Data Quality Strategy once its scope and objectives have been
established clearly. Examples of such objectives include:
- The progressive improvement of Data Quality during system life
- Flexibility to meet future needs (e.g. anticipated business need for higher quality of some classes of data)
- Provision of quantified benchmarks
- Achieving consistency of data quality for data from sources of variable quality
- Providing a basis for the supply of data from different external suppliers (e.g. the basis of a contract with conversion vendors or collectors)
- Assessing the marginal cost of changes in data quality
- Establishing meaningful Corrective Action Procedures.
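Two of the objectives above, quantified benchmarks and the marginal cost of changes in data quality, lend themselves to simple arithmetic. The sketch below assumes that quality is expressed as the percentage of records passing a check and that a cost estimate exists for each target level. The function names and all figures are invented purely for illustration.

```python
def pass_rate(results):
    """Fraction of checked records that passed (results is a list of bools)."""
    if not results:
        raise ValueError("no results to benchmark")
    return sum(results) / len(results)


def marginal_cost(cost_by_target, lower, higher):
    """Extra cost per percentage point of moving from one target to another."""
    extra_cost = cost_by_target[higher] - cost_by_target[lower]
    return extra_cost / (higher - lower)


# Benchmark: 47 of 50 checked records passed, against an assumed 95% target.
observed = pass_rate([True] * 47 + [False] * 3)
print(observed)          # 0.94
print(observed >= 0.95)  # False

# Invented cost estimates for reaching each quality target (percent correct).
cost_by_target = {90: 100_000, 95: 150_000, 99: 350_000}
print(marginal_cost(cost_by_target, 90, 95))  # 10000.0
print(marginal_cost(cost_by_target, 95, 99))  # 50000.0
```

In this made-up example each additional percentage point costs five times as much between 95% and 99% as it does between 90% and 95%, which is the kind of comparison that helps the business decide where a higher standard is worth paying for.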