Data quality: Defining an achievable standard
Tony S. Holmwood
Brown & Root Services, Halliburton Brown & Root
Hill Park Court, Leatherhead, Surrey KT22 7NL
United Kingdom
Background
'Data is the Cinderella of system implementation' - Why so?
It is generally accepted that 50 - 85% of the cost of building and implementing an asset management/GIS system
lies in the provision of data. Yet population of a new system's database is seldom considered until late in the
day. Data collection and data conversion are not so interesting as hardware and software. The requirement for
data is seldom thought about until the project has been under way for some time and the major decisions on
platform and functionality have been made. The inter-related issue of Data Quality is often ignored completely.
However, the issue of Data Quality is taking on increasing importance. As the performance of large public
service companies and authorities gains a higher public profile, so management is forced to implement better
systems with which to manage their infrastructure. The demand by the regulatory authorities that the major utility
companies achieve greater efficiencies and better safety is forcing an increased understanding of the assets
which make up the backbone of their business. This in turn is placing ever greater emphasis on their asset
management and GIS systems. In the United Kingdom the issue of public and employee safety has become a
major subject of concern and debate.
Both the regulatory authorities and company managers at board level are increasingly aware that safety can only
be delivered where information about the maintenance, performance and condition of the assets is accurately
recorded and readily available. Therefore the issue of Data Quality is becoming increasingly important.
Purpose
It is a business requirement to have the GIS database populated with data of a known and agreed quality.
The purpose of this paper is to raise the profile of Data Quality, and in so doing to consider some of the
questions which arise when the subject is considered in any depth. The basic proposition that the paper addresses
is stated above.
The paper sets out to describe an approach to Data Quality which has been developed within the United
Kingdom Utilities, Telecommunications and Rail industries. However, elements of the approach can be applied
to any situation where it is necessary to have a quantified basis for data quality. In particular the primary role of
the Data Quality Strategy at the early stages of a project is emphasised.
The ideas put forward here are not intended to be prescriptive. Each business and project needs to address the
issue of data quality in the light of its own circumstances, business objectives and constraints. However, it is
hoped that elements of the approach discussed in this paper can be modified and expanded to meet individual
needs. An important objective of this paper is to promote reaction and discussion.
Overview
The need for Data Quality impinges on every stage of the development and implementation lifecycle
Data Quality is considered here in the context of the standard development project life cycle. To achieve a
known and required quality of data after the event - that is after completion of system development and
implementation - is generally difficult, often impossible. It is important therefore that the matter of Data Quality
is considered at the earliest stages of design - where design includes:
- system function
- database structure
- collection methods and tools
- data migration and conversion
- data maintenance.
The need for data quality impacts the design activity in all these areas.
The individual business perspective
It is important to emphasise that there are no absolutes when it comes to Data Quality. The impact of legacy
systems (electronic or paper) on the data collection and conversion activity can be huge. Inevitably there is a cost
associated with quality, and therefore it is important that the trade-off between quality and cost is understood by the
business. This understanding can only be achieved where a framework exists in which data quality can be
properly considered and acted on. The starting point for this is the development of a Data Quality Strategy which
can be agreed by the business. The activity of producing this forces the business to consider the importance of
individual elements of data and the consequences (e.g. costs) of inaccuracies.
The approach to Data Quality
It is suggested that no GIS project is meeting its corporate responsibility unless the following questions have
been successfully addressed:
- Which data is most important?
- What is the relative importance of different data?
- How will Data Quality be measured?
- At what stages will Data Quality be measured?
- How will we know if the important data is good enough?
- What will be done when data fails to meet the set standard?
This paper outlines an approach which allows such questions to be considered rationally, and then answered in
the light of the business needs.
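
As an illustration only, the questions above can be made quantitative along the following lines. This is a minimal sketch in Python, not the method described in this paper: it assumes that the business has agreed a relative weight and a minimum acceptable accuracy for each attribute, and that quality is measured by checking a sample of records. All attribute names, weights, thresholds and sample counts shown are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AttributeStandard:
    name: str            # attribute being assessed (hypothetical names below)
    weight: float        # relative importance agreed by the business
    min_accuracy: float  # acceptance threshold, as a fraction between 0 and 1


def assess(standards, sample_results):
    """Compare measured accuracy with the agreed threshold for each attribute.

    sample_results maps attribute name -> (records_checked, records_correct).
    """
    report = []
    for std in standards:
        checked, correct = sample_results[std.name]
        accuracy = correct / checked if checked else 0.0
        report.append({
            "attribute": std.name,
            "accuracy": accuracy,
            "passed": accuracy >= std.min_accuracy,
        })
    return report


def weighted_score(standards, report):
    """Importance-weighted mean accuracy across all assessed attributes."""
    total_weight = sum(s.weight for s in standards)
    accuracy_by_name = {r["attribute"]: r["accuracy"] for r in report}
    return sum(s.weight * accuracy_by_name[s.name] for s in standards) / total_weight


# Hypothetical standard: material matters three times as much as install year.
standards = [
    AttributeStandard("pipe_material", weight=3.0, min_accuracy=0.95),
    AttributeStandard("install_year", weight=1.0, min_accuracy=0.80),
]

# Hypothetical sample: 200 records checked per attribute.
sample = {"pipe_material": (200, 188), "install_year": (200, 170)}

report = assess(standards, sample)
failures = [r["attribute"] for r in report if not r["passed"]]
print("Attributes below standard:", failures)
print("Weighted score: %.2f" % weighted_score(standards, report))
```

In this sketch the more important attribute fails its threshold even though the overall weighted score looks respectable, which is why a per-attribute standard and an overall score are reported separately. What is done about a failure (re-survey, correction, or acceptance of the risk) remains a business decision.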