Data integrity and quality - How do you get there?
Bob Britton
US WEST, 700 W. Mineral Ave., Littleton, CO 80120
Darrell Rhodes
Analytical Surveys, Inc., 941 N. Meridian St.
Indianapolis, IN 46204
Quality Control and Quality Assurance (QA/QC) Processes
Data integrity and quality are critical to the success of any geospatial program implementation and are achieved
through effective QA/QC and data validation processes, especially during the data conversion phase of the project.
The importance of these processes becomes more evident as the project stakeholders recognize that the initially
converted database will be used as the foundation for all future business applications and functions requiring
geospatial data and analysis. Typically, additional layers or levels of data will be added on top of the originally
converted data as future enhancements are made to the geospatial system, adding more importance to the initial
quality and integrity of the primary system data.
Typically, conversion vendors will utilize detailed quality assurance and quality control steps within their
conversion process to ensure the data specifications are met prior to delivering the data to the client. With the
increasing focus on ISO 9000 implementation and certification, quality assurance and control procedures are
beginning to adhere to a basic set of standards recognized throughout the GIS community as well as other industries,
resulting in improved QA/QC processes.
During data conversion, quality control is focused on "inspection" and will include full manual and automated
checks of the converted data against the source information and specifications at defined checkpoints within the
conversion process. Manual checks may include a one-to-one comparison of the source document with a hard copy
plot as well as on-screen consistency checks. Automated checks typically involve very specific
validation routines that are run at the completion of each task. These check feature-level and model-level
requirements and report any violations to an operator for correction. This cycle is generally repeated until all the
errors have been identified and corrected, and only then does work move to the next task. Quality assurance is focused on the
"process" as well as "validation". The conversion process must be engineered to ensure the quality of the data is
"built in" rather than "inspected in". This involves various types of validations built into the conversion software
and process to minimize the "human error" factor in the data. For example, data model requirements can be
incorporated into the conversion software, enabling the data being entered by an operator to be validated "on the
fly". In other words, if an operator is capturing attributes for a cable feature, the software would only allow the
operator to key in legal values for the attributes as defined by the data model. For system defined attributes, the
software would populate these fields automatically and would eliminate any operator intervention thereby reducing
the risk of error. In addition to process engineering, quality assurance also involves data validation utilizing random
sampling techniques at major points within the conversion process. This generally occurs on the final software
platform and will consist of running QA scripts and checking reports as well as reviewing hard copy check plots and
performing onscreen integrity checks. Generally, better results are achieved if the conversion vendor can replicate
the random sampling process / technique utilized by the client.
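As a minimal sketch of the "on the fly" validation described above, the following Python fragment rejects operator entries that fall outside the data model and fills in system-defined attributes without operator input. The attribute names, legal values, and function names are hypothetical and chosen only for illustration; a real conversion tool would load them from the project's data model.

```python
# Hypothetical data model: legal values for each operator-entered
# attribute of a cable feature (illustrative names and values only).
CABLE_DOMAINS = {
    "material": {"copper", "fiber"},
    "placement": {"aerial", "buried", "underground"},
}


def system_attributes(feature_id):
    """Hypothetical system-defined attributes, populated automatically."""
    return {"feature_id": feature_id, "feature_type": "cable"}


def capture_cable(feature_id, operator_input):
    """Validate operator-keyed attributes against the data model.

    Returns the complete attribute record, or raises ValueError listing
    every illegal, unknown, or missing attribute.
    """
    errors = []
    for name, value in operator_input.items():
        if name not in CABLE_DOMAINS:
            errors.append(f"{name}: not an operator-entered attribute")
        elif value not in CABLE_DOMAINS[name]:
            errors.append(f"{name}: illegal value {value!r}")
    missing = CABLE_DOMAINS.keys() - operator_input.keys()
    errors.extend(f"{name}: missing" for name in sorted(missing))
    if errors:
        raise ValueError("; ".join(errors))
    # System-defined fields are merged in without operator intervention.
    return {**system_attributes(feature_id), **operator_input}
```

Calling capture_cable(1, {"material": "fiber", "placement": "aerial"}) returns a complete record, while an entry such as "material": "wood" is refused at capture time instead of being found later by inspection.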
The client must fully understand the conversion vendor's QA/QC processes in order to establish their own checks
and be assured that the data will comply with the specifications and meet the desired quality levels. In this regard,
the client and the conversion vendor must work together and share in the QA/QC responsibility. Providing the
conversion vendor with concise requirements and targets is the first step in setting up good quality methods. This is
generally accomplished through the development of acceptance criteria, which are discussed in the last section of this
paper.
Most clients typically try to minimize the resources required to perform the QA/QC analysis of the data delivered by
the conversion vendor. In addition, knowledgeable resources are not always available. This is compounded by the
fact that when the resources are available they are needed for other programs within the company. In addition to
resource issues, there are also time constraints that must be managed as part of the data review. Generally, there are
specific time periods that have been defined in the contract for data review and acceptance. With this in mind, the
client usually uses a statistical sampling approach to make the best use of limited resources and time when validating data
coming from the vendors. It is also important to note that the client is counting on much of the actual detailed
quality control to have taken place before the data is delivered.
To establish a proper statistical sampling method, the client must adhere to an established set of criteria such as
those presented in the ANSI Sampling Procedures and Tables for Inspection by Attributes. It is critical that once the
inspection criteria are established, the staff assigned to perform the inspection follow the criteria to the letter. A
common difficulty with this type of analysis is convincing the users that it works. The people who perform the
inspection are usually ex-records staff who are reluctant to ignore "other errors" that they notice outside of
the sample set of data. For random sampling to work, however, errors must be recorded only on the items randomly
selected for the sample.
One method to help mitigate this natural resistance to looking past additional errors is to utilize a separate error tally
sheet to record the "other errors" found during the inspection. The project team will then need to decide whether or
not action is required to specifically correct these errors before the data is turned over to the end user. If the data
were accepted based on the sample, typically the "other errors" would not be corrected before the data is turned over
to the end user. If the data were rejected based on the sample, typically the "other errors" would be returned to the
conversion vendor for correction as part of the normal rework cycle for the rejected delivery.
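The sampling and tally rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not a prescribed procedure: the function names are hypothetical, and the sample size and acceptance number are placeholders that would in practice be read from the chosen sampling tables. Errors are counted per defective item.

```python
import random


def draw_sample(item_ids, sample_size, seed):
    """Randomly select the items to inspect.

    A fixed seed makes the draw repeatable, so another party could
    reproduce the same sample.
    """
    rng = random.Random(seed)
    return set(rng.sample(sorted(item_ids), sample_size))


def tally_errors(sample, reported_item_ids):
    """Split defective items into the sample tally and the separate
    "other errors" tally for items outside the sample."""
    reported = set(reported_item_ids)
    return reported & sample, reported - sample


def disposition(sample_errors, other_errors, acceptance_number):
    """Decide acceptance from the sample tally alone.

    acceptance_number is a placeholder for the value taken from the
    sampling tables in use.
    """
    if len(sample_errors) <= acceptance_number:
        # Accepted: the project team decides what to do with other errors.
        return {"accepted": True,
                "return_to_vendor": set(),
                "project_team_review": set(other_errors)}
    # Rejected: all recorded errors go back with the normal rework cycle.
    return {"accepted": False,
            "return_to_vendor": set(sample_errors) | set(other_errors),
            "project_team_review": set()}
```

Keeping the two tallies apart means the accept or reject decision rests only on the randomly selected items, while inspectors still have a place to record everything else they notice.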
A typical random sampling process will include four components: inspection conditions, characteristics
to be inspected, inspection methods, and the consolidation of results. With these components in place, one should
be able to ensure that the converted data is complete and accurate. Again, to establish proper statistical sampling
processes, one should follow accepted standards such as ANSI's Guide to Inspection Planning and Sampling
Procedures and Tables for Inspection by Attributes.