Application Integration in Data Conversion Quality Measurement
Robert G. Stengle, P. E.
Stoner Associates, Inc., 1170 Harrisburg Pike
Carlisle, PA 17013
Overview
Implementation of a Geographic Information System involves a complex mixture of computer
hardware specification and procurement, software selection, software development, and data
conversion. As with any complex system implementation, quality assurance plans are essential
in providing a means to gauge the progress of the project in meeting its objectives. An effective
quality assurance program provides timely identification of design or process deficiencies.
Creating new data and converting and loading existing enterprise data into the GIS make up an
extensive and costly segment of the GIS setup. The source data used to populate the GIS will
have varying degrees of currency, completeness, attribute accuracy, and spatial accuracy. In
some cases, the "pedigree" of the source data will be known, while others may be difficult to
assess. It is not unusual to have different accuracies within one source data set. As an example,
facilities mapped before a certain date or in a certain area may be more suspect than other data.
User confidence in the GIS data is essential when responding to customer, contractor, and other
inquiries or when conducting network analysis. For example, in responding to an excavation plan, the
GIS user needs to understand the completeness of the data as well as the accuracy of its spatial
presentation. Depending on the data characteristics, the contractor's request may or may not
warrant a field check by the utility.
This presentation will explore ways to establish the all-important "data about data" (metadata)
characteristics of a GIS.
Approaches to Quantifying Data Accuracy
There are a number of factors that may affect the accuracy of the source data for the GIS,
including:
- Source maps which have inadequate topographical references
- Inaccuracies in the landbase of the source data
- Inaccuracies in the landbase used in the GIS
- Inaccuracies in the field sketches for as-built facilities
- Distortions while incorporating the field sketches onto the master drawing
- Facility changes that are not incorporated in the master drawing
- Attributes which are missing from the source maps, such as material type
- Errors made in recording or transferring attribute data
- Shrink, stretch, or other distortion of the source map
- Source maps that are of a smaller scale than the normal scale used by the GIS operators and map users
- Data blurring during reproduction, due to a defective master map or a process problem that creates an erroneous pipe break or improper connection in the copy
- Unavailability or improper interpretation of detail drawings, causing a congested area to be misinterpreted during data capture
- Lack of data on the interior conditions of the equipment (e.g. valve trim, pipe interior diameter and/or wall coating thickness)
- Data sets which come from existing systems wherein some of the information is not validated and/or maintained at all or to the accuracy level needed by the target GIS application(s)
Most, if not all, utilities must deal with one or more of the above data issues. The research
and/or field checking effort to fully validate the accuracy of a GIS data set would be prohibitive
for the vast majority of utilities and other distribution companies. Validation of transmission
facilities is more practical, but would still involve considerable time and expense.
Facilities verification can be carried out through field checks, record checks and/or indirect
checks. For field checking of buried facilities, magnetic sensors or other special equipment may
be used to locate the route of the facilities. Some excavation may also be required. Record
checks involve reviewing more accurate records such as individual as-built drawings or service
cards. Indirect checking involves measuring parameters in the system and comparing them with
those reported by a model built from the GIS data. Typically, the parameters involved in indirect
checking are pressures and flows in fluid systems and power, current, and voltage in electrical
systems.
Field and record checks generally involve statistical sampling and extrapolation of the sample
results to geographical or equipment groups. For some facilities attributes, such as the spatial
accuracy of buried equipment, this is the only practical method of data verification. While direct
checks are costly, there is no acceptable substitute for validating certain types of data. However,
indirect checking, either alone or coordinated with field and record checks, can provide the
enterprise with improved data quality at a lower cost than would be the case if only direct checks
were conducted.
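The extrapolation step described above can be sketched in a few lines of Python. This is only an illustration, not a prescribed method: the function name, the sample counts, and the group size are hypothetical, and the interval uses a simple normal approximation with a finite population correction, which is an assumption that may not suit small samples or very low error rates.

```python
import math

def estimate_error_rate(sample_size, errors_found, group_size, z=1.96):
    """Extrapolate a sampled error rate to a whole equipment or geographic group.

    Uses a normal-approximation interval (z=1.96 is roughly 95 percent) with a
    finite population correction. All inputs here are illustrative.
    """
    rate = errors_found / sample_size
    # Finite population correction: the sample is drawn without replacement
    fpc = math.sqrt((group_size - sample_size) / (group_size - 1))
    std_err = math.sqrt(rate * (1 - rate) / sample_size) * fpc
    low = max(0.0, rate - z * std_err)
    high = min(1.0, rate + z * std_err)
    return {
        "rate": rate,
        "rate_low": low,
        "rate_high": high,
        "expected_errors": rate * group_size,
        "expected_errors_low": low * group_size,
        "expected_errors_high": high * group_size,
    }

# Hypothetical survey: 400 records checked in a group of 8,000, 12 found in error
result = estimate_error_rate(sample_size=400, errors_found=12, group_size=8000)
print(result)
```

The width of the resulting interval gives a rough indication of whether the sample is large enough to support a decision about the group, or whether more direct checks are needed.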
We noted above that spatial attributes are best validated through field and record checks. An
example where indirect checking would apply is valve status. A combination of field and
indirect checking may be the most effective approach for checking valves. Consider a situation
where a statistical field survey of a gas system finds one percent of the noncritical
("convenience") valves either partly or fully shut. If there are 20,000
valves in the system, it would be a major undertaking to locate and manually check every valve.
A viable alternative would be to create a hydraulic model of the system and then investigate
areas where there are unexplained differences between the model results and the field pressures.
The model results could then be used to direct the field investigation in specific sections of the
system. An out-of-position valve could cause a disparity between the model and the field, so
valve position checks in the vicinity of pressure/flow discrepancies would be included in the
field investigation.
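The model-directed field investigation described above might be organized along the following lines. This is a minimal sketch under stated assumptions: the node and valve identifiers, the pressure values, the 2 psi tolerance, and the node-to-valve lookup are all hypothetical, and a real implementation would take them from the network model and the GIS.

```python
from dataclasses import dataclass

@dataclass
class PressureReading:
    node_id: str
    field_psi: float   # pressure recorded by field instrumentation
    model_psi: float   # pressure reported by the hydraulic model

def flag_discrepancies(readings, tolerance_psi=2.0):
    """Return readings whose model/field difference exceeds the tolerance,
    largest discrepancy first."""
    flagged = [r for r in readings if abs(r.field_psi - r.model_psi) > tolerance_psi]
    return sorted(flagged, key=lambda r: abs(r.field_psi - r.model_psi), reverse=True)

def valves_to_check(flagged, valves_near_node):
    """Build an ordered, de-duplicated valve check list from flagged nodes."""
    ordered = []
    for reading in flagged:
        for valve in valves_near_node.get(reading.node_id, []):
            if valve not in ordered:
                ordered.append(valve)
    return ordered

# Hypothetical data: three instrumented nodes and the valves in their vicinity
readings = [
    PressureReading("N1", field_psi=58.0, model_psi=58.5),
    PressureReading("N2", field_psi=41.0, model_psi=52.0),
    PressureReading("N3", field_psi=49.0, model_psi=53.5),
]
valves_near_node = {"N2": ["V17", "V18"], "N3": ["V18", "V22"]}

flagged = flag_discrepancies(readings)
work_list = valves_to_check(flagged, valves_near_node)
print([r.node_id for r in flagged], work_list)
```

Ordering the list by the size of the discrepancy sends field crews first to the areas where an out-of-position valve is most likely to explain the mismatch.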
For some required data, indirect checking is the only feasible way to estimate or validate
facilities attributes. The best example of this is determining the interior conditions of water
distribution piping. By running a model of the water system developed from the GIS data, the
field versus model comparison can be used to assign pipe interior condition parameters
(Hazen-Williams C factor, Darcy-Weisbach friction factor, effective inside diameter, and/or pipe
efficiency) that result in the best match between the recorded pressures and flows and the model
results. The results of the field and hydraulic model comparison may dictate the need for some
field testing to measure individual pipe capacities for critical backbone lines.
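As a simplified illustration of assigning an interior condition parameter, the sketch below back-calculates a Hazen-Williams C factor for a single pipe from a measured flow and head loss, using the SI form of the Hazen-Williams equation (flow in cubic metres per second, length, diameter, and head loss in metres). Treating one pipe in isolation is an assumption made for brevity; calibrating a full network model adjusts many pipes at once to match recorded pressures and flows. The function names and the pipe dimensions are hypothetical.

```python
def hazen_williams_head_loss(flow_m3s, length_m, diameter_m, c_factor):
    """Head loss in metres from the SI form of the Hazen-Williams equation."""
    return (10.67 * length_m * flow_m3s ** 1.852
            / (c_factor ** 1.852 * diameter_m ** 4.8704))

def back_calculate_c(flow_m3s, length_m, diameter_m, measured_head_loss_m):
    """Solve the same equation for C, given a measured head loss."""
    ratio = (10.67 * length_m * flow_m3s ** 1.852
             / (measured_head_loss_m * diameter_m ** 4.8704))
    return ratio ** (1 / 1.852)

# Hypothetical pipe: 1,000 m long, 0.3 m nominal diameter, carrying 0.05 m^3/s
design_loss = hazen_williams_head_loss(0.05, 1000.0, 0.3, c_factor=100.0)

# If the field test shows twice the design head loss, the implied C is lower,
# suggesting a rougher or more constricted pipe interior than the records assume
implied_c = back_calculate_c(0.05, 1000.0, 0.3, measured_head_loss_m=2 * design_loss)
print(round(design_loss, 2), round(implied_c, 1))
```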
Figure 1 illustrates a data discrepancy that would be difficult and time-consuming to find using
traditional data checking, but should be uncovered by a network model.
Figure 1: Example of System Piping Connections Best Checked Using Hydraulic Modeling
Much could be said about approaches to conducting field and record checking, including how to
determine the appropriate statistical sampling groups. Further exploration of field and record
checking is outside the scope of this paper. Therefore, the following sections concentrate on the
opportunities to incorporate indirect checking that results in enhanced GIS data quality and
reduced expenditures.
Indirect Data Quality Evaluation
Indirect checking applies in cases where the question: "If my model reasonably matches the
system instrumentation, will my confidence in the GIS data be enhanced?" can be answered
"Yes". There are two important criteria that must be considered in addressing this question.
They are:
- Is there sufficient system instrumentation and is it accurate enough to support an indirect data check?
- Is the software capable of building the model from the GIS data and does the application
provide results that are sufficiently accurate for this type of comparison?
The first question has several very important considerations. The existing system
instrumentation may not be adequate in terms of having a sufficient number of instruments in the
required locations. System instrumentation is generally prescribed by the operations department
and is targeted to meet operating needs. Typically, instruments monitor key external or internal
supplies and regulation points, record conditions for important customers or those with
contracts that specify the service quality (voltage, pressure, etc.), and cover a few locations that
have had past pressure or voltage concerns.
Compared to the considerable savings and benefits that result from indirect data validation, the
cost to incorporate sufficient additional instrumentation is minor. Beyond the GIS validation
benefits, the resulting data will also support more accurate calibration of the network model. A
more accurate model can be leveraged into better work prioritization, reduced capital
improvement project costs, and improved service in terms of service quality and outage service
restoration. Some utilities have reduced the net cost of expanding their instrumentation coverage
by optimizing the locations of existing equipment and maintaining an inventory of mobile
instruments that are rotated through the system to support calibration data collection.
Leading-edge network modeling software has evolved to the point where predictive models
of any size network can be created, with the resultant model accuracy ostensibly limited only by
the validity of the underlying facility and consumption data. Likewise, moderately priced desktop
computers will support detailed integrated facility and customer usage models for large systems,
including modeling the implications of multiple supplies, parallel regulation, and temporary
backfeeds to customers. These desktop applications can now utilize the highly granular data in
GIS systems, with model simplification/skeletonization being an option, not a requirement.
Computing and instrumentation technology are at a point where accurate network analysis
software and instrumentation to support model calibration can be implemented at any utility at
moderate cost. For a GIS project, the savings resulting from greater use of indirect data
validation and higher quality data sets will overshadow the cost of implementing a network
analysis system sophisticated enough to support the GIS project quality assurance program.
Advantages and Benefits of Indirect Checking
Incorporating applications that use the GIS data into the quality assurance system provides a
number of direct and indirect benefits to the utility, including:
- Reduced costs of topology and attribute data validation
- More efficient prioritization and use of field check staff, by focusing the initial field checking on areas identified by the application's results
- The ability to further prioritize field-checking, for example, by concentrating on areas the network hydraulic analysis identifies as having marginal capacity. Typically, some of these problems are caused by data errors, while others may result in the discovery of areas in need of reinforcement.
- Ability to evaluate system conditions to the level of detail included in the GIS. For example, the GIS-based model will identify localized system deficiencies that would not appear in a simplified (skeletonized) model or summarized data set.
- Using key applications early in the project proves the adequacy of the GIS data schema and interface to support internal and external applications vital to the business.
- Applications that use the GIS data provide a check of the project data capture and data conversion processes. In contrast, field checking and record checking refine the source data but provide no check of the processes that populate the GIS database.
- If an outcome of the GIS project is the replacement of a software application that has high user confidence with a new application that is more GIS-centric, the end users will have more time to become comfortable with the new application and will be able to check that the GIS-based system can produce the same results for equivalent system conditions.
- Use of key applications can serve as tools in the project quality assurance system, reducing the need to develop project specific quality control applications.
- Using key applications early in the data conversion process minimizes the project risk by avoiding unpleasant compatibility surprises and expensive re-work in the late stages of the project.
- Forces the GIS development to have a strong user focus.
Possible Issues Related to Application Use on GIS Projects
While there are numerous benefits to incorporating applications-based testing into a GIS
implementation project, there are also some problems that could arise to the detriment of the
GIS project. Proper planning, being alert to signs of problems and being proactive in addressing
issues will be major factors in the success of the GIS project. Some of the issues are:
- Use of the applications for quality checking will require the commitment of experienced application users.
- The use of applications for testing may be viewed as superfluous or excessive, and as a threat to the project schedule.
- There may be new applications internal or external to the GIS. Some of these new applications may not be available in time for testing or they may be only partially field-tested. It may also be difficult to find someone knowledgeable enough to use the new applications as QA tools.
- Management of the GIS project may be more difficult because of the larger number of staff
and departments that will be involved. Key personnel or management outside the core GIS
project may not give sufficient priority to their GIS support role.
All of the above items are valid concerns. However, in most cases these issues can be minimized
through proper project set-up, task sequencing, communications, and good project management.
The following provides some thoughts on managing these issues:
- There is a trade-off in having experienced application users involved in the GIS testing
process because of conflicting loyalties and priorities. However, without user involvement,
there is considerable risk that the GIS data or interface may not support an application critical
to the company. If the end users are seen and treated as key members of the GIS team and
they understand why participation is in their best interests, their support should not pose a
problem.
- Often the most rigorous test of data is evaluating its ability to support a complex application.
Conversely, if for whatever reason the GIS data cannot support the targeted work processes,
the project and its QA processes will be seen as failures. For critical systems, the
consequences of having a discrepancy between the GIS and the application are serious. The
best way to minimize that risk is to validate the GIS/application system as soon as possible.
There should be a limited and manageable number of applications flagged as "key
applications".
- Implementing new GIS applications and migrating to different external applications as part
of the GIS implementation are among the most challenging aspects of the GIS integration.
Compromises will have to be made as to what state of development the target applications
should be in before the GIS project progresses to the full implementation stage. As much as
possible, new applications that support critical requirements should be accounted for in the
testing and integration planning. Applications that cannot be validated in the pre-production
phases may need to have more extensive design reviews and change controls.
- The size of the utility's core GIS implementation team should be as lean as possible, while
ensuring that the significant end-users are represented and involved through all the stages of
the project. Presentation of the overall goals and requirements of the GIS project, the critical
objectives, and the proposed GIS team composition to line managers and supervisors prior to
commencing any significant design can provide a check that adequate representation and
staff involvement occurs.
Suggested Approach
The following provides a synopsis of key activities that should be factored into the GIS
implementation plan in order to coordinate application-based testing:
Planning:
- Ensure that a comprehensive list of significant applications, business processes, and systems
that will be affected by the GIS implementation has been compiled. The document should
also include a list of key corporate systems and functions that will not be involved in the
initial implementation, noting which functions are being considered for future integration.
- Have the final list of affected/un-affected systems reviewed and approved by management.
- Rate the applications using factors such as importance in supporting key business functions,
maturity of the application, and the complexity of the data exchange between the GIS and the
application.
- Select approximately ten critical applications that should be rigorously tested and used as a
quality assurance metric during the project.
- Finalize the draft project plan and schedule, accounting for the application testing and
involvement of key functional staff.
- Check that the core project team, supporting staff, application development and testing
requirements, and project objectives are consistent.
- Brief functional and executive management on the plan and schedule, emphasizing the roles
that the functional departments and staff will have on the team.
- Obtain buy-in from key managers and sponsors. Identify "champions" for each major area of
functionality.
- Determine what training, documentation, and other services will be required to support
application testing and integration.
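The rating and selection steps in the planning list above could be supported by a simple weighted score, sketched below. Everything specific here is an assumption for illustration: the 1 to 5 scale, the weights, the application names, and the convention that a higher score on any factor means a stronger case for rigorous testing. Each project team would need to decide how to score maturity and complexity for its own circumstances.

```python
def rate_applications(apps, weights, top_n=10):
    """Rank applications by weighted score and return the top_n as (name, score)."""
    def score(app):
        return sum(weights[factor] * app[factor] for factor in weights)

    ranked = sorted(apps, key=score, reverse=True)
    return [(app["name"], round(score(app), 2)) for app in ranked[:top_n]]

# Hypothetical weights for the three rating factors named in the planning list
weights = {"importance": 0.5, "maturity": 0.2, "complexity": 0.3}

# Hypothetical applications, each factor scored from 1 (low) to 5 (high)
apps = [
    {"name": "network analysis", "importance": 5, "maturity": 4, "complexity": 5},
    {"name": "outage recovery", "importance": 5, "maturity": 3, "complexity": 4},
    {"name": "map printing", "importance": 2, "maturity": 5, "complexity": 1},
]

shortlist = rate_applications(apps, weights, top_n=2)
print(shortlist)
```

A score of this kind is only a starting point for discussion; the final list of critical applications still needs management review and approval.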
Implementation:
- Ensure that the final, approved project plan and schedule are provided to all participants and
they are committed to their roles.
- Check that the project reporting system is in place and that it keeps team members abreast of
progress, upcoming key milestones, events, and meetings. Special attention should be given
to communicating schedule changes that may affect the support team. Institute checks to
ensure the reporting system remains active and meets the needs of the functional managers
and support team.
- Finalize and execute the training plan.
Design:
- Plan to have end users directly involved in the major design meetings that define
functionality, data, and the user interface.
- Ensure that the goals, timing, and extent of application testing defined in the project plan are
fully understood by developers and testers. Verify that the project schedule includes time to
resolve and correct deficiencies identified during application testing.
- Track documentation development to check that it will support the training and testing
schedule.
Pilot Data Conversion:
- Maintain close contact with application experts and application developers. Use the
reporting system to communicate the pilot data conversion and application development
status so testers are aware of the latest data delivery and testing schedule.
- Distribute application documentation and conduct training.
- Carry out and document the results of the application testing.
- Resolve technical and scope issues from the testing and implement required changes.
- Retest the applications as necessary to account for final software development and data
conversion process changes.
- Evaluate the status of the application testing as a factor in deciding whether it is appropriate
to enter the production stage.
Production:
- Repeat the application testing once the production process is firmly established.
Applications that showed software and data maturity during the pilot phase can have limited
testing.
- Keep application users informed of the production status and the target plan for transition to
the GIS-based application.
Acceptance:
- Involve the end users in the acceptance process and ensure that data or application
deficiencies are documented and prioritized and that the vendor(s) have provided schedules
for problem resolution.
Conclusion
A typical GIS project includes implementation of the GIS native software, computing and
network infrastructure setup, custom GIS application development, integration of other corporate
data or analysis software, and data conversion. Most of these endeavors must proceed in parallel
in order to achieve a realistic implementation schedule. Often, complex applications are not
fully developed when data conversion prototyping is completed and production starts.
Compatibility between the GIS and the critical GIS-based applications must be a high priority
objective, and the GIS cannot be considered in service until the users of the GIS-based applications are
confident that they can abandon their legacy systems. Prudent management dictates that special
attention and extensive quality assurance should apply to the operation of critical applications.
Experience indicates that there are many advantages to integrating the testing of applications such as
work management, infrastructure valuation, outage recovery, and network analysis
into the project quality assurance process. As soon as possible, but generally no later than during the pilot data
conversion, critical applications should be thoroughly exercised.
Testing critical applications at the pilot stage or earlier will uncover most of the deficiencies in
the data schema and in the application design and development, and will provide a quality assurance check
of all the processes that impact the end product delivery.
A secondary advantage of early testing of critical applications is that it provides hands-on
familiarization for end users, gives ample opportunity to uncover and reconcile any differences in
analysis results or application functionality, and builds stronger buy-in and support from the
functional departments that will be the beneficiaries of the new GIS-based systems.