GISdevelopment.net ---> GITA 2000 ---> Data Development and Evolution

Application Integration in Data Conversion Quality Measurement

Robert G. Stengle, P. E.
Stoner Associates, Inc., 1170 Harrisburg Pike
Carlisle, PA 17013


Overview
Implementation of a Geographic Information System (GIS) involves a complex mixture of computer hardware specification and procurement, software selection, software development, and data conversion. As with any complex system implementation, quality assurance plans are essential in providing a means to gauge the progress of the project in meeting its objectives. An effective quality assurance program provides timely identification of design or process deficiencies. Creating new data and converting and loading existing enterprise data into the GIS is an extensive and costly segment of the GIS setup. The source data used to populate the GIS will have varying degrees of currency, completeness, attribute accuracy, and spatial accuracy. In some cases the "pedigree" of the source data will be known, while in others it may be difficult to assess. It is not unusual to have different accuracies within one source data set. For example, facilities mapped before a certain date or in a certain area may be more suspect than other data. User confidence in the GIS data is key when responding to customer, contractor, and other inquiries or when conducting network analysis. For example, in responding to an excavation plan, the GIS user needs to understand the completeness of the data as well as the accuracy of its spatial presentation. Depending on the data characteristics, the contractor's request may or may not warrant a field check by the utility.

This presentation will explore ways to establish the all-important "data about data" (metadata) characteristics of a GIS.

Approaches to Quantifying Data Accuracy
There are a number of factors that may affect the accuracy of the source data for the GIS, including:
  • Source maps which have inadequate topographical references
  • Inaccuracies in the landbase of the source data
  • Inaccuracies in the landbase used in the GIS
  • Inaccuracies in the field sketches for as-built facilities
  • Distortions while incorporating the field sketches onto the master drawing
  • Facility changes that are not incorporated in the master drawing
  • Attributes which are missing from the source maps, such as material type
  • Errors made in recording or transferring attribute data
  • Shrink, stretch, or other distortion of the source map
  • Source maps that are of a smaller scale than the normal scale used by the GIS operators and map users
  • Data blurring during reproduction, due to a defective master map or a process problem that creates an erroneous pipe break or improper connection in the copy
  • Unavailability or improper interpretation of detail drawings, causing a congested area to be misinterpreted during data capture
  • Lack of data on the interior conditions of the equipment (e.g. valve trim, pipe interior diameter and/or wall coating thickness)
  • Data sets which come from existing systems wherein some of the information is not validated and/or maintained at all or to the accuracy level needed by the target GIS application(s)
Most, if not all, utilities must deal with one or more of the above data issues. The research and/or field checking effort to fully validate the accuracy of a GIS data set would be prohibitive for the vast majority of utilities and other distribution companies. Validation of transmission facilities is more practical, but would still involve considerable time and expense.

Facilities verification can be carried out through field checks, record checks and/or indirect checks. For field checking of buried facilities, magnetic sensors or other special equipment may be used to locate the route of the facilities. Some excavation may also be required. Record checks involve reviewing more accurate records such as individual as-built drawings or service cards. Indirect checking involves measuring parameters in the system and comparing them with those reported by a model built from the GIS data. Typically, the parameters involved in indirect checking are pressures and flows in fluid systems and power, current, and voltage in electrical systems.

Field and record checks generally involve statistical sampling and extrapolation of the sample results to geographical or equipment groups. For some facilities attributes, such as the spatial accuracy of buried equipment, this is the only practical method of data verification. While direct checks are costly, there is no acceptable substitute for validating certain types of data. However, indirect checking, either alone or coordinated with field and record checks, can provide the enterprise with improved data quality at a lower cost than would be the case if only direct checks were conducted.
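The extrapolation step can be illustrated with a minimal sketch. It assumes simple random sampling within one equipment group and uses a normal-approximation confidence interval with a finite population correction. The function name, the sample figures, and the 95 percent confidence level are illustrative assumptions, not values from any actual survey.

```python
import math


def extrapolate_error_rate(sample_size, errors_found, group_size, z=1.96):
    """Estimate the error rate of a facility group from a random sample.

    Assumes simple random sampling within the group. The interval is a
    normal-approximation (Wald) interval; z=1.96 corresponds to roughly
    95 percent confidence.
    """
    rate = errors_found / sample_size
    # The finite population correction narrows the interval when the
    # sample is a sizable fraction of the whole group.
    fpc = math.sqrt((group_size - sample_size) / (group_size - 1))
    margin = z * math.sqrt(rate * (1 - rate) / sample_size) * fpc
    return {
        "rate": rate,
        "rate_low": max(0.0, rate - margin),
        "rate_high": min(1.0, rate + margin),
        "expected_errors": rate * group_size,
    }


# Hypothetical survey: 400 records checked out of a group of 5,000,
# with 12 records found to be in error.
result = extrapolate_error_rate(sample_size=400, errors_found=12, group_size=5000)
```

For small samples or very low error rates the normal approximation becomes unreliable, and an exact or Wilson interval would be a better choice.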

We noted above that spatial attributes are best validated through field and record checks. An example where indirect checking would apply is valve status. A combination of field and indirect checking may be the most effective approach for checking valves. Consider a situation where a statistical field survey of a gas system results in identifying one percent of the noncritical ("convenience") valves as being found either partly or fully shut. If there are 20,000 valves in the system, it would be a major undertaking to locate and manually check every valve. A viable alternative would be to create a hydraulic model of the system and then investigate areas where there are unexplained differences between the model results and the field pressures. The model results could then be used to direct the field investigation in specific sections of the system. An out-of-position valve could cause a disparity between the model and the field, so valve position checks in the vicinity of pressure/flow discrepancies would be included in the field investigation.
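In the example above, a one percent rate applied to 20,000 valves implies roughly 200 out-of-position valves scattered through the system. The model-directed alternative can be sketched as a simple comparison of field and model pressures. The monitoring point IDs, the pressure values, and the 2.0 psi tolerance below are all hypothetical; a real tolerance would depend on instrument accuracy and model calibration.

```python
def flag_discrepancies(field_psi, model_psi, tolerance_psi=2.0):
    """Return monitoring points where model and field pressures disagree.

    field_psi and model_psi map a monitoring point ID to a pressure.
    tolerance_psi is an assumed threshold, not an industry standard.
    """
    flagged = []
    for point_id, measured in field_psi.items():
        if point_id not in model_psi:
            continue  # no model result to compare against
        difference = model_psi[point_id] - measured
        if abs(difference) > tolerance_psi:
            flagged.append((point_id, difference))
    # Largest disparities first, so the field investigation starts
    # where the mismatch is worst.
    flagged.sort(key=lambda item: abs(item[1]), reverse=True)
    return flagged


# Hypothetical readings at three monitoring points.
field = {"P-101": 58.0, "P-102": 44.5, "P-103": 60.2}
model = {"P-101": 58.6, "P-102": 52.0, "P-103": 57.1}
suspect = flag_discrepancies(field, model)
```

Valve position checks would then be concentrated near the flagged points (here P-102 and P-103) rather than spread across all valves.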

For some required data, indirect checking is the only feasible way to estimate or validate facilities attributes. The best example of this is determining the interior conditions of water distribution piping. By running a model of the water system developed from the GIS data, the field versus model comparison can be used to assign pipe interior condition parameters (Hazen-Williams C factor, Darcy-Weisbach friction factor, effective inside diameter, and/or pipe efficiency) that result in the best match between the recorded pressures and flows and the model results. The results of the field and hydraulic model comparison may dictate the need for some field testing to measure individual pipe capacities for critical backbone lines. Figure 1 illustrates a data discrepancy that would be difficult and time-consuming to find using traditional data checking, but should be uncovered by a network model.
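As a simplified illustration of assigning an interior condition parameter, the sketch below back-calculates a Hazen-Williams C factor for a single pipe from a measured head loss, using the common SI form of the equation (length and diameter in meters, flow in cubic meters per second). A real calibration would adjust many pipes at once inside a network solver; the single-pipe case, the function names, and the pipe dimensions are assumptions made for illustration.

```python
def hazen_williams_head_loss(length_m, flow_m3s, diameter_m, c_factor):
    """Head loss in meters from the SI form of the Hazen-Williams equation."""
    return (10.67 * length_m * flow_m3s ** 1.852
            / (c_factor ** 1.852 * diameter_m ** 4.87))


def back_calculate_c(length_m, flow_m3s, diameter_m, measured_head_loss_m):
    """Solve the same equation for C, given a field-measured head loss."""
    ratio = (10.67 * length_m * flow_m3s ** 1.852
             / (measured_head_loss_m * diameter_m ** 4.87))
    return ratio ** (1 / 1.852)


# Hypothetical pipe: 1,000 m long, 0.3 m diameter, carrying 0.05 m^3/s.
# The GIS attribute implies C = 130, but the field head loss is larger
# than the model predicts, which points to a rougher pipe interior.
predicted_loss = hazen_williams_head_loss(1000.0, 0.05, 0.3, 130.0)
measured_loss = 1.5 * predicted_loss
calibrated_c = back_calculate_c(1000.0, 0.05, 0.3, measured_loss)
```

A calibrated C well below the recorded value suggests interior deterioration, although a reduced effective diameter or an incorrect length in the GIS could also produce the same head loss.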


Figure 1: Example of System Piping Connections Best Checked Using Hydraulic Modeling

Much could be said about approaches to conducting field and record checking, including how to determine the appropriate statistical sampling groups. Further exploration of field and record checking is outside the scope of this paper. Therefore the following sections concentrate on the opportunities to incorporate indirect checking that results in enhanced GIS data quality and reduced expenditures.

Indirect Data Quality Evaluation
Indirect checking applies in cases where the question: "If my model reasonably matches the system instrumentation, will my confidence in the GIS data be enhanced?" can be answered "Yes". There are two important criteria that must be considered in addressing this question. They are:
  • Is there sufficient system instrumentation and is it accurate enough to support an indirect data check?
  • Is the software capable of building the model from the GIS data and does the application provide results that are sufficiently accurate for this type of comparison?
The first question has several very important considerations. The existing system instrumentation may not be adequate in terms of having a sufficient number of instruments in the required locations. System instrumentation is generally prescribed by the operations department and is targeted to meet operating needs. Typically, key external or internal supplies and regulation points are monitored, conditions are recorded for important customers or those with contracts that specify the service quality (voltage, pressure, etc.), and instruments are placed at a few locations that have had past pressure or voltage concerns.

Compared to the considerable savings and benefits that result from indirect data validation, the cost to incorporate sufficient additional instrumentation is minor. Beyond the GIS validation benefits, the resulting data will also support more accurate calibration of the network model. A more accurate model can be leveraged into better work prioritization, reduced capital improvement project costs, and improved service in terms of service quality and outage service restoration. Some utilities have reduced the net cost of expanding their instrumentation coverage by optimizing the locations of existing equipment and maintaining an inventory of mobile instruments that are rotated through the system to support calibration data collection.

Leading-edge network modeling software has evolved to the point where predictive models of any size network can be created, with the resultant model accuracy ostensibly limited only by the validity of the underlying facility and consumption data. Likewise, moderately priced desktop computers will support detailed integrated facility and customer usage models for large systems, including modeling the implications of multiple supplies, parallel regulation, and temporary backfeeds to customers. These desktop applications can now utilize the highly granular data in the GIS, with model simplification/skeletonization being an option, not a requirement.

Computing and instrumentation technology are at a point where accurate network analysis software and instrumentation to support model calibration can be implemented at any utility with moderate cost. For a GIS project, the savings resulting from greater use of indirect data validation and higher quality data sets will overshadow the cost of implementing a network analysis system sophisticated enough to support the GIS project quality assurance program.

Advantages and Benefits of Indirect Checking
Incorporating applications that use the GIS data into the quality assurance system provides a number of direct and indirect benefits to the utility, including:
  • Reduced costs of topology and attribute data validation
  • More efficient prioritization and use of field check staff, by focusing the initial field checking on areas identified by the application's results
  • The ability to further prioritize field-checking, for example, by concentrating on areas the network hydraulic analysis identifies as having marginal capacity. Typically, some of these problems are caused by data errors, while others may result in the discovery of areas in need of reinforcement.
  • Ability to evaluate system conditions to the level of detail included in the GIS. For example, the GIS-based model will identify localized system deficiencies that would not appear in a simplified (skeletonized) model or summarized data set.
  • Using key applications early in the project proves the adequacy of the GIS data schema and interface to support internal and external applications vital to the business.
  • Applications that use the GIS data provide a check of the project data capture and data conversion processes. In contrast, field checking and record checking refine the source data but do not check the processes that populate the GIS database.
  • If an outcome of the GIS project is the replacement of a software application that has high user confidence with a new application that is more GIS-centric, the end users will have more time to become comfortable with the new application and will be able to check that the GIS-based system can produce the same results for equivalent system conditions.
  • Use of key applications can serve as tools in the project quality assurance system, reducing the need to develop project specific quality control applications.
  • Using key applications early in the data conversion process minimizes the project risk by avoiding unpleasant compatibility surprises and expensive re-work in the late stages of the project.
  • Forces the GIS development to have a strong user focus.
Possible Issues Related to Application Use on GIS Projects
While there are numerous benefits to incorporating applications-based testing into a GIS implementation project, there are also some problems that could arise to the detriment of the GIS project. Proper planning, being alert to signs of problems and being proactive in addressing issues will be major factors in the success of the GIS project. Some of the issues are:
  • Use of the applications for quality checking will require the commitment of experienced application users.
  • The use of applications for testing may be viewed as superfluous or excessive and as adversely affecting the project schedule.
  • There may be new applications internal or external to the GIS. Some of these new applications may not be available in time for testing or they may be only partially field-tested. It may also be difficult to find someone knowledgeable enough to use the new applications as QA tools.
  • Management of the GIS project may be more difficult because of the larger number of staff and departments that will be involved. Key personnel or management outside the core GIS project may not give sufficient priority to their GIS support role.
All of the above items are valid concerns. However, in most cases these issues can be minimized through proper project set-up, task sequencing, communications, and good project management.

The following provides some thoughts on managing these issues:
  • There is a trade-off in having experienced application users involved in the GIS testing process because of conflicting loyalties and priorities. However, without user involvement, there is considerable risk that the GIS data or interface may not support an application critical to the company. If the end users are seen and treated as key members of the GIS team and they understand why participation is in their best interests, their support should not pose a problem.
  • Often the most rigorous test of data is evaluating its ability to support a complex application. Conversely if, for whatever reason, the GIS data cannot support the targeted work processes, the project and its QA processes will be seen as failures. For critical systems, the consequences of having a discrepancy between the GIS and the application are serious. The best way to minimize that risk is to validate the GIS/application system as soon as possible. There should be a limited and manageable number of applications flagged as "key applications".
  • Implementing new GIS applications and migrating to different external applications as part of the GIS implementation are among the most challenging aspects of the GIS integration. Compromises will have to be made as to what state of development the target applications should be in before the GIS project progresses to the full implementation stage. As much as possible, new applications that support critical requirements should be accounted for in the testing and integration planning. Applications that cannot be validated in the pre-production phases may need to have more extensive design reviews and change controls.
  • The size of the utility's core GIS implementation team should be as lean as possible, while ensuring that the significant end-users are represented and involved through all the stages of the project. Presentation of the overall goals and requirements of the GIS project, the critical objectives, and the proposed GIS team composition to line managers and supervisors prior to commencing any significant design can provide a check that adequate representation and staff involvement occurs.
Suggested Approach
The following provides a synopsis of key activities that should be factored into the GIS implementation plan in order to coordinate application-based testing:

Planning:
  1. Ensure that a comprehensive list of significant applications, business processes, and systems that will be affected by the GIS implementation has been compiled. The document should also include a list of key corporate systems and functions that will not be involved in the initial implementation, noting which functions are being considered for future integration.
  2. Have the final list of affected/un-affected systems reviewed and approved by management.
  3. Rate the applications using factors such as importance in supporting key business functions, maturity of the application, and the complexity of the data exchange between the GIS and the application.
  4. Select approximately ten critical applications that should be rigorously tested and used as a quality assurance metric during the project.
  5. Finalize the draft project plan and schedule, accounting for the application testing and involvement of key functional staff.
  6. Check that the core project team, supporting staff, application development and testing requirements, and project objectives are consistent.
  7. Brief functional and executive management on the plan and schedule, emphasizing the roles that the functional departments and staff will have on the team.
  8. Obtain buy-in from key managers and sponsors. Identify "champions" for each major area of functionality.
  9. Determine what training, documentation, and other services will be required to support application testing and integration.
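The rating and selection steps in the planning list above (steps 3 and 4) can be sketched as a simple weighted score. The factor names follow the text, but the 1 to 5 rating scale, the weights, the direction of the maturity rating, and the application names are all hypothetical and would be set by the project team.

```python
def rank_applications(ratings_by_app, weights, top_n=10):
    """Rank applications by weighted score and return the top candidates.

    ratings_by_app maps an application name to its factor ratings.
    A higher rating is assumed to mean a stronger case for rigorous
    testing (for example, a higher "maturity_risk" means a less mature
    application).
    """
    scored = []
    for name, ratings in ratings_by_app.items():
        score = sum(weights[factor] * ratings[factor] for factor in weights)
        scored.append((name, score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Hypothetical weights and ratings on a 1 to 5 scale.
weights = {"importance": 3, "maturity_risk": 1, "exchange_complexity": 2}
ratings_by_app = {
    "network analysis": {"importance": 5, "maturity_risk": 3, "exchange_complexity": 5},
    "work management": {"importance": 4, "maturity_risk": 2, "exchange_complexity": 3},
    "map printing": {"importance": 2, "maturity_risk": 1, "exchange_complexity": 1},
}
critical = rank_applications(ratings_by_app, weights, top_n=2)
```

A score of this kind is only a starting point for discussion; the final list of critical applications still needs review and approval by management.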
Implementation:
  1. Ensure that the final, approved project plan and schedule are provided to all participants and they are committed to their roles.
  2. Check that the project reporting system is in place and that it keeps team members abreast of progress, upcoming key milestones, events, and meetings. Special attention should be given to communicating schedule changes that may affect the support team. Institute checks to ensure the reporting system remains active and meets the needs of the functional managers and support team.
  3. Finalize and execute the training plan.
Design:
  1. Plan to have end users directly involved in the major design meetings that define functionality, data, and the user interface.
  2. Ensure that the goals, timing, and extent of application testing defined in the project plan are fully understood by developers and testers. Verify that the project schedule includes time to resolve and correct deficiencies identified during application testing.
  3. Track documentation development to check that it will support the training and testing schedule.
Pilot Data Conversion:
  1. Maintain close contact with application experts and application developers. Use the reporting system to communicate the pilot data conversion and application development status so testers are aware of the latest data delivery and testing schedule.
  2. Distribute application documentation and conduct training.
  3. Carry out and document the results of the application testing.
  4. Resolve technical and scope issues from the testing and implement required changes.
  5. Retest the applications as necessary to account for final software development and data conversion process changes.
  6. Evaluate the status of the application testing as a factor in deciding whether it is appropriate to enter the production stage.
Production:
  1. Repeat the application testing once the production process is firmly established. Applications that showed software and data maturity during the pilot phase can have limited testing.
  2. Keep application users informed of the production status and the target plan for transition to the GIS-based application.
Acceptance:
  1. Involve the end users in the acceptance process and ensure that data or application deficiencies are documented and prioritized and that the vendor(s) have provided schedules for problem resolution.
Conclusion
A typical GIS project includes implementation of the GIS native software, computing and network infrastructure setup, custom GIS application development, integration of other corporate data or analysis software, and data conversion. Most of these endeavors must proceed in parallel in order to achieve a realistic implementation schedule. Often, complex applications are not fully developed when data conversion prototyping is completed and production starts. Compatibility between the GIS and the critical GIS-based applications must be a high priority objective, and the GIS cannot be considered in service until the users of the GIS applications are confident that they can abandon their legacy systems. Prudent management dictates that special attention and extensive quality assurance should apply to the operation of critical applications. Experience indicates that there are many advantages to integrating the testing of applications such as work management, infrastructure valuation, outage recovery, and network analysis into the process. As soon as possible, but generally no later than during the pilot data conversion, critical applications should be thoroughly exercised.

Testing critical applications at the pilot stage or earlier will uncover most of the deficiencies in the data schema and in the application design and development, and will provide a quality assurance check of all the processes that affect delivery of the end product.

A secondary advantage of early testing of critical applications is that it provides hands-on familiarization for end users, gives ample opportunity to uncover and reconcile any differences in analysis results or application functionality, and builds stronger buy-in and support from the functional departments that will be the beneficiaries of the new GIS-based systems.