Introduction
Network-based enterprises such as utilities and telecom operators are under pressure to improve operational efficiency and service reliability. The pressure comes from growing market demand and rising customer expectations. It is now well established that GIS is a major enabler of efficiency gains in these industries, and several major players have begun to realize the benefits of loading their network data into a geospatial system. However, for such gains to be realized, the data that powers the GIS must be of the highest quality. The better the data quality, the more reliable the GIS representation, and the better the decisions made from it.
Organizations with substantial investments in geospatial information are finding that their return on investment is somewhat limited. One cause is a lack of relevance to other internal datasets. Another is general inaccuracies in light of emerging external needs and standards.
This paper presents industry best practices and methodologies for achieving high-quality geospatial data.
Impact of Poor Data Quality
The ISO/TC 211 standards list the data quality elements as completeness, logical consistency, positional accuracy, temporal accuracy, and thematic accuracy. Mapping agencies assess the quality of the data they produce against these elements. Data created through different channels and with different techniques can show discrepancies in resolution, orientation, and displacement.
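Positional accuracy, in particular, is commonly quantified as the root-mean-square error (RMSE) between dataset coordinates and independently surveyed check points. The US NSSDA, for example, takes this approach. The following is a minimal sketch. It assumes point features with matching IDs in both sets and a common projected coordinate system; the feature IDs and coordinates are hypothetical.

```python
import math

def horizontal_rmse(dataset_pts, checkpoints):
    """Horizontal RMSE between dataset coordinates and surveyed check
    points, matched by feature ID (projected coordinates, same units)."""
    common = sorted(set(dataset_pts) & set(checkpoints))
    if not common:
        raise ValueError("no features shared with the check points")
    sq_sum = 0.0
    for fid in common:
        (x, y), (cx, cy) = dataset_pts[fid], checkpoints[fid]
        sq_sum += (x - cx) ** 2 + (y - cy) ** 2
    return math.sqrt(sq_sum / len(common))

# Hypothetical valve positions: as stored in the GIS vs. as surveyed.
dataset = {"V1": (100.0, 200.0), "V2": (150.0, 250.0)}
survey = {"V1": (103.0, 204.0), "V2": (150.0, 250.0)}
print(horizontal_rmse(dataset, survey))  # about 3.54
```

Only features present in both sets are compared, so the size and spread of the check point sample determine how representative the figure is.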
Such discrepancies degrade application performance, which underscores the need for higher spatial accuracy to realize the full operational benefits. Some of the impacts are listed below:
Inaccurate data affects operations in several ways:
- In the field application, it leads to improper deployment of field crews, reducing efficiency between work orders
- In the meter reading route application, it adversely affects equipment, fuel, and labor costs
- In the trouble call ticket management process, it increases verifications and the number of trouble tickets
- In the outage management system, it impedes emergency response
- In GIS applications, it causes processing errors that slow down payments, interrupt project schedules, and impair decision-making
- Overall, more time is spent on problem identification and resolution, reducing customer satisfaction and harming business interests
An industry expert has observed that ‘If bad data impacts an operation even for five percent of the time, it still adds a staggering 45% to the cost of operations.’ Improving data quality is therefore essential. Data can be inconsistent in many ways. Even so, issues of data precision deserve the primary focus because they have a greater impact than the rest.
Data precision is a more physical concern, since it compares reality with what is recorded in the dataset. For landbase data, the gap can be addressed effectively by conflating features from different sources or data at different scales. For utility or telecom network data, the problem is more internal. There, the comparison is between the network ‘as-on-the-ground’ and ‘as-is-stored in the database’. Spatial conflation is inherently complex because it must identify and eliminate discrepancies between the source and target datasets. A number of techniques and tools have been developed in the industry to make this job simpler. Spatial conflation alone, however, is not sufficient to produce complete data. Attribute conflation is equally important and can be achieved with a well-designed process and automated verification and validation methods.
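Commercial conflation tools match features using far richer evidence, such as shape, topology and attribute similarity. As a minimal illustration of attribute conflation only, the sketch below takes a simple greedy approach. Each target feature is matched to the nearest unused source feature within a distance tolerance. The target then receives any attributes it lacks, and its existing values are never overwritten. The feature layout, field names and tolerance are hypothetical.

```python
import math

def conflate_attributes(source, target, tolerance):
    """Greedily match each target feature to the nearest unused source
    feature within `tolerance`; copy over attributes the target lacks.
    Returns the IDs of target features that found no match."""
    used = set()
    unmatched = []
    for t in target:
        best, best_d = None, tolerance
        for i, s in enumerate(source):
            if i in used:
                continue
            d = math.dist(s["xy"], t["xy"])
            if d <= best_d:
                best, best_d = i, d
        if best is None:
            unmatched.append(t["id"])
            continue
        used.add(best)
        for key, value in source[best]["attrs"].items():
            t["attrs"].setdefault(key, value)  # keep existing target values
    return unmatched

source = [
    {"xy": (0.0, 0.0), "attrs": {"material": "PE", "diameter_mm": 110}},
    {"xy": (50.0, 0.0), "attrs": {"material": "steel"}},
]
target = [
    {"id": "T1", "xy": (1.0, 1.0), "attrs": {"diameter_mm": 125}},
    {"id": "T2", "xy": (200.0, 0.0), "attrs": {}},
]
unmatched = conflate_attributes(source, target, tolerance=5.0)
print(unmatched, target[0]["attrs"])
```

Greedy matching depends on processing order. Unmatched features such as T2 here are natural candidates for the automated verification step the text describes.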
Running the data through the applications helps establish its logical consistency, but no comparable method exists for establishing positional accuracy. Even a fractional improvement in positional accuracy can lead to substantial gains in execution costs and response times. Positional accuracy also helps meet compliance requirements, and that alone is reason enough to start investing in it. Repositioning the entire network data onto a more accurate landbase will appear complex, and it surely is. However, the market now offers tools that semi-automate the job. These tools, combined with a well-defined positional accuracy improvement (PAI) process, make the job easier and more manageable.
PAI Process Methodology
Positional accuracy improvement is typically required in three cases:
Case 1 - Replace the existing landbase with the most recent version
Case 2 - Update and upgrade the existing landbase to incorporate the latest changes
Case 3 - Fix the placement errors in the already mapped network data
Obtaining, matching, and integrating the data with other datasets and with the required standards is extremely time-consuming and resource-intensive. Software tools and proven processes are therefore needed to match and integrate the data automatically.
Different methodologies can be adopted depending on the existing data condition, the degree of real-world change, the geographic shift, and source availability. A typical PAI process methodology consists of the following steps:

Figure 1: PAI Process Diagram
Pre-PAI Process
Before starting the project, verify the input data sources and create a report of their characteristics. These sources include the old landbase, the new landbase, the network data and any available link files. The report should include feature counts, relationships, and connectivity. Later stages of the PAI process can use this information to ensure that the post-PAI data is consistent, complete, and correct with respect to the original source data (as shown in figure 2).
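Such a report can start as a simple per-dataset summary. The following is a minimal sketch. It assumes each feature is a record with a class name, a list of connected feature IDs and a list of related records. That data layout and the example features are hypothetical; in practice these values would be read from the GIS.

```python
from collections import Counter

def inventory_report(features):
    """Summarise a dataset before PAI: feature counts per class, unique
    connectivity links, and related-record references."""
    counts = Counter(f["cls"] for f in features.values())
    links = set()
    for fid, f in features.items():
        for other in f.get("connects", []):
            links.add(tuple(sorted((fid, other))))  # count A-B and B-A once
    relationships = sum(len(f.get("related", [])) for f in features.values())
    return {"counts": dict(counts), "links": len(links), "relationships": relationships}

network = {
    "M1": {"cls": "main", "connects": ["V1", "S1"]},
    "V1": {"cls": "valve", "connects": ["M1"]},
    "S1": {"cls": "service", "connects": ["M1"], "related": ["METER-17"]},
}
report = inventory_report(network)
print(report)
```

Keeping the same summary function for the pre-PAI and post-PAI data makes the later comparison a direct, like-for-like check.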
By this stage, an appropriate product or set of tools to automate the PAI process should have been identified. The type of problem, source formats, existing data condition, and geometric constraints vary from project to project. The product's configuration file or settings therefore need to be adjusted to achieve the desired results. At some point, fine-tuning of the existing tools may also be unavoidable.
PAI Process
This is the core activity of the entire process wherein the positional improvement takes place. It is always advisable to create a copy of the master database before performing the PAI operations.
Most third-party PAI tools on the market work from a predefined link vector file. The link vector file is a set of control points that indicate the positions of certain features before and after PAI (figure 2). The accuracy and the percentage of automatic repositioning depend on the correctness of the link vectors. It is therefore very important to review the link vectors thoroughly to ensure that they are evenly distributed. Where they are not, additional link vectors may have to be created manually.
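One simple way to review the distribution is to overlay a regular grid on the project area and list the cells that contain no link vector. Those cells are candidates for manually added vectors. The following is a minimal sketch. It assumes each link vector is a pair of (old, new) coordinates. The cell size is a hypothetical value to be chosen per project.

```python
import math

def empty_cells(link_vectors, bbox, cell_size):
    """Return (col, row) grid cells inside bbox that contain no link vector,
    judged by each vector's old (pre-PAI) position."""
    xmin, ymin, xmax, ymax = bbox
    ncols = math.ceil((xmax - xmin) / cell_size)
    nrows = math.ceil((ymax - ymin) / cell_size)
    occupied = set()
    for (x, y), _new in link_vectors:
        if xmin <= x <= xmax and ymin <= y <= ymax:
            # Points on the max edge are clamped into the last cell.
            col = min(int((x - xmin) // cell_size), ncols - 1)
            row = min(int((y - ymin) // cell_size), nrows - 1)
            occupied.add((col, row))
    return [(c, r) for r in range(nrows) for c in range(ncols) if (c, r) not in occupied]

links = [
    ((10.0, 10.0), (12.0, 11.0)),
    ((150.0, 20.0), (151.0, 22.0)),
    ((30.0, 180.0), (33.0, 182.0)),
]
print(empty_cells(links, bbox=(0.0, 0.0, 200.0, 200.0), cell_size=100.0))  # [(1, 1)]
```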

Figure 2: Pre-PAI Network Data along with Link Vectors
Next, run the necessary scripts and tools of the chosen PAI application on the selected tiles or batches. As explained above, most PAI software works on the basis of link vectors. If link vectors are not available, or cannot be generated for particular data, semi-automatic tools may be needed to accomplish the repositioning.
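How a given product turns link vectors into new positions is product-specific. Triangulation-based rubber sheeting is a common choice. As a simpler stand-in, the sketch below uses inverse-distance weighting (IDW). Each vertex is shifted by a weighted average of the link-vector displacements, with nearer vectors counting more. The function names and the `power` parameter are hypothetical. A production version would also limit each vertex to nearby link vectors rather than using all of them.

```python
import math

def reposition(point, link_vectors, power=2.0):
    """Shift a point by the inverse-distance-weighted (IDW) average of the
    displacements defined by link vectors ((old_x, old_y), (new_x, new_y))."""
    if not link_vectors:
        raise ValueError("at least one link vector is required")
    x, y = point
    sum_dx = sum_dy = sum_w = 0.0
    for (ox, oy), (nx, ny) in link_vectors:
        d = math.hypot(x - ox, y - oy)
        if d == 0.0:
            # Point coincides with a link vector's old position: move it exactly.
            return (nx, ny)
        w = 1.0 / d ** power
        sum_dx += w * (nx - ox)
        sum_dy += w * (ny - oy)
        sum_w += w
    return (x + sum_dx / sum_w, y + sum_dy / sum_w)

def reposition_line(vertices, link_vectors):
    """Apply the IDW shift to every vertex of a line feature."""
    return [reposition(v, link_vectors) for v in vertices]

links = [((0.0, 0.0), (2.0, 0.0)), ((10.0, 0.0), (12.0, 0.0))]
print(reposition_line([(0.0, 0.0), (5.0, 0.0)], links))
```

Because each vertex moves independently, two lines that shared an endpoint can end up slightly apart. A separate connectivity check is therefore needed after this step.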
Some data might not fit exactly after repositioning. Utility data typically lies within pavements, roads, or ditches, or runs above ground across fences and boundaries, so it rarely aligns with landbase features. Because of the nature of utility networks, the topological relationships within the network data matter relatively more than they do in land and property applications. Hence, more visual checks need to be performed for utility data. Where automatic repositioning does not produce good results, manual shifting methods need to be applied to complete the repositioning.
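Simple automated screens can help prioritise these visual checks. For example, a rule might say that repositioned network points should not fall inside building footprints. That rule is a hypothetical business rule; service connections or meters, for instance, may legitimately break it. The sketch below flags such points with an even-odd ray-casting test. It assumes simple, non-self-intersecting polygons. Points lying exactly on a boundary may be classified either way.

```python
def point_in_polygon(pt, polygon):
    """Even-odd ray-casting test; polygon is a list of (x, y) vertices."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def features_in_buildings(points, buildings):
    """IDs of point features lying inside any building footprint."""
    return sorted(
        fid for fid, pt in points.items()
        if any(point_in_polygon(pt, b) for b in buildings)
    )

building = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
points = {"V1": (5.0, 5.0), "V2": (15.0, 5.0)}
print(features_in_buildings(points, [building]))  # ['V1']
```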

Figure 3: Pre-PAI Data and Post-PAI Data
A key aspect of data quality is network connectivity and the related geometric properties. Therefore, a separate check needs to be established to validate these characteristics of the data.
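One way to build such a check is to find line endpoints that coincided before repositioning and confirm that they still coincide afterwards. The following is a minimal sketch. It assumes line features are stored as vertex lists with the same feature IDs before and after PAI. The tolerance and data layout are assumptions. The pairwise comparison is quadratic, so large networks would need a spatial index.

```python
import math
from itertools import combinations

def endpoints(lines):
    """Map (feature_id, 'start'|'end') to that line's first/last vertex."""
    return {
        (fid, end): vertices[0] if end == "start" else vertices[-1]
        for fid, vertices in lines.items()
        for end in ("start", "end")
    }

def broken_connections(pre_lines, post_lines, tol=0.01):
    """Endpoint pairs from different lines that met (within tol) before PAI
    but no longer meet afterwards. Assumes the same feature IDs in both."""
    pre, post = endpoints(pre_lines), endpoints(post_lines)
    broken = []
    for a, b in combinations(sorted(pre), 2):  # O(n^2): fine for a sketch
        if a[0] == b[0]:
            continue  # skip the two ends of the same line
        if math.dist(pre[a], pre[b]) <= tol and math.dist(post[a], post[b]) > tol:
            broken.append((a, b))
    return broken

pre_lines = {"L1": [(0.0, 0.0), (10.0, 0.0)], "L2": [(10.0, 0.0), (20.0, 0.0)]}
post_lines = {"L1": [(1.0, 1.0), (11.0, 1.0)], "L2": [(11.5, 1.0), (21.0, 1.0)]}
print(broken_connections(pre_lines, post_lines))  # [(('L1', 'end'), ('L2', 'start'))]
```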
Post-PAI Process
A number of features may well be shifted to incorrect locations because of various factors, including but not limited to:
- Original accuracy of the data
- Complexity of the geography and real-world change
- Nuances of gas network data
Therefore, a comprehensive review and editing of the shifted features is essential to confirm that the network data has been realigned according to the business rules provided. Using two graphical windows to display the pre- and post-processing data side by side is recommended for easy comparison.
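To focus the review, features can be screened by how far they moved. The following is a minimal sketch. It assumes point coordinates keyed by feature ID before and after PAI. It flags features whose displacement differs from the batch's median displacement by more than a tolerance. The tolerance is a hypothetical value that would be set from the business rules.

```python
import math
import statistics

def flag_for_review(pre, post, tolerance):
    """Return IDs of features whose shift differs from the median shift
    of the batch by more than `tolerance` (same units as coordinates)."""
    shifts = {fid: math.dist(pre[fid], post[fid]) for fid in pre}
    median = statistics.median(shifts.values())
    return sorted(fid for fid, s in shifts.items() if abs(s - median) > tolerance)

pre = {"A": (0.0, 0.0), "B": (10.0, 0.0), "C": (20.0, 0.0)}
post = {"A": (2.0, 0.0), "B": (12.0, 0.0), "C": (35.0, 0.0)}
print(flag_for_review(pre, post, tolerance=5.0))  # ['C']
```

Flagged features can then be opened first in the side-by-side pre/post windows.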
At this stage, it is also critical to verify that all the features from the source data have been shifted and are available in the post-PAI data. This can be achieved by comparing the feature count reports of pre-PAI and post-PAI data (as shown in figure 3).
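The following is a minimal sketch of that comparison. It assumes both reports hold per-class feature counts, keyed by a hypothetical class name. Any class whose count changed, including one missing entirely from one side, is listed for investigation.

```python
def compare_counts(pre_counts, post_counts):
    """Return {class: (pre, post)} for every class whose counts differ."""
    classes = set(pre_counts) | set(post_counts)
    return {
        c: (pre_counts.get(c, 0), post_counts.get(c, 0))
        for c in sorted(classes)
        if pre_counts.get(c, 0) != post_counts.get(c, 0)
    }

pre = {"main": 120, "valve": 45, "service": 300}
post = {"main": 120, "valve": 44, "service": 300}
print(compare_counts(pre, post))  # {'valve': (45, 44)}
```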
Next, a quality check on a randomly selected sample of the data is advised to make sure that the final data conforms to the agreed criteria and standards. This check reveals any remaining data inaccuracies before the data is delivered to the customer.
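The following is a minimal sketch of such a sample check. It assumes a per-feature `check` function that encodes the agreed criteria. The sample size and minimum pass rate are hypothetical and would come from the agreement with the customer. A fixed seed makes the sample reproducible for audit.

```python
import random

def sample_qc(feature_ids, check, sample_size, min_pass_rate, seed=None):
    """Check a random sample of features; return (accepted, failed_ids)."""
    population = sorted(feature_ids)
    k = min(sample_size, len(population))
    if k < 1:
        raise ValueError("nothing to sample")
    rng = random.Random(seed)  # fixed seed -> reproducible sample
    sample = rng.sample(population, k)
    failures = [fid for fid in sample if not check(fid)]
    pass_rate = 1 - len(failures) / len(sample)
    return pass_rate >= min_pass_rate, failures

ids = [f"F{i:03d}" for i in range(500)]
bad = {"F003", "F250"}  # hypothetical features that fail the check
accepted, failed = sample_qc(ids, lambda fid: fid not in bad,
                             sample_size=50, min_pass_rate=0.95, seed=42)
print(accepted, failed)
```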
The final step is preparing the data for delivery to the customer. In this step, the coordinates resulting from the above process are updated in the master database. This ensures that no changes are made to the source database other than to the geometry attribute.
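The following is a minimal sketch of a geometry-only write-back. It uses Python's built-in `sqlite3` module and assumes a hypothetical `features` table that stores point geometry as plain `x`/`y` columns. A real GIS would normally hold geometry in a spatial column type instead. Only the coordinate columns are touched. The returned row count can be checked against the number of features sent to catch IDs missing from the master table.

```python
import sqlite3

def write_back_geometry(conn, new_coords):
    """Write post-PAI coordinates into the master table, touching only x and y.
    Returns the number of rows updated."""
    with conn:  # commit on success, roll back on error
        cur = conn.executemany(
            "UPDATE features SET x = ?, y = ? WHERE id = ?",
            [(x, y, fid) for fid, (x, y) in new_coords.items()],
        )
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE features (id TEXT PRIMARY KEY, cls TEXT, material TEXT, x REAL, y REAL)")
conn.executemany(
    "INSERT INTO features VALUES (?, ?, ?, ?, ?)",
    [("V1", "valve", "steel", 100.0, 200.0), ("V2", "valve", "PE", 150.0, 250.0)],
)
updated = write_back_geometry(conn, {"V1": (103.0, 204.0)})
print(updated, conn.execute("SELECT * FROM features WHERE id = 'V1'").fetchone())
```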
Conclusion
Utilities vary in the extent to which they use GIS, but the vast majority are making efforts to advance it. To succeed, they need to establish accurate, precisely placed network data.