GISdevelopment.net: GIS for Oil & Gas Proceedings 2002

A Pilot Project for Landbase Migration

Dave Magee
Account Manager, Utilities

Jay Clark
Product Manager, Utilities

Bart Guetti
Software Engineer

Geographic Data Technology, Inc.
11 Lafayette St
Lebanon NH 03766
Telephone: 800 331-7881 ext. 1112
Fax: 603 653-0249
E-mail: jay_clark@gdt1.com
bart_guetti@gdt1.com


Abstract
This presentation examines data modeling, topological maintenance, and alignment of multiple spatial data sets in a project for Atlanta Gas Light Company (AGL). Processes include:
  • conflation of public vector data with Geographic Data Technology’s addressed street centerline
  • horizontal realignment of the resulting street network using aerial imagery
  • auto-correlation of AGL’s street centerline to the improved landbase
  • repositioning of AGL’s boundaries, point data, facilities and other data layers to the improved geometry.
Introduction

This presentation discusses a pilot project undertaken by Geographic Data Technology, Inc. (GDT) for Atlanta Gas Light Company (AGL), the eighth largest natural gas distribution utility in the United States.

Faced with advances in software and GIS technology since the development of their last GIS, AGL decided that it was time both to replace their GIS software environment and to migrate to a more accurate and up-to-date landbase.

When GDT met with AGL in January 2001, discussions of the landbase migration process centered on three key issues.

Data Quality
AGL wanted the new landbase to be spatially accurate enough to overlay an aerial image of equal or better quality than a United States Geological Survey digital orthophoto quarter quadrangle (USGS DOQQ). This implied a horizontal accuracy of roughly +/- 5 to 7 meters from “ground truth”. AGL also stipulated that the attributes delivered with the data had to be current and complete enough for service request and Customer Information System (CIS) purposes.

Management of Facilities Data Conversion
AGL wanted to align their facility data to the new landbase as accurately and efficiently as possible.

Cost
AGL sought to balance the level of effort required to do the job correctly against a moderate budget.

AGL contracted a pilot project for DeKalb County, Georgia, in order to demonstrate the feasibility of a solution from a commercial vendor such as GDT. DeKalb is one of the heavily urbanized counties that make up the metropolitan Atlanta region.



The New Landbase Data


Quality Criteria
The following criteria were developed for the new landbase GIS:
  • Horizontal errors would be less than 7 meters averaged over 90% of all points in the data set
  • Sufficient attributes would be included in the new landbase to provide GIS functionality for all departments
  • The GIS would employ a “real world” coordinate system
  • It would be compatible with an “off the shelf” relational database management system (RDBMS)
  • The system would accept transactional updates.
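One way to test the horizontal-accuracy criterion against a set of check points is sketched below. It assumes the criterion means that at least 90% of sampled points must have a horizontal error under 7 meters; the wording “averaged over 90%” could be read other ways. It also assumes that errors are planar distances in a meter-based projection. The function names and sample coordinates are hypothetical.

```python
import math

def horizontal_errors(measured, reference):
    """Planar distance (meters) between each landbase point and its ground-truth point."""
    return [math.hypot(mx - rx, my - ry)
            for (mx, my), (rx, ry) in zip(measured, reference)]

def meets_accuracy(errors, tolerance_m=7.0, required_fraction=0.90):
    """True if at least `required_fraction` of the errors are below `tolerance_m`.

    Assumption: the criterion is read as "90% of points within 7 m".
    """
    if not errors:
        raise ValueError("no check points supplied")
    within = sum(1 for e in errors if e < tolerance_m)
    return within / len(errors) >= required_fraction

# Hypothetical check points in UTM meters: landbase position vs. surveyed position.
measured = [(100.0, 200.0), (305.0, 400.0), (500.0, 612.0)]
reference = [(103.0, 204.0), (300.0, 400.0), (500.0, 600.0)]
errs = horizontal_errors(measured, reference)
print(errs)                  # [5.0, 5.0, 12.0]
print(meets_accuracy(errs))  # False: only 2 of 3 points are within 7 m
```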
Data Specifications
Spatial accuracy within the stated tolerance would be obtained using digital vector data from the Georgia State Data Clearinghouse or, where vectors were unavailable, by digitizing from USGS DOQQs.

The following data model was selected.

Landbase GIS Transportation Data Model




It was agreed that this grouping of objects and attributes would support all departments at AGL.


It was also decided that data would be delivered in geographic coordinates (latitude/longitude) on the NAD83 datum, in decimal degrees. For the pilot project, the data would also be published in UTM zone 16, NAD83, in meters, so that it could be overlaid on USGS DOQQs as a spatial “sanity check”.
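The UTM zone 16 copy of the data can be derived from the NAD83 geographic coordinates using the standard transverse Mercator series. The sketch below follows the forward formulas in Snyder's Map Projections: A Working Manual. It assumes:

- points in the northern hemisphere;
- the GRS80 ellipsoid that NAD83 uses;
- no datum shift.

The function name is hypothetical. A production workflow would normally use a projection library such as PROJ.

```python
import math

# GRS80 ellipsoid, used by NAD83.
A = 6378137.0
F = 1 / 298.257222101
E2 = F * (2 - F)        # first eccentricity squared
EP2 = E2 / (1 - E2)     # second eccentricity squared
K0 = 0.9996             # UTM scale factor on the central meridian

def nad83_to_utm(lat_deg, lon_deg, zone=16):
    """Project NAD83 lat/lon (decimal degrees) to UTM easting/northing (meters).

    Northern hemisphere only; forward transverse Mercator series after Snyder (1987).
    """
    lat = math.radians(lat_deg)
    lon0 = math.radians(6.0 * zone - 183.0)       # zone 16 -> central meridian -87
    n = A / math.sqrt(1 - E2 * math.sin(lat) ** 2)
    t = math.tan(lat) ** 2
    c = EP2 * math.cos(lat) ** 2
    a = (math.radians(lon_deg) - lon0) * math.cos(lat)
    e4, e6 = E2 ** 2, E2 ** 3
    # Meridian arc length from the equator to the point's latitude.
    m = A * ((1 - E2 / 4 - 3 * e4 / 64 - 5 * e6 / 256) * lat
             - (3 * E2 / 8 + 3 * e4 / 32 + 45 * e6 / 1024) * math.sin(2 * lat)
             + (15 * e4 / 256 + 45 * e6 / 1024) * math.sin(4 * lat)
             - (35 * e6 / 3072) * math.sin(6 * lat))
    easting = 500000.0 + K0 * n * (
        a + (1 - t + c) * a ** 3 / 6
        + (5 - 18 * t + t ** 2 + 72 * c - 58 * EP2) * a ** 5 / 120)
    northing = K0 * (m + n * math.tan(lat) * (
        a ** 2 / 2 + (5 - t + 9 * c + 4 * c ** 2) * a ** 4 / 24
        + (61 - 58 * t + t ** 2 + 600 * c - 330 * EP2) * a ** 6 / 720))
    return easting, northing
```

A point on the zone's central meridian at the equator maps to the false origin (500,000 m, 0 m). A point near downtown Atlanta (33.75 N, 84.39 W) projects to an easting a little over 740,000 m and a northing a little over 3,730,000 m.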

For the pilot project, the data was displayed in ArcView® GIS format. For full rollout, the data would be delivered as a Geodatabase or through ArcSDE™. The specific RDBMS had not been determined at the outset of the project.

Conflation
(Alan Witmer, 2001)

GDT acquired the DeKalb County GIS coverage from the Georgia State Data Clearinghouse (GSDC) and conflated its horizontal control network into GDT's core database. The vector conflation process is described below. GDT's approach was to modularize the software and the process for each step listed below, so that each step could be developed, tuned, operated, and quality-checked independently.

Correlating features in two landbases (conflation) requires these steps:

Prepare the databases for conflation processing.
Analyze the incoming data’s quality and usability, and convert as necessary to a common format.

Build a Common Representation
GDT builds a topological representation of the features to be matched in each landbase. This filters out unwanted detail so that the two data sets are congruous, organizes the remaining data into chains and their intersection/end nodes, and generates units of geography that can be meaningfully compared. The process also generates attributes that help guide the correlation process past ambiguities, such as closely spaced multiple-lane highway representations.

The topological model aggregates the remaining linear features into meaningful entities, or “chains”. For example, the arcs representing a street centerline that runs uninterrupted from one intersection to the next might be aggregated into one chain. The operator defines the aggregation rules for each conflation so that the model avoids aggregating across any point where a significant attribute, such as name or feature type, changes. This helps matching when both data sets record that attribute reliably. For example, if both landbases record street names accurately, then a name change along a street should be treated as a node between two distinct chains, even if it does not occur at an intersection.
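The aggregation rule can be sketched as follows. Arcs are joined through any node where exactly two arcs meet and their chosen attribute agrees. Every other node, whether an intersection, a dead end or a name change, ends a chain. The arc record layout ("from", "to", "name") and the function name are hypothetical; this is an illustration of the rule described above, not GDT's code.

```python
from collections import defaultdict

def build_chains(arcs, key=lambda arc: arc["name"]):
    """Aggregate arcs into chains that run between boundary nodes.

    arcs: list of dicts with hypothetical fields "from", "to" (node ids), "name".
    A node is interior to a chain only if exactly two arcs meet there and `key`
    returns the same value for both; every other node ends a chain.
    Returns chains as lists of arc indices. Closed loops with no boundary
    node are not handled in this sketch.
    """
    incident = defaultdict(list)
    for i, arc in enumerate(arcs):
        incident[arc["from"]].append(i)
        incident[arc["to"]].append(i)

    def interior(node):
        ids = incident[node]
        return len(ids) == 2 and key(arcs[ids[0]]) == key(arcs[ids[1]])

    used, chains = set(), []
    for start in list(incident):
        if interior(start):
            continue
        for first in incident[start]:
            if first in used:
                continue
            chain, node, current = [], start, first
            while True:                      # walk until a boundary node is reached
                used.add(current)
                chain.append(current)
                arc = arcs[current]
                node = arc["to"] if arc["from"] == node else arc["from"]
                if not interior(node):
                    break
                current = next(i for i in incident[node] if i != current)
            chains.append(chain)
    return chains

# Hypothetical street: Main St runs 1-2-3, then the name changes to Elm St at node 3.
arcs = [
    {"from": 1, "to": 2, "name": "Main St"},
    {"from": 2, "to": 3, "name": "Main St"},
    {"from": 3, "to": 4, "name": "Elm St"},
]
print(build_chains(arcs))                        # [[0, 1], [2]]
print(build_chains(arcs, key=lambda arc: None))  # [[0, 1, 2]]
```

The second call shows the same arcs aggregated when names are ignored. That choice would suit a landbase whose street names are not reliable.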

GDT also generates additional information at this time. In particular, the software locates and marks multiple-lane roads according to user-supplied criteria, and assigns a directional flag indicating on which side the road's counterpart lies. This prevents ambiguity later, eliminating the possibility that the wrong lanes will be matched.
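A side flag of this kind can be computed from the sign of a 2-D cross product between the lane's direction and a probe point on its counterpart. The sketch below judges the side from the lane's first-to-last chord and the counterpart's middle vertex. That is a simplifying assumption; curved roads would need a per-segment test. The function names are hypothetical.

```python
def side_of(line_start, line_end, point):
    """Return 'left', 'right', or 'on' for `point` relative to the directed
    segment line_start -> line_end, from the sign of the 2-D cross product."""
    (x1, y1), (x2, y2), (px, py) = line_start, line_end, point
    cross = (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)
    if cross > 0:
        return "left"
    if cross < 0:
        return "right"
    return "on"

def flag_counterpart_side(lane, counterpart):
    """Flag which side of `lane` (list of (x, y) vertices in digitized order)
    its parallel counterpart lies on, probing at the counterpart's middle vertex."""
    probe = counterpart[len(counterpart) // 2]
    return side_of(lane[0], lane[-1], probe)

# An eastbound lane with its westbound counterpart 10 units to the north.
print(flag_counterpart_side([(0, 0), (100, 0)], [(100, 10), (0, 10)]))  # left
```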

Matching

Identify common elements



Figure 1 above illustrates the basic challenge of matching. It shows two overlaid street centerline databases. At first glance, they seem to represent the same area. Each has a major road with a common route number and a similar heading. Each also has a development to the northeast of that road, with some similarity in names and geography. Even the crook in North St below the highway (label 1) resembles that of the unnamed road closely enough to prompt a mental match, despite the difference in detail. For software, however, there are significant issues: roads are more angular in one database, lengths and proportions vary significantly, and streets that should match are often not nearest neighbors (North St, label 1, is a case in point). The following labeled areas illustrate other common challenges:
  • Corresponding streets meet in differing intersection configurations like the North St/Unnamed intersection with Route 16.
  • The names are similar, but not exact: “Alton Hgts Ln” versus “Afton Ln”.
  • Two stretches of road in one database (North St.) match to only one in the other, and the single item must be conceptually split in order to build a one-to-one relationship.
  • The B St/Unnamed match continues further in one database than the other, and conflation must decide how much of the more-complete street should be matched.
The process begins with node matching. Nodes concentrate a great deal of information, so they are the places where pivotal matches can be made with the most confidence. Like most other conflation software developers, GDT uses iterative matching. It chooses the strongest node matches in an early pass, then conceptually rubber sheets the data and uses neighborhood information to find further matches in repeated passes, continuing as long as new matches can be found.
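The iterative idea can be sketched in a few lines. Two substitutions in this sketch are assumptions, not GDT's method:

- A single mean offset of the pairs already matched stands in for rubber sheeting and neighborhood agents.
- Mutual nearest neighbors within a tolerance stand in for "strongest" matches.

Node data is a hypothetical {id: (x, y)} dictionary per landbase.

```python
import math

def _nearest(point, candidates):
    """Return (key, distance) of the candidate point closest to `point`."""
    key, pos = min(candidates.items(), key=lambda kv: math.dist(point, kv[1]))
    return key, math.dist(point, pos)

def iterative_node_match(nodes_a, nodes_b, tolerance, max_passes=10):
    """Match nodes of landbase A to landbase B over repeated passes.

    Each pass shifts unmatched A nodes by the mean offset of the pairs already
    matched (a crude stand-in for rubber sheeting), then accepts only mutual
    nearest neighbors within `tolerance`. Stops when a pass adds no matches.
    """
    matches = {}
    for _ in range(max_passes):
        if matches:
            dx = sum(nodes_b[b][0] - nodes_a[a][0] for a, b in matches.items()) / len(matches)
            dy = sum(nodes_b[b][1] - nodes_a[a][1] for a, b in matches.items()) / len(matches)
        else:
            dx = dy = 0.0
        free_a = {k: (x + dx, y + dy) for k, (x, y) in nodes_a.items() if k not in matches}
        free_b = {k: p for k, p in nodes_b.items() if k not in matches.values()}
        new = {}
        for a, pa in free_a.items():
            if not free_b:
                break
            b, d = _nearest(pa, free_b)
            back, _ = _nearest(free_b[b], free_a)
            if d <= tolerance and back == a:   # mutual nearest neighbors only
                new[a] = b
        if not new:
            break
        matches.update(new)
    return matches

# Hypothetical nodes: b2 is 7 units from a2, outside the tolerance until the
# offset learned from the a1/b1 match is applied in the second pass.
A_NODES = {"a1": (0.0, 0.0), "a2": (50.0, 0.0)}
B_NODES = {"b1": (4.0, 0.0), "b2": (57.0, 0.0)}
print(iterative_node_match(A_NODES, B_NODES, tolerance=5.0))  # {'a1': 'b1', 'a2': 'b2'}
```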

Node matching uses two match agents. One analyzes the rubber-sheeted offset between candidate nodes and the local density of nodes. The other attempts to build an optimal “test match” of all the feature chains incident at the node pair, to determine how similar the local features are. After node matching, the GDT process uses the matched nodes as a guide to matching the topological chains. Chain match criteria include agents that weigh:
  • overall orientation of the line or significant shape
  • convexity/concavity
  • overall length
  • neighboring node topology and match status
  • affine transformation of both lines based on calculated trend and
  • the overall quality of all other characteristics if one or the other chain were split.
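Two of the listed agents, overall orientation and overall length, can be combined into a single score as a simple illustration. The equal weights, the 30-degree cutoff and the function names are hypothetical. Digitizing direction is ignored, because the two landbases may run the same street in opposite directions.

```python
import math

def overall_bearing(chain):
    """Orientation of a polyline from its first to its last vertex, in radians."""
    (x1, y1), (x2, y2) = chain[0], chain[-1]
    return math.atan2(y2 - y1, x2 - x1)

def polyline_length(chain):
    """Sum of segment lengths along a list of (x, y) vertices."""
    return sum(math.dist(p, q) for p, q in zip(chain, chain[1:]))

def geometric_score(chain_a, chain_b, max_angle=math.radians(30)):
    """Score two chains in [0, 1] on orientation and length agreement.

    Angles are compared modulo 180 degrees so digitizing direction is ignored.
    Equal weights and the 30-degree cutoff are illustrative assumptions.
    """
    diff = abs(overall_bearing(chain_a) - overall_bearing(chain_b)) % math.pi
    diff = min(diff, math.pi - diff)
    orientation = max(0.0, 1.0 - diff / max_angle)
    len_a, len_b = polyline_length(chain_a), polyline_length(chain_b)
    length_ratio = min(len_a, len_b) / max(len_a, len_b)
    return 0.5 * orientation + 0.5 * length_ratio

print(geometric_score([(0, 0), (10, 0)], [(10, 0), (0, 0)]))  # 1.0 (same line, reversed)
print(geometric_score([(0, 0), (5, 0)], [(0, 0), (10, 0)]))   # 0.75 (half the length)
```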
In addition, the following attribute-based match agents may be enabled if the associated attributes are available and reliable:
  • name (using a tunable fuzzy text match algorithm)
  • feature classification
  • multiple lane side
  • polygonal boundary coding (such as presence in an incorporated city boundary)
  • other attributes, such as permanent ID
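A tunable fuzzy name match can be approximated with Python's standard difflib.SequenceMatcher, after normalizing case and common abbreviations. This is a stand-in for GDT's own algorithm. The abbreviation table and the 0.7 threshold are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical abbreviation table; a production matcher would use a much larger one.
ABBREVIATIONS = {"street": "st", "lane": "ln", "heights": "hgts", "road": "rd", "avenue": "ave"}

def normalize(name):
    """Lower-case, drop periods, and abbreviate common street-type words."""
    words = name.lower().replace(".", "").split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)

def name_similarity(a, b):
    """Similarity in [0, 1] between two street names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def names_match(a, b, threshold=0.7):
    """The threshold is the tunable part; 0.7 is an arbitrary illustration."""
    return name_similarity(a, b) >= threshold

print(name_similarity("North Street", "North St."))  # 1.0
print(name_similarity("Alton Hgts Ln", "Afton Ln"))  # about 0.67
```

With these settings, the “Alton Hgts Ln”/“Afton Ln” pair from Figure 1 falls just below the threshold. That pair shows why the threshold must be tuned per data set, or why other agents must carry such a match.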
Generate Control Vectors and QC Points
With match information, the process can generate a rubber sheet mapping for use in realigning associated facility data. It can also identify points where the mathematical model breaks because the topology has changed significantly or wherever there is a large amount of “shear” in the warp model.
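A rubber sheet mapping can be illustrated by interpolating a displacement at any point from the surrounding control vectors. The sketch below uses inverse-distance weighting purely because it is short. It is not the method used by GDT's software or by ARCEDIT's ADJUST command, and the function names are hypothetical.

```python
import math

def idw_displacement(point, control_vectors, power=2.0):
    """Interpolate a displacement (dx, dy) at `point` from control vectors.

    control_vectors: list of ((x, y), (dx, dy)) pairs, i.e. a node in the old
    landbase and the shift that carries it onto the new one.
    Inverse-distance weighting is an illustrative substitute technique.
    """
    num_x = num_y = den = 0.0
    for (cx, cy), (dx, dy) in control_vectors:
        d = math.hypot(point[0] - cx, point[1] - cy)
        if d == 0.0:
            return dx, dy                 # exactly on a control point
        w = 1.0 / d ** power
        num_x += w * dx
        num_y += w * dy
        den += w
    return num_x / den, num_y / den

def warp(point, control_vectors):
    """Move a facility point by the interpolated displacement."""
    dx, dy = idw_displacement(point, control_vectors)
    return point[0] + dx, point[1] + dy

CONTROL = [((0.0, 0.0), (2.0, 0.0)), ((10.0, 0.0), (4.0, 0.0))]
print(warp((5.0, 0.0), CONTROL))  # about (8.0, 0.0): halfway between shifts of 2 and 4
```

Points where nearby control vectors disagree sharply are where such a surface “shears”. Those are the places the text identifies as candidates for QC points.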

The conflation process described above was also used to correlate the existing AGL landbase to GDT's spatially improved landbase data.

Using the software correlation process, a Control Vector data set was produced.

Control Vectors
Control vectors are a linking data set that correlates the nodes of the existing AGL landbase with those of the new landbase. Viewed as a text file, the data set looks like this:

-97479453, 32956206, -322, 799

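The first two values are a longitude/latitude pair in decimal degrees, with the decimal point implied six places from the right (-97.479453, 32.956206). The last two are the longitude and latitude offsets, in the same units, that carry the point to its position in the new landbase. The same file can be converted to a “link” data type for Arc by adding each offset to its coordinate to obtain the link's destination point: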

-97479453 + (-322) = -97479775
32956206 + 799 = 32957005
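The conversion from a control-vector record to link endpoints can be sketched as follows, assuming comma-separated integer fields in the order shown above. The function name is hypothetical. Integer arithmetic is done before scaling, so the implied-decimal values are not rounded twice.

```python
SCALE = 1_000_000  # decimal point implied six places from the right

def parse_control_vector(line):
    """Turn 'lon, lat, dlon, dlat' (integer micro-degrees) into a link.

    Returns ((from_lon, from_lat), (to_lon, to_lat)) in decimal degrees,
    where the 'to' point is the original point plus its offset.
    """
    lon, lat, dlon, dlat = (int(v) for v in line.split(","))
    return (lon / SCALE, lat / SCALE), ((lon + dlon) / SCALE, (lat + dlat) / SCALE)

link = parse_control_vector("-97479453, 32956206, -322, 799")
print(link)  # ((-97.479453, 32.956206), (-97.479775, 32.957005))
```

For scale, the 799 micro-degree latitude offset in this record is roughly 89 m on the ground, since one degree of latitude is about 111 km.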

Warping
AGL selected the following items for warping:

Object Name   Geometry Type
Valve         Point
Regstat       Point
Netnode       Point
Fitting       Point
Pipeseg       Line


The point features were merged into a single feature class named “Fittings”. The line (arc) features were left as the pipeseg coverage.

The original text data file was also converted to an Arc Link file as discussed earlier in this paper.

The facilities data was warped using ARCEDIT™ software.

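A watch file was created to demonstrate the procedure; it is shown below. In the watch file, commands typed by the user appear between the |> and <| markers. Text in CAPITALS is an annotation marking where user-supplied input, such as a coverage location or a digitized polygon, goes.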

General Procedure for warping:

Arc: |> ae <|
Copyright (C) 1982-2001 Environmental Systems Research Institute, Inc.
All rights reserved.
ARCEDIT (COGO) 8.1 (Fri Mar 16 11:31:29 PST 2001)
Arcedit: |> ec arcpipes <|
The edit coverage is now ENTER LOCATION OF COVERAGE TO BE WARPED HERE
WARNING the Map extent is not defined
Defaulting the map extent to the BND of COVERAGE TO BE WARPED
Arcedit: |> de arcs links <|
Arcedit: |> ef links <|
Adding the extreme boundary points as hull points
Please wait...
8 element(s) for edit feature LINKS
Arcedit: |> nodesnap closest .00001 <|
Arcedit: |> weedtolerance .00001 <|
Arcedit: |> grain .00001 <|
Arcedit: |> get delinks <|
Copying the links from ADD LINK DATA COVERAGE LOCATION HERE
into COVERAGE TO BE WARPED
13396 link(s) copied
Arcedit: |> draw <|
Please wait...
Arcedit: |> select poly <|
Define the polygon HERE YOU WILL NEED TO DEFINE THE EXTENTS OF THE
COVERAGE THAT YOU WANT TO WARP BY DRAWING A POLYGON

<1,2 to enter, 4 to remove last point, 5 to remove polygon, 9 to end>
12196 element(s) now selected
Arcedit: |> nsel <|
1208 element(s) now selected
Arcedit: |> delete <|
1208 link(s) deleted
Arcedit: |> save <|
Saving changes for COVERAGE TO BE WARPED
Saving arcs...
** NOTE ** Arc(s) unchanged
Reopening arcs...
Please wait...
Saving links...
12196 link records(s) written to LINK COVERAGE
from the original 0 link, 12196 added and 0 deleted
Reopening links...
Please wait...
BND replaced into COVERAGE TO BE WARPED
Saving set tolerances to TOL file...
Re-establishing edit feature LINK
Arcedit: |> limitadjust poly <|
Define the polygon HERE YOU WILL AGAIN TRACE THE EXTENTS OF THE AREA TO BE WARPED
<1,2 to enter, 4 to remove last point, 5 to remove polygon, 9 to end>
Limiting polygon has an area: 8762.961 and a perimeter: 39758.033
Deleting all links falling outside of limiting area...
Adding the perimeter of the limiting area as identity links...Please
wait...
Arcedit: |> adjust bivariate <|
Adjusting coverage COVERAGE TO BE WARPED
Building the adjustment structure from the links for the first pass...
Proximal tolerance set to 0.000...
Removing duplicate points within tolerance...
Within tolerance 0. Remaining 6193...
Proximal tolerance set to 0.000...
adjusting ARCs...
Please wait...
adjusting LABELs...
Updating the adjustment structure for the second pass...
adjusting ARCs...Please wait...
Please wait...
adjusting LABELs...
Arcedit: |> save <|
Saving changes for COVERAGE TO BE WARPED
Saving arcs...
5740 arc attribute record(s) written to COVERAGE TO BE WARPED
5740 arc(s) written to COVERAGE TO BE WARPED
from the original 5740, 5740 added and 5740 deleted
Reopening arcs...
Please wait...
Saving labels...
** NOTE ** Label(s) unchanged
Reopening labels...
Please wait...
Saving links...
6193 link records(s) written to COVERAGE TO BE WARPED from the
original 12196 link, 6192 added and 12195 deleted
Reopening links...
Please wait...
BND replaced into COVERAGE TO BE WARPED
Saving set tolerances to TOL file...
Re-establishing edit feature LINK
Arcedit: |> &w &off <|
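The LIMITADJUST step in the session reports that it deletes links outside the limiting polygon and adds the perimeter as identity links. A conceptual re-creation of that behavior is sketched below; it is not ESRI's implementation. Two points are assumptions:

- A link is kept when its “from” point falls inside the polygon.
- Identity links are placed at the polygon vertices only, without densifying the perimeter.

The function names are hypothetical.

```python
def point_in_polygon(pt, polygon):
    """Even-odd ray casting test; polygon is a list of (x, y) vertices."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):                        # edge straddles the ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def limit_links(links, polygon):
    """Keep links whose 'from' point lies inside the limiting polygon, then add
    identity links (from == to) at each polygon vertex so the perimeter stays put."""
    kept = [(a, b) for a, b in links if point_in_polygon(a, polygon)]
    return kept + [(v, v) for v in polygon]

SQUARE = [(0, 0), (10, 0), (10, 10), (0, 10)]
LINKS = [((5, 5), (6, 5)), ((20, 5), (21, 5))]
print(limit_links(LINKS, SQUARE))
```

The identity links pin the edge of the warped area, so features outside the limiting polygon are not pulled out of alignment with the surrounding data.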


The results of the warping can be observed in the illustration below.



Overall results in the data set show that the warping process has made a good start at facilities realignment. Several issues with each warping method still need to be addressed:
  • Is the number of vertices in the facilities data close to the number of vertices in the measured landbases? Too many vertices can cause jagged lines.
  • Is the warping software “tuned” properly? Parameters such as NODESNAP, WEEDTOLERANCE, and GRAIN (ARCEDIT) need to be set correctly to achieve optimal results.
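The vertex-count issue can be illustrated with a simple distance-based weed, which drops interior vertices that sit too close to the previous kept vertex. This mirrors our understanding of a weed tolerance as a minimum vertex spacing. It is an illustrative sketch, not ARCEDIT's internal algorithm, and the function name is hypothetical.

```python
import math

def weed(vertices, tolerance):
    """Drop interior vertices closer than `tolerance` to the last kept vertex.

    Endpoints are always kept so arcs still meet their nodes. (The last kept
    interior vertex may still lie closer than `tolerance` to the end point.)
    """
    if len(vertices) <= 2:
        return list(vertices)
    kept = [vertices[0]]
    for v in vertices[1:-1]:
        if math.dist(v, kept[-1]) >= tolerance:
            kept.append(v)
    kept.append(vertices[-1])
    return kept

print(weed([(0, 0), (0.5, 0), (1, 0), (3, 0), (3.2, 0), (5, 0)], 1.0))
# [(0, 0), (1, 0), (3, 0), (5, 0)]
```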
Optimization of this process is certainly possible given proper training and time. Like any other editing task, it is a combination of knowledge, craft, and a little art. Once the combination of data, software, and service was applied to the problem, an 80% solution was achieved.

Conclusion
With the results of the pilot project in hand, AGL studied the warped facilities data carefully. GDT and AGL staff members each warped the data independently and obtained very similar results. As a result of this pilot project, AGL decided to move forward with a similar process for their new landbase migration project. The current schedule calls for the entire migration effort to be completed by early 2003.

Acknowledgement
Witmer, A, 2001, “The Best of Both Worlds: Vector Conflation of Database Segments and Attributes”, presentation at ESRI User Conference 2001.