GISdevelopment.net ---> GIS for Oil & Gas Proceedings 2001

A pilot project for landbase migration

Jay Clark
Product Manager, Utilities
Geographic Data Technology, Inc.
11 Lafayette St
Lebanon NH 03766
Telephone: 800 331-7881 ext. 1112
Fax: 603 653-0249
Email: jay_clark@gdt1.com


Abstract
This presentation examines data modeling, topological maintenance, and alignment of multiple spatial data sets in a project for Atlanta Gas Light Company. Processes include:
  • conflation of public vector data with Geographic Data Technology’s addressed street centerline
  • horizontal realignment of the resulting street network using aerial imagery
  • auto-correlation of AGL’s street centerline to the improved landbase
  • repositioning of AGL’s boundaries, point data, facilities and other data layers to the improved geometry.
Introduction
This presentation discusses a pilot project undertaken by Geographic Data Technology, Inc. (GDT) for Atlanta Gas Light Company (AGL), the eighth largest natural gas distribution utility in the United States.

Faced with advances in software and GIS technology since the development of their last GIS, AGL decided that it was time both to replace their GIS software environment and to migrate to a more accurate and up-to-date landbase.

When GDT met with AGL in January 2001, discussions of the landbase migration process centered on three key issues.

Data quality
AGL wanted the new landbase to be spatially accurate enough to overlay an aerial image of equal or better quality than a United States Geological Survey digital orthophoto quarter quadrangle (USGS DOQQ). This would result in an assumed horizontal accuracy of +/- 5 to 7 meters from “ground truth”. AGL also stipulated that attribution available with the data needed to be current and complete enough for outage management, service request, and Customer Information System (CIS) purposes.

Management of facilities data conversion
AGL wanted to manage the migration of their facilities data to the new landbase in the most accurate and efficient manner possible.

Cost
AGL sought to balance the appropriate level of effort to get the job done correctly with a moderate budget.

It was decided to embark on a pilot project for DeKalb County, Georgia, in order to demonstrate the feasibility of a solution from a commercial vendor such as GDT. DeKalb is one of the heavily urbanized counties that make up the metropolitan Atlanta region.


The new landbase data

Quality Criteria
The following criteria were developed for the new landbase GIS:
  • horizontal errors would be less than 7 meters averaged over 90% of all points in the data set
  • sufficient attributes would be included to provide GIS functionality for all departments
  • the GIS would employ a “real world” coordinate system
  • it would be compatible with an “off the shelf” relational database management system (RDBMS)
  • the system would accept transactional updates.
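
As a rough illustration of the first criterion, the sketch below reads it as "at least 90% of check points fall within 7 meters of their reference positions". That reading, the function name `fraction_within`, and the sample coordinates are assumptions made for illustration; they are not part of the AGL specification.

```python
import math

def fraction_within(measured, reference, tolerance_m=7.0):
    """Return the fraction of points whose horizontal error is below tolerance_m.

    Both inputs are lists of (easting, northing) tuples in meters,
    for example UTM coordinates. Points are paired by position.
    """
    if len(measured) != len(reference) or not measured:
        raise ValueError("need two non-empty point lists of equal length")
    errors = [math.hypot(mx - rx, my - ry)
              for (mx, my), (rx, ry) in zip(measured, reference)]
    within = sum(1 for e in errors if e < tolerance_m)
    return within / len(errors)

# Hypothetical check points in meters (not project data)
measured = [(0.0, 0.0), (100.0, 3.0), (200.0, 10.0), (300.0, 1.0)]
reference = [(0.0, 1.0), (100.0, 0.0), (200.0, 0.0), (300.0, 0.0)]

share = fraction_within(measured, reference)
print(f"{share:.0%} of points within tolerance; criterion met: {share >= 0.90}")
```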
Data specifications
Spatial accuracy within the tolerance stated would be obtained using digital vector data from the Georgia State Data Clearinghouse, or, where vectors were unavailable, created from USGS DOQQ.

The following data model was selected.

Landbase GIS Transportation Data Model




It was agreed that grouping of objects and attributes would support all departments in AGL.

It was also decided that data delivery would be in Geographic projection, using NAD83 and decimal degrees. For purposes of the pilot project, the data would also be published in UTM zone 16, NAD83, in meters. This would allow the data to be overlaid onto USGS DOQQs for a spatial “sanity check”.

For purposes of the pilot project, ArcView® GIS format was used to display the data. For full rollout the data delivery would be in the form of a Geodatabase or ArcSDE™ files. A specific RDBMS had not been determined at the outset of the project.

Conflation
(Alan Witmer, 2001)
GDT acquired the DeKalb County GIS coverage from GSDC and conflated the horizontal control network into its core database. The process of the vector conflation activity is described below. GDT’s approach was to modularize the software and the process for each step listed below, allowing for individual development, tuning, independent operation, and quality assurance.

Correlating features in two landbases (conflation) requires these steps:

Prepare the databases for conflation processing.
Analyze the incoming data’s quality and usability, and convert as necessary to a common format.

Build a common representation
GDT builds a topological representation from the selected features to be matched in each landbase in order to filter out unwanted detail and form two congruous data sets, to organize the remaining data into chains and their intersection/end nodes, and to generate units of geography that can be meaningfully compared. This process also provides software-generated attribution or information to guide the correlation process past ambiguities such as tight multiple-lane highway representations.

The topological model aggregates the remaining linear features to make meaningful entities or “chains”. For example, a chain of arcs representing a street centerline, running uninterrupted from one intersection to the next, might be considered an aggregate. The operator defines the aggregation rules for each conflation so that the model can avoid aggregating wherever a significant attribute – such as name or feature type – changes. This can help matching when both data sets reliably record a given attribute. For example, if both landbases record street name with a high degree of accuracy, then a name change along a street, even if it is not at an intersection, should be considered as a node between two distinct chains.
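
The aggregation rule described above can be sketched in a few lines. This is a minimal illustration, not GDT's software: the arc tuple layout, the function name `build_chains`, and the sample street names are assumptions, and the only attribute checked is the street name.

```python
from collections import defaultdict

def build_chains(arcs):
    """Group arcs into chains of arc ids.

    arcs: list of (arc_id, from_node, to_node, name).
    A node ends a chain if it is an intersection or dead end (degree != 2)
    or if the street name differs between the two arcs that meet there.
    """
    incident = defaultdict(list)
    for arc in arcs:
        incident[arc[1]].append(arc)
        incident[arc[2]].append(arc)

    def is_break(node):
        touching = incident[node]
        return len(touching) != 2 or touching[0][3] != touching[1][3]

    used, chains = set(), []

    def walk(first, node):
        # Follow arcs away from `first` through `node` until a break node.
        chain, arc = [first[0]], first
        used.add(first[0])
        while not is_break(node):
            nxt = next((a for a in incident[node] if a[0] != arc[0]), None)
            if nxt is None or nxt[0] in used:
                break  # self-loop or closed ring
            used.add(nxt[0])
            chain.append(nxt[0])
            node = nxt[2] if nxt[1] == node else nxt[1]
            arc = nxt
        return chain

    # Pass 1: start a chain on every unused arc leaving a break node.
    for node in list(incident):
        if is_break(node):
            for arc in incident[node]:
                if arc[0] not in used:
                    far = arc[2] if arc[1] == node else arc[1]
                    chains.append(walk(arc, far))
    # Pass 2: pick up closed rings that contain no break node.
    for arc in arcs:
        if arc[0] not in used:
            chains.append(walk(arc, arc[2]))
    return chains

# Hypothetical arcs: B is a shape point, C an intersection, D a name change.
arcs = [
    (1, "A", "B", "North St"), (2, "B", "C", "North St"),
    (3, "C", "D", "Oak St"), (4, "C", "E", "Route 16"),
    (5, "D", "F", "Elm St"),
]
print(build_chains(arcs))
```

In this sample, arcs 1 and 2 merge into one chain because node B joins only two arcs with the same name, while node D splits arcs 3 and 5 because the name changes there even though it is not an intersection.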

GDT also builds additional information at this time. In particular, the software locates and marks multiple-lane roads per user-supplied criteria. It assigns a directional flag to indicate on which side the counterpart is found. This prevents ambiguity later, eliminating the possibility that the wrong lanes will be matched.

Matching
Identify common elements



Figure 1 above illustrates the basic challenge of matching. We see a view of two overlaid street centerline databases. At first glance, it seems that they represent the same area. We see a major road in each database, with a common route number and similar heading. There is a development to the northeast of that road in each case, with some similarity in names and geography. Even the crook in North St below the highway (label 1) bears enough similarity to that of the unnamed road to prompt a mental match, despite the difference in detail. But there are significant issues for software: roads are more angular in one database, lengths and proportions vary significantly, and streets that should match are often not nearest neighbors (label 1, North St, is a case in point). The following labeled areas illustrate other common challenges:
  • Corresponding streets meet in differing intersection configurations like the North St/Unnamed intersection with Route 16.
  • The names are similar, but not exact: “Alton Hgts Ln” versus “Afton Ln”.
  • Two stretches of road in one database (North St.) match to only one in the other, and the single item must be conceptually split in order to build a one-to-one relationship.
  • The B St/ Unnamed match continues further in one database than the other, and conflation must decide how much of the more-complete street should be matched.
The process begins with node matching. Nodes are the confluence of a great deal of information, and are thus the places where pivotal matches can be assured. As with most other conflation software developers, GDT uses iterative matching, choosing the strongest node matches in an early pass, and then conceptually rubber sheeting and using neighborhood information to match in repeated passes, continuing as long as new matches can be found.

Node matching uses two match agents. One agent analyzes the candidate nodes’ rubber-sheeted offset and area density of nodes. A second attempts to build an optimal “test match” of all the feature chains that are incident at the node pair, to determine the similarity of the local features at the nodes.
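
The iterative idea can be sketched as follows. This is a deliberately simplified stand-in, not GDT's algorithm: it scores candidates by distance alone, and it replaces the local rubber sheet with a single mean offset computed from the matches found so far. The function name, the tolerance value, and the sample nodes are assumptions.

```python
import math

def iterative_node_match(nodes_a, nodes_b, tolerance):
    """Match nodes in repeated passes; returns {a_id: b_id}.

    nodes_a, nodes_b: dicts mapping node id -> (x, y).
    Each pass shifts the unmatched A nodes by the mean offset of the
    matches so far, then accepts the closest remaining pairs first.
    """
    matches = {}
    while True:
        if matches:
            n = len(matches)
            dx = sum(nodes_b[b][0] - nodes_a[a][0] for a, b in matches.items()) / n
            dy = sum(nodes_b[b][1] - nodes_a[a][1] for a, b in matches.items()) / n
        else:
            dx = dy = 0.0

        taken_b = set(matches.values())
        candidates = []
        for a, (ax, ay) in nodes_a.items():
            if a in matches:
                continue
            for b, (bx, by) in nodes_b.items():
                if b in taken_b:
                    continue
                d = math.hypot(ax + dx - bx, ay + dy - by)
                if d <= tolerance:
                    candidates.append((d, a, b))

        new_matches = 0
        for d, a, b in sorted(candidates):  # strongest (closest) first
            if a not in matches and b not in taken_b:
                matches[a] = b
                taken_b.add(b)
                new_matches += 1
        if new_matches == 0:  # stop when a pass finds nothing new
            return matches

# Hypothetical nodes: a2/b2 are too far apart until the first match
# supplies an offset estimate.
nodes_a = {"a1": (0.0, 0.0), "a2": (10.0, 0.0)}
nodes_b = {"b1": (3.0, 0.0), "b2": (16.0, 0.0)}
print(iterative_node_match(nodes_a, nodes_b, tolerance=4.0))
```

Here the first pass can only pair a1 with b1; the offset learned from that pair brings a2 within tolerance of b2 on the second pass.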

Following node match, the GDT process uses the matched nodes as a guide to matching our topologic chains. Chain match criteria include agents that weigh:
  • overall orientation of the line or significant shape
  • convexity/concavity
  • overall length
  • neighboring node topology and match status
  • affine transformation of both lines based on calculated trend and
  • the overall quality of all other characteristics if one or the other chain were split.
In addition, the following attribute-based match agents may be enabled if the associated attributes are available and reliable:
  • name (using a tunable fuzzy text match algorithm)
  • feature classification
  • multiple lane side
  • polygonal boundary coding (such as presence in an incorporated city boundary)
  • other attributes, such as permanent ID
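
One way to picture how such agents might be combined is a weighted score, sketched below with three of the listed criteria. The scoring formulas, the weights, and the chain dictionaries are assumptions for illustration; the source does not describe how GDT's agents compute or combine their results. The fuzzy name comparison uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for the tunable text match.

```python
from difflib import SequenceMatcher

def name_score(a, b):
    """Fuzzy name similarity in [0, 1]; unnamed streets score a neutral 0.5."""
    if not a or not b:
        return 0.5
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def length_score(len_a, len_b):
    """Ratio of the shorter length to the longer one, in [0, 1]."""
    longer = max(len_a, len_b)
    return min(len_a, len_b) / longer if longer > 0 else 0.0

def orientation_score(bearing_a, bearing_b):
    """1.0 for parallel lines, 0.0 for perpendicular; ignores digitizing direction."""
    diff = abs(bearing_a - bearing_b) % 180.0
    diff = min(diff, 180.0 - diff)
    return 1.0 - diff / 90.0

def chain_match_score(chain_a, chain_b, weights):
    """Weighted mean of the enabled agents' scores."""
    scores = {
        "name": name_score(chain_a["name"], chain_b["name"]),
        "length": length_score(chain_a["length"], chain_b["length"]),
        "orientation": orientation_score(chain_a["bearing"], chain_b["bearing"]),
    }
    total = sum(weights.values())
    return sum(weights[k] * scores[k] for k in weights) / total

# Hypothetical chains and weights
a = {"name": "Alton Hgts Ln", "length": 210.0, "bearing": 42.0}
b = {"name": "Afton Ln", "length": 185.0, "bearing": 47.0}
weights = {"name": 2.0, "length": 1.0, "orientation": 1.0}
print(round(chain_match_score(a, b, weights), 3))
```

An agent whose attribute is missing or unreliable can be disabled by leaving its key out of `weights`.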
Generate control vectors and QC points
With match information, the process can generate a rubber sheet mapping for use in realigning associated facility data. It can also identify points where the mathematical model breaks because the topology has changed significantly or wherever there is a large amount of “shear” in the warp model.

The conflation process described above was also used to correlate the existing AGL landbase to GDT’s spatially improved landbase data.

Using the software correlation process, a Control Vector data set was produced.

Control vectors
Control vectors are a linking data set that correlate between the nodes of the existing AGL landbase and the new landbase. Seen as a text file, the data set looks like this.

-97479453, 32956206, -322, 799

The first two values are a longitude/latitude pair in decimal degrees (with the decimal point implied at six places of precision), and the last two are offset values equal to the horizontal change in the same units. The same file can be converted to a “link” data type for Arc by applying the offsets to the coordinates, which gives the shifted point:

-97479775, 32957005
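
The conversion from a control vector record to link endpoints can be sketched as below, using the sample record shown above. The function names are hypothetical, and the field order (longitude, latitude, longitude offset, latitude offset) is inferred from the sample values.

```python
SCALE = 1_000_000  # six implied decimal places

def link_endpoints(record):
    """Return ((from_lon, from_lat), (to_lon, to_lat)) as scaled integers."""
    lon, lat, dlon, dlat = (int(field) for field in record.split(","))
    return (lon, lat), (lon + dlon, lat + dlat)

def to_degrees(point):
    """Convert a scaled integer coordinate pair to decimal degrees."""
    return (point[0] / SCALE, point[1] / SCALE)

start, end = link_endpoints("-97479453, 32956206, -322, 799")
print(start, "->", end)
print(to_degrees(start), "->", to_degrees(end))
```

Keeping the arithmetic in integers until the final conversion avoids accumulating floating-point rounding in the offsets.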

The control vector data is also normally translated into FFS format, the in-memory model for the Feature Manipulation Engine (FME) software made by Safe Software, Inc. This provides access to warping software inside the FME.

Warping
AGL selected the following items for warping:



An FME mapping file was generated, as shown below:
    # Load the FFS Vector Control file
    FACTORY_DEF * RecorderFactory \
    FACTORY_NAME "Read FFS Control Vectors" \
    FEATURE_FILE "$(SourceFFSfile)" \
    MODE PLAYBACK \
    OUTPUT RECORDED FEATURE_TYPE *

    # do the warping here

    FACTORY_DEF * WarpFactory \
    FACTORY_NAME "_factory_name_" \
    INPUT CONTROL_VECTOR FEATURE_TYPE ControlPoint \
    @Count(Point_Vectors) \
    INPUT CONTROL_VECTOR FEATURE_TYPE ControlVector \
    @Count(Line_Vectors) \
    INPUT OBSERVED FEATURE_TYPE * multi_reader_id 0 \
    WARP_METHOD RUBBER_SHEET \
    MAX_DISTANCE $(MaxDistance) \
    MAX_POINTS $(MaxPoints) \
    EXPONENT_WEIGHT $(WarpExponent) \
    OUTPUT CORRECTED FEATURE_TYPE * \
    @Count(WarpedFeatures)
The mapping file above was used for the pipeseg warping in the FME and was modified to work with the other selected facilities layers.
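
To give a feel for what the three tuning parameters control, the sketch below warps a single point by an inverse-distance-weighted average of nearby control vector offsets. This is an assumed, generic rubber-sheeting scheme written for illustration; the source does not describe the algorithm inside the FME, and the Python parameters `max_distance`, `max_points`, and `exponent` only loosely mirror MAX_DISTANCE, MAX_POINTS, and EXPONENT_WEIGHT.

```python
import math

def warp_point(pt, control_vectors, max_distance, max_points, exponent):
    """Shift pt by the inverse-distance-weighted mean of nearby offsets.

    control_vectors: list of ((x, y), (dx, dy)) pairs.
    Only vectors within max_distance are used, at most max_points of them,
    each weighted by 1 / distance**exponent.
    """
    x, y = pt
    nearby = []
    for (cx, cy), (dx, dy) in control_vectors:
        d = math.hypot(x - cx, y - cy)
        if d == 0.0:
            return (x + dx, y + dy)  # point sits exactly on a control vector
        if d <= max_distance:
            nearby.append((d, dx, dy))
    if not nearby:
        return pt  # no control vector in range: leave the point unchanged
    nearby.sort()
    nearby = nearby[:max_points]
    weights = [1.0 / d ** exponent for d, _, _ in nearby]
    total = sum(weights)
    shift_x = sum(w * dx for w, (_, dx, _) in zip(weights, nearby)) / total
    shift_y = sum(w * dy for w, (_, _, dy) in zip(weights, nearby)) / total
    return (x + shift_x, y + shift_y)

# Hypothetical control vectors and a pipe vertex midway between them
vectors = [((0.0, 0.0), (2.0, 0.0)), ((10.0, 0.0), (0.0, 2.0))]
print(warp_point((5.0, 0.0), vectors, max_distance=50.0, max_points=4, exponent=2.0))
```

In this scheme a larger exponent makes the nearest control vector dominate, while a larger `max_points` or `max_distance` smooths the warp over a wider neighborhood.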

The original text data file was also converted to an Arc Link file as discussed earlier in this paper.

The facilities data was warped using the FME and ARCEDIT™ software.

The results of the warping can be observed in the illustration below.




Overall results in the data set show that the warping process has made a good start at facilities realignment. There are several issues with each warping method that need to be addressed:
  • is the number of vertices in the facilities data close to the number of vertices in the measured landbases? Too many vertices can cause jagged lines.
  • is the warping software “tuned” properly? Parameters such as MAX_DISTANCE, MAX_POINTS, and EXPONENT_WEIGHT (FME) and NODESNAP, WEEDTOLERANCE, and GRAIN (ARCEDIT) need to be set correctly to achieve optimal results.
Optimization of this process is certainly possible given proper training and time. Like any other editing task, it is a combination of knowledge, craft, and a little art. Once the combination of data, software, and service was applied to the problem, an 80% solution was achieved.

Conclusion
With the results of the pilot project in hand, AGL studied the warped facilities data carefully. Both GDT and AGL staff members warped the data independently and obtained very similar results.

As a result of this pilot project, AGL decided to move forward with a similar process for their new landbase migration project. The current schedule calls for completion of the entire migration effort by early 2003.

Acknowledgement

Witmer, A, 2001, “The Best of Both Worlds: Vector Conflation of Database Segments and Attributes”, presentation at ESRI User Conference 2001.