A pilot project for landbase migration
Jay Clark
Product Manager, Utilities
Geographic Data Technology, Inc.
11 Lafayette St
Lebanon NH 03766
Telephone: 800 331-7881 ext. 1112
Fax: 603 653-0249
Email: jay_clark@gdt1.com

Abstract

This presentation examines data modeling, topological maintenance, and alignment of multiple spatial data sets in a project for Atlanta Gas Light Company. Processes include:
This presentation discusses a pilot project undertaken by Geographic Data Technology, Inc. (GDT) for Atlanta Gas Light Company (AGL), the eighth-largest natural gas distribution utility in the United States. Given the advances in software and GIS technology since its last GIS was developed, AGL decided that it was time both to replace its GIS software environment and to migrate to a more accurate and up-to-date landbase. When GDT met with AGL in January 2001, discussions of the landbase migration process centered on three key issues.

Data quality
AGL wanted the new landbase to be spatially accurate enough to overlay an aerial image of equal or better quality than a United States Geological Survey digital orthophoto quarter quadrangle (USGS DOQQ). This corresponds to an assumed horizontal accuracy of +/- 5 to 7 meters from “ground truth”. AGL also stipulated that the attribution delivered with the data be current and complete enough for outage management, service request, and Customer Information System (CIS) purposes.

Management of facilities data conversion
AGL wanted to manage the migration of its facilities data to the new landbase as accurately and efficiently as possible.

Cost
AGL sought to balance the level of effort needed to do the job correctly against a moderate budget.

It was decided to embark on a pilot project for DeKalb County, Georgia, to demonstrate the feasibility of a solution from a commercial vendor such as GDT. DeKalb is one of the heavily urbanized counties that make up the metropolitan Atlanta region.

The new landbase data

Quality criteria
The following criteria were developed for the new landbase GIS:
Spatial accuracy within the stated tolerance would be obtained by using digital vector data from the Georgia State Data Clearinghouse or, where vectors were unavailable, by creating vectors from USGS DOQQs. The following data model was selected.

Landbase GIS Transportation Data Model

It was agreed that this grouping of objects and attributes would support all departments at AGL. It was also decided that data would be delivered in Geographic projection, using NAD83 and decimal degrees. For the pilot project, the data would also be published in UTM zone 16, NAD83, in meters, so that it could be overlaid on USGS DOQQs for a spatial “sanity check”. The pilot data was displayed in ArcView® GIS format; for full rollout, the data would be delivered as a Geodatabase or as ArcSDE™ files. The specific RDBMS had not been determined at the outset of the project.

Conflation (Alan Witmer, 2001)
GDT acquired the DeKalb County GIS coverage from GSDC and conflated the horizontal control network into its core database. The vector conflation process is described below. GDT’s approach was to modularize the software and the process for each step, allowing for individual development, tuning, independent operation, and quality assurance. Correlating features in two landbases (conflation) requires these steps:

Prepare the databases for conflation processing
Analyze the incoming data’s quality and usability, and convert it to a common format as necessary.

Build a common representation
GDT builds a topological representation from the selected features to be matched in each landbase in order to filter out unwanted detail and form two congruous data sets, to organize the remaining data into chains and their intersection/end nodes, and to generate units of geography that can be meaningfully compared.
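The chain-building step can be sketched in a few lines of Python. This is a hypothetical illustration, not GDT’s software: it takes arcs with from/to nodes and a street name, treats any node that is not of degree 2 (an intersection or dead end), or where the name changes, as a break node, and walks through the remaining degree-2 nodes to form chains. The `Arc` record and `build_chains` function are assumptions, and street name stands in for whatever attributes the operator’s aggregation rules would actually use.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    arc_id: str
    from_node: str
    to_node: str
    name: str  # hypothetical break attribute (e.g. street name)


def build_chains(arcs):
    """Aggregate arcs into chains that run between break nodes.

    A node is a break node if it is an intersection or end node (degree != 2)
    or if the break attribute differs between its two incident arcs.
    """
    incident = defaultdict(list)
    for arc in arcs:
        incident[arc.from_node].append(arc)
        incident[arc.to_node].append(arc)

    def is_break(node):
        pair = incident[node]
        return len(pair) != 2 or pair[0].name != pair[1].name

    def other_end(arc, node):
        return arc.to_node if arc.from_node == node else arc.from_node

    used = set()

    def walk(start_node, first_arc):
        chain, node, arc = [], start_node, first_arc
        while True:
            used.add(arc.arc_id)
            chain.append(arc.arc_id)
            node = other_end(arc, node)
            if is_break(node):
                return chain
            # Pass through a degree-2 node onto its other, not-yet-used arc.
            remaining = [a for a in incident[node] if a.arc_id not in used]
            if not remaining:
                return chain  # a closed loop has returned to its start
            arc = remaining[0]

    chains = []
    # Start at break nodes so that chains end at intersections/end nodes.
    for node in list(incident):
        if is_break(node):
            for arc in incident[node]:
                if arc.arc_id not in used:
                    chains.append(walk(node, arc))
    # Any arcs left over form closed loops with no break node.
    for arc in arcs:
        if arc.arc_id not in used:
            chains.append(walk(arc.from_node, arc))
    return chains
```

For example, two “Main” arcs meeting at a plain pass-through node form one chain, while a change to “Oak” at the next node starts a new chain even though that node is not an intersection.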
This process also provides software-generated attribution, or other information, to guide the correlation process past ambiguities such as closely spaced multiple-lane highway representations.

The topological model aggregates the remaining linear features into meaningful entities, or “chains”. For example, a chain of arcs representing a street centerline, running uninterrupted from one intersection to the next, might be treated as a single aggregate. The operator defines the aggregation rules for each conflation so that the model avoids aggregating across any point where a significant attribute, such as name or feature type, changes. This helps matching when both data sets record a given attribute reliably. For example, if both landbases record street names with a high degree of accuracy, then a name change along a street, even one that is not at an intersection, should be treated as a node between two distinct chains.

GDT also builds additional information at this stage. In particular, the software locates and marks multiple-lane roads per user-supplied criteria and assigns a directional flag indicating on which side the counterpart is found. This prevents ambiguity later by eliminating the possibility that the wrong lanes will be matched.

Matching
Identify common elements

Figure 1 illustrates the basic challenge of matching. It shows two overlaid street centerline databases. At first glance, they seem to represent the same area. Each database contains a major road with a common route number and a similar heading. There is a development to the northeast of that road in each case, with some similarity in names and geography. Even the crook in North St below the highway (label 1) resembles that of the unnamed road closely enough to prompt a mental match, despite the difference in detail.
But there are significant issues for software: roads are more angular in one database, lengths and proportions vary significantly, and streets that should match are often not nearest neighbors (label 1, North St, is a case in point). The following labeled areas illustrate other common challenges:
Node matching uses two match agents. One agent analyzes the candidate nodes’ rubber-sheeted offset and the area density of nodes. A second attempts to build an optimal “test match” of all the feature chains incident at the node pair, to determine how similar the local features at the two nodes are. Following node matching, the GDT process uses the matched nodes as a guide to matching the topological chains. Chain match criteria include agents that weigh:
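The first node match agent can be illustrated with a small Python sketch. It scores each candidate node pair by its offset after a shift, divided by the distance to the nearest other node in the source landbase, so that a given offset is tolerated more in a sparse area than in a dense one, and it keeps only mutual best matches under a threshold. The scoring formula, the uniform `shift` standing in for the rubber-sheet estimate, the `max_score` threshold, and the function names are all hypothetical; the paper does not describe GDT’s agents at this level of detail, and the second (test-match) agent is not sketched here.

```python
import math


def local_spacing(nodes):
    """For each node, the distance to its nearest neighbour in the same landbase.

    Small values indicate a dense area, where a given offset is more ambiguous.
    """
    return [
        min(math.dist(p, q) for k, q in enumerate(nodes) if k != i)
        for i, p in enumerate(nodes)
    ]


def match_nodes(src_nodes, dst_nodes, shift=(0.0, 0.0), max_score=0.5):
    """Pair nodes whose shifted offset is small relative to local node spacing.

    `shift` is a hypothetical stand-in for the current rubber-sheet estimate;
    a uniform shift simplifies a true rubber sheet.
    Returns (src_index, dst_index) pairs that are mutual best matches.
    """
    spacing = local_spacing(src_nodes)
    scores = [
        [math.dist((sx + shift[0], sy + shift[1]), d) / spacing[i] for d in dst_nodes]
        for i, (sx, sy) in enumerate(src_nodes)
    ]
    matches = []
    for i, row in enumerate(scores):
        j = min(range(len(dst_nodes)), key=lambda k: row[k])
        # Mutual-best check: no other source node scores better against dst j.
        best_i = min(range(len(src_nodes)), key=lambda k: scores[k][j])
        if best_i == i and row[j] <= max_score:
            matches.append((i, j))
    return matches
```

Requiring a mutual best match is one simple way to avoid pairing two source nodes with the same target; a threshold relative to spacing, rather than an absolute distance, reflects the idea that nearest neighbors are less trustworthy in dense areas.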
With match information, the process can generate a rubber-sheet mapping for use in realigning associated facility data. It can also identify points where the mathematical model breaks down, either because the topology has changed significantly or because there is a large amount of “shear” in the warp model.

The conflation process described above was also used to correlate the existing AGL landbase to GDT’s spatially improved landbase data. The software correlation process produced a Control Vector data set.

Control vectors
Control vectors are a linking data set that correlates the nodes of the existing AGL landbase with those of the new landbase. Seen as a text file, the data set looks like this:

    -97479453, 32956206, -322, 799

The first two values are a longitude/latitude pair in decimal degrees (with the decimal point implied at 6 digits of precision); the last two are the horizontal offsets in longitude and latitude, in the same units. The same file can be converted to a “link” data type for Arc by applying the offset values. For the record above, adding the offsets to the original coordinates gives the other end of the link:

    -97479775, 32957005

The control vector data is also normally translated into FFS format, the in-memory model for the Feature Manipulation Engine (FME) software made by Safe Software, Inc. This provides access to the warping software inside FME.

Warping
AGL selected the following items for warping:

An FME mapping file was generated, as shown below:
    # Read the control vectors recorded in FFS format
    FACTORY_DEF * RecorderFactory \
       FACTORY_NAME "Read FFS Control Vectors" \
       FEATURE_FILE "$(SourceFFSfile)" \
       MODE PLAYBACK \
       OUTPUT RECORDED FEATURE_TYPE *

    # do the warping here
    FACTORY_DEF * WarpFactory \
       FACTORY_NAME "_factory_name_" \
       INPUT CONTROL_VECTOR FEATURE_TYPE ControlPoint \
          @Count(Point_Vectors) \
       INPUT CONTROL_VECTOR FEATURE_TYPE ControlVector \
          @Count(Line_Vectors) \
       INPUT OBSERVED FEATURE_TYPE * multi_reader_id 0 \
       WARP_METHOD RUBBER_SHEET \
       MAX_DISTANCE $(MaxDistance) \
       MAX_POINTS $(MaxPoints) \
       EXPONENT_WEIGHT $(WarpExponent) \
       OUTPUT CORRECTED FEATURE_TYPE * \
          @Count(WarpedFeatures)

The original text data file was also converted to an Arc link file, as discussed earlier in this paper. The facilities data was warped using both the FME and ARCEDIT™ software. The results of the warping can be observed in the illustration below.

Overall, the results show that the warping process has made a good start at facilities realignment. There are several issues with each warping method that need to be addressed.
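To make the control-vector format and the rubber-sheet warp concrete, here is a minimal Python sketch. It parses the six-place implied-decimal text format shown earlier, computes the link endpoint by adding the offsets (which reproduces the `-97479775, 32957005` example), and displaces a point by an inverse-distance-weighted average of nearby control offsets. Inverse distance weighting is one common way to implement rubber sheeting; the parameters `max_distance`, `max_points`, and `exponent` are hypothetical analogues of the mapping file’s `MAX_DISTANCE`, `MAX_POINTS`, and `EXPONENT_WEIGHT`, and the sketch makes no claim to reproduce FME’s WarpFactory. Distances are taken directly in degrees for brevity; a production version would measure them in projected coordinates such as UTM.

```python
import math
from dataclasses import dataclass

SCALE = 1_000_000  # decimal point implied at 6 digits


@dataclass(frozen=True)
class ControlVector:
    lon: int   # micro-degrees
    lat: int   # micro-degrees
    dlon: int  # horizontal offset, micro-degrees
    dlat: int

    def link_end(self):
        """Other end of the Arc-style link: original node plus offset."""
        return (self.lon + self.dlon, self.lat + self.dlat)


def parse_control_vector(line):
    lon, lat, dlon, dlat = (int(v) for v in line.split(","))
    return ControlVector(lon, lat, dlon, dlat)


def warp_point(x, y, vectors, max_distance, max_points, exponent):
    """Rubber-sheet (x, y), in decimal degrees, by inverse-distance weighting.

    Only control vectors within max_distance are used, and at most the
    max_points closest ones; weights are 1 / distance ** exponent.
    """
    nearby = []
    for cv in vectors:
        # Planar distance in degrees: a simplification for this sketch.
        d = math.hypot(x - cv.lon / SCALE, y - cv.lat / SCALE)
        if d <= max_distance:
            nearby.append((d, cv))
    nearby.sort(key=lambda item: item[0])
    nearby = nearby[:max_points]
    if not nearby:
        return (x, y)  # no control within reach: leave the point unchanged
    if nearby[0][0] == 0.0:
        _, cv = nearby[0]  # exactly on a control point: apply its offset
        return (x + cv.dlon / SCALE, y + cv.dlat / SCALE)
    weights = [1.0 / d ** exponent for d, _ in nearby]
    total = sum(weights)
    dx = sum(w * cv.dlon for w, (_, cv) in zip(weights, nearby)) / total
    dy = sum(w * cv.dlat for w, (_, cv) in zip(weights, nearby)) / total
    return (x + dx / SCALE, y + dy / SCALE)
```

Raising the exponent makes each facility point follow its nearest control vectors more closely, while lowering `max_distance` or `max_points` keeps distant, possibly unrelated, corrections from leaking into a neighborhood.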
Conclusion
With the results of the pilot project in hand, AGL studied the warped facilities data carefully. GDT and AGL staff members each warped the data independently and obtained very similar results. As a result of this pilot project, AGL decided to move forward with a similar process for its new landbase migration project. The current schedule calls for the entire migration effort to be completed by early 2003.

Acknowledgement
Witmer, A., 2001. “The Best of Both Worlds: Vector Conflation of Database Segments and Attributes.” Presentation at the ESRI User Conference 2001.