GISdevelopment.net ---> GITA 1999 ---> Data Development and Evolution

Integration of Legacy, COTS, and Map Data

Gary S. Miller
Director - Project Implementation
Analytical Surveys, Inc., 741 N. Grand Avenue
Waukesha, WI 53186

M. Todd Rhodes
Project Implementation Analyst
Analytical Surveys, Inc., 741 N. Grand Avenue
Waukesha, WI 53186


Introduction
The integration of legacy data, commercially available off-the-shelf (COTS) datasets, and map data is a key component of most of today's GIS implementation initiatives. In the past, GIS databases were often constructed from a single map series. Today's GIS database construction requirements typically include the integration of data from one or more legacy systems with one or more map-type data systems. Additionally, the integration of COTS datasets has become an accepted manner of enhancing GIS database content and functionality.

Geospatial system implementers are faced with the need to determine how and when legacy or COTS data should be integrated with the map-born database. Thorough analysis and planning can be the single most important determinant of success in accomplishing this integration. This analysis and planning is especially important because there are numerous options with regard to the manner in which the data integration is accomplished, and because the data integration challenge should not be underestimated.

Evolution of Geospatial Systems
Historically, automated mapping, facilities management, and geographic information systems addressed single, departmental requirements and applications. Yesterday's GISs used distinct hardware and network configurations as well as GIS-specific programming languages. GIS databases were usually separate from other databases within the enterprise.

Today's advanced geospatial systems use standard hardware configurations which are integrated with enterprise-wide networks utilizing industry-standard programming languages. These systems are designed and implemented to serve multi-departmental or enterprise-wide needs. This new paradigm for geospatial systems both facilitates and requires the integration of large diverse datasets into a single database or several databases that seamlessly interact with each other.

Evolution of Geospatial Databases
Due to the limited functionality of yesterday's GIS systems, it was often appropriate to create a geospatial database from a single map series. A more aggressive implementation might have involved the integration of data from several map series. While non-map datasets were sometimes utilized as a supplemental source for the population of GIS attributes, even in such cases, the previously separate databases were usually maintained separately even after portions of the datasets had been integrated within the GIS. This situation limited the utilization of geospatial systems and the benefits of geospatial system implementation.

To derive maximum benefit from today's advanced geospatial systems, information from many sources and datasets must be integrated within a single database or several seamlessly interfaced databases.

Models for Integration
As previously stated, contemporary requirements for geospatial database content typically include substantial data integration. Several models for achieving the required level of database integration exist. The following subsections address these alternatives with regard to the integration of legacy data and COTS data with map data.

Simultaneous Replacement of Legacy Database and Map Data
It is possible, even common, for a single geospatial database to fully replace legacy and map systems. It is also possible, though more challenging, to achieve that replacement in a simultaneous manner. This model offers several advantages with regard to short-term and long-term database maintenance, and when feasible, may be the most straightforward approach. In many cases, however, the critical needs served by the legacy system may make this approach to database construction inappropriate.

Initial Replacement of Maps, Subsequent "Cut-over" of Legacy System
As stated above, the replacement of legacy and map data with a single geospatial database is not an uncommon model. In some cases, however, it is not possible to achieve simultaneous replacement of the datasets. One approach to addressing such a situation is to continue to operate and maintain the legacy system during the database construction process, and execute a "cut-over" to use of the newly constructed geospatial system on a predetermined date, probably shortly after database construction is completed.

Replace Maps, Retain Legacy System, Establish "Real-time" Interface
A third approach to database integration is to implement the geospatial system with the objective of replacing the map system, but to retain the legacy system. If, however, the functionality of either system is dependent on access to data resident in the other, a high level of data compatibility and an interface between the systems will be required. If the functionality of the system further depends on access to absolutely current data resident within the other system, then the interface will need to provide "real-time" access to the other database.

Replace Maps, Retain Legacy System, Establish "Report-type" Interface
A somewhat less aggressive approach to interface implementation, which is viable in situations where system functionality does not demand access to completely current data from the other system, is a "report-type" interface. In this scenario, database content, or database transactions, are periodically reported to the other system.
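
A report-type interface can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the transaction layout (`id`, `field`, `value`, `logged_at`), the function names, and the sample dates are hypothetical and do not describe any particular legacy or geospatial product.

```python
from datetime import datetime


def build_report(transactions, last_reported):
    """Select transactions logged after the previous report, oldest first."""
    pending = [t for t in transactions if t["logged_at"] > last_reported]
    return sorted(pending, key=lambda t: t["logged_at"])


def apply_report(report, target):
    """Apply a report to the receiving system and return the new high-water mark."""
    for t in report:
        target.setdefault(t["id"], {})[t["field"]] = t["value"]
    return report[-1]["logged_at"] if report else None


# Hypothetical transaction log kept by the reporting system.
transactions = [
    {"id": "T1", "field": "status", "value": "in service",
     "logged_at": datetime(1999, 3, 1, 9, 0)},
    {"id": "T2", "field": "status", "value": "removed",
     "logged_at": datetime(1999, 3, 2, 14, 30)},
    {"id": "T1", "field": "kva", "value": 50,
     "logged_at": datetime(1999, 3, 3, 8, 15)},
]

# The receiving system already holds everything reported up to this time.
last_reported = datetime(1999, 3, 1, 23, 59)
other_system = {"T1": {"status": "in service"}}

report = build_report(transactions, last_reported)
mark = apply_report(report, other_system)
```

Between reports the receiving system is out of date by at most one reporting period, which is the trade-off that distinguishes this model from a real-time interface.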

Load COTS Data to Geospatial Database with, or without, Modification
As with legacy data, COTS data can be integrated with map data via loading both data sets to the same geospatial database. A further option, which has significant data maintenance ramifications, relates to the modification of the COTS data or use of it "as is".

Load COTS Data to a Separate Database, Establish Interface
As with legacy data, the alternative model for COTS database integration is to load the data to a database which is separate from the main geospatial database while facilitating data access through establishment of an interface. In such a situation, the interface is likely to require "read-only" functionality.

Understanding Integration Issues
As previously stated, within the context of a geospatial system, there are several models for the integration of legacy and COTS data with map data. The following subsections discuss the issues associated with integrating legacy and COTS data during geospatial database construction.

Legacy Systems
There are a number of issues which should be taken into account when considering the integration of legacy and map data within a geospatial system. While these descriptions are generic in nature, the concepts are broadly, perhaps universally, applicable. Examination of these issues is an essential component of a credible data integration requirements analysis. Further, the results of the analysis of these issues will be a valuable asset during the design and accomplishment of the data integration process.

Data Synchronization
While determining how legacy data will be integrated into a GIS, one must determine the processes associated with updating separate map and legacy systems. Typically the disparate data sets are out-of-sync because they have each been updated through separate, perhaps independent, processes within the enterprise.

Data/Source Freezing
During construction of a GIS database, it is required that sources, whether legacy or map, be frozen. Freezing of data sources results in a stable source for use in database construction. During these freeze times, daily operations of the enterprise must proceed without hindrance. Therefore, special processes must be utilized to allow some manner of update of the source data systems while protecting the stability of the frozen data sources. Addressing this conflict of needs often requires copying or archiving of source data, special capture of data changes, and updating the geospatial database with backlog data once the database construction has been completed.

Integration Match-Keys
In order to successfully integrate legacy and map data, feature-level match-keys are required. Existing information such as ID numbers, grid references, coordinates, street names and addresses are all potential match-keys which can be used to link map and legacy data.

If no match-keys exist, a process of assigning match-keys may be required. This can either be performed prior to database construction, or as part of the database construction methodology.
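
As a minimal sketch of the match-key idea, the following Python links map features to legacy records on a shared identifier. The record layouts, the field name `pole_id`, and the normalization rule are assumptions made for illustration; they are not taken from any particular system.

```python
def normalize_key(raw):
    """Normalize a match-key so trivial formatting differences do not block a match."""
    # Remove all whitespace, upper-case, and drop a leading '#'.
    return "".join(str(raw).split()).upper().lstrip("#")


def link_records(map_features, legacy_records, key_field="pole_id"):
    """Pair map features with legacy records that share a match-key.

    Returns (matched, map_only, legacy_only).
    """
    legacy_by_key = {normalize_key(r[key_field]): r for r in legacy_records}
    matched, map_only = [], []
    seen = set()
    for feature in map_features:
        key = normalize_key(feature[key_field])
        if key in legacy_by_key:
            matched.append((feature, legacy_by_key[key]))
            seen.add(key)
        else:
            map_only.append(feature)
    legacy_only = [r for k, r in legacy_by_key.items() if k not in seen]
    return matched, map_only, legacy_only


# Hypothetical sample data.
map_features = [
    {"pole_id": "# 89702", "x": 10.0, "y": 4.5},
    {"pole_id": "89710", "x": 12.0, "y": 4.5},
]
legacy_records = [
    {"pole_id": "89702", "owner": "utility"},
    {"pole_id": "89755", "owner": "telco"},
]

matched, map_only, legacy_only = link_records(map_features, legacy_records)
```

The unmatched lists are as useful as the matched pairs: they identify the features for which a match-key would have to be assigned or a presence discrepancy investigated.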

Discrepancies
Any data integration effort will identify discrepancies between legacy and map data. Most discrepancies will fall into the following categories:
  • Presence: Features exist in the map data, but not in the legacy data, or vice versa.
  • Attribution: Attribute values and/or formats differ between legacy and map data, e.g., AVENUE in legacy is stored as "AV." while the map stores it as "AVE."
  • Location: Feature locations may differ between map and legacy data, e.g., legacy describes a transformer at grid reference S-22 -BD35 and the map data shows the transformer at S-22 -B035.
  • Relationships: Similar to location, feature relationships may differ between the legacy and map data, e.g., legacy describes a transformer on pole #89702 and the map data shows the transformer on pole #89703.
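
The four categories above can be illustrated with a short Python sketch. The field names (`street_type`, `grid_ref`, `pole_id`), the feature identifiers, and the grid reference values are hypothetical; a real comparison would use whatever attributes the map and legacy systems actually share.

```python
def classify_discrepancies(map_by_id, legacy_by_id):
    """Compare two {feature_id: record} dictionaries and list discrepancies by category."""
    # Each category is checked against one hypothetical attribute.
    checks = {
        "attribution": "street_type",
        "location": "grid_ref",
        "relationship": "pole_id",
    }
    found = []
    for fid in sorted(set(map_by_id) | set(legacy_by_id)):
        in_map, in_legacy = fid in map_by_id, fid in legacy_by_id
        if not (in_map and in_legacy):
            source = "map" if in_map else "legacy"
            found.append((fid, "presence", f"only in {source}"))
            continue
        for category, field in checks.items():
            map_val = map_by_id[fid].get(field)
            legacy_val = legacy_by_id[fid].get(field)
            if map_val != legacy_val:
                found.append((fid, category, f"map={map_val!r} legacy={legacy_val!r}"))
    return found


# Hypothetical sample data.
map_data = {
    "T1": {"street_type": "AVE.", "grid_ref": "G-10-A2", "pole_id": "89703"},
    "T2": {"street_type": "ST.", "grid_ref": "G-11-B1", "pole_id": "89800"},
}
legacy_data = {
    "T1": {"street_type": "AV.", "grid_ref": "G-10-A1", "pole_id": "89702"},
    "T3": {"street_type": "RD.", "grid_ref": "G-12-C4", "pole_id": "89900"},
}

discrepancies = classify_discrepancies(map_data, legacy_data)
for item in discrepancies:
    print(item)
```
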

Maintenance
Once the obstacles listed above have been addressed and the data integrated, the newly integrated data will require maintenance in order to assure that the value of the data is sustained. In most cases, the historically used maintenance processes will have been substantially impacted by the implementation of the new system, and the formerly clear lines of data ownership will have been blurred by the integration.

COTS Data
The issues associated with COTS data integration are very similar to the issues associated with legacy data integration. One distinct difference lies with the maintenance of COTS data. If COTS data is modified during the database construction, then ownership of the COTS data essentially transfers to the party who updated the data. If the objective is to purchase COTS data and receive periodic updates from the provider, then no updates should be made to the COTS data because periodic updates from the provider will result in location and accuracy discrepancies when compared to the newly constructed GIS data. The decision not to modify the COTS data, however, may significantly limit the degree of integration possible, or force the inappropriate modification of map or legacy data.

Addressing Data Integration Issues
In order to develop an approach to addressing the data integration issues that have been previously described, both the specific embodiment of each issue within the context of the integration requirement, and the desired integration model must be considered.

Synchronization
As previously stated, in almost all cases, the maintenance and update lifecycles of the previously independent systems were separate. The result of these separate lifecycles is asynchronous database content. The first analysis of each synchronization issue must be the determination of whether the asynchronous situation is a problem. In general, synchronization will be required if, within the context of the system, the valid representation of any feature or object requires data from the two systems. Alternatively, if absolute synchronization of the database is not required, but appropriate system utilization requires that "time-related" differences are identifiable, the database design may require the incorporation of special attribution, and the applications may require complex "feature state" functionality. If full synchronization is required, the process of syncing datasets will require some very diligent effort. The specific synchronization process cannot be defined without a specific requirement, but in general, the question of whether to address this requirement prior to, during, or after creation of the geospatial database is always a valid consideration.

Records Freezing
Orderly construction of a geospatial database almost always requires the "freezing" of source data. In most cases, freezing of a map set can be achieved. While this freeze of map records is not typically accomplished without some pain, the "mission critical" role of certain operational systems may make freezing of a legacy dataset nearly impossible. The viability of freezing each existing data system must be evaluated and the resulting conclusion must be factored into the analysis of the synchronization issue described above since little value will be gained through a synchronization process which is followed by unsynchronized freezing. If required, the freezing process usually involves the creation of a "snapshot" through the copying of the dataset at a specific time, and the "capture" of events or transactions which would normally have resulted in the modification of the data. These captured events or transactions will require "posting" after the relevant records are unfrozen.
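
The snapshot, capture, and posting steps described above can be sketched as follows. The class name `FrozenSource`, the record layout, and the `kva` attribute are hypothetical and serve only to illustrate the sequence; a production process would also need to handle deletions and conflicting changes.

```python
import copy


class FrozenSource:
    """Hold a snapshot of a source dataset and capture changes made during the freeze."""

    def __init__(self, live_records):
        # Snapshot: a deep copy taken at freeze time, used for database construction.
        self.snapshot = copy.deepcopy(live_records)
        # Backlog: transactions captured while the source is frozen.
        self.backlog = []

    def capture(self, record_id, field, new_value):
        """Record a change instead of applying it to the snapshot."""
        self.backlog.append((record_id, field, new_value))

    def post_backlog(self, target):
        """Apply captured transactions, in order, to the target once it is unfrozen."""
        for record_id, field, new_value in self.backlog:
            target.setdefault(record_id, {})[field] = new_value
        posted = len(self.backlog)
        self.backlog.clear()
        return posted


# Hypothetical source records at the moment of the freeze.
live = {"T1": {"kva": 25}, "T2": {"kva": 50}}
source = FrozenSource(live)

# Daily operations continue during the freeze; changes are captured, not applied.
source.capture("T1", "kva", 37.5)
source.capture("T9", "kva", 10)

# Database construction works from the stable snapshot.
geodatabase = copy.deepcopy(source.snapshot)

# After construction, the backlog is posted to the new database.
posted = source.post_backlog(geodatabase)
```
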

Integration Match-Keys
As previously described, the method of matching records in each of the systems will need to be defined. In most cases, common methods for identifying records in the existing data systems are well established, well known, clearly presented, and unambiguous. In some cases, the keys to matching of records in separate systems are interpretive, ambiguous, difficult to utilize, or simply non-existent. As part of the integration requirements analysis, the match-keys and any peripheral rules associated with each data type must be fully identified.

Discrepancies
Since records systems have been maintained by human beings, potentially using imperfect systems, even in a situation where synchronization, freezing, and match-keys were not problems, discrepancies between the data in the systems will exist. These discrepancies will be identified through a sound data integration process. The fundamental question then is what to do when discrepancies are identified during database construction. The following list describes the available options.
  • Resolve each Discrepancy on a Case-by-Case Basis, Report the Resolution
  • Resolve each Discrepancy on a Case-by-Case Basis, Do Not Report the Resolution
  • Resolve Discrepancies based upon Standard Rules, Report the Resolution
  • Resolve Discrepancies based upon Standard Rules, Do Not Report the Resolution
  • Report Discrepancies
  • Ignore Discrepancies
Determination of the most appropriate strategy for dealing with discrepancies must be carefully and thoroughly considered. Discrepancy resolution, either during or after database construction, may be a very time-consuming and expensive undertaking. Conversely, ignoring discrepancies may be inappropriate.
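
As an illustration of the "standard rules" options, the Python sketch below resolves discrepancies with a per-field precedence rule and optionally reports each resolution. The rule table, field names, and record layout are assumptions made for this example; discrepancies with no applicable rule are set aside for case-by-case handling.

```python
def resolve_by_rules(discrepancies, rules, report=True):
    """Resolve discrepancies with per-field precedence rules.

    discrepancies: list of dicts with 'id', 'field', 'map', and 'legacy' values.
    rules: {field: 'map' or 'legacy'} naming the source whose value is kept.
    Returns (resolved_values, report_lines, unresolved).
    """
    resolved, report_lines, unresolved = {}, [], []
    for d in discrepancies:
        winner = rules.get(d["field"])
        if winner is None:
            # No standard rule applies: leave for case-by-case review.
            unresolved.append(d)
            continue
        resolved[(d["id"], d["field"])] = d[winner]
        if report:
            report_lines.append(
                f"{d['id']}.{d['field']}: kept {winner} value {d[winner]!r}"
            )
    return resolved, report_lines, unresolved


# Hypothetical discrepancies and rules.
found = [
    {"id": "T1", "field": "street_type", "map": "AVE.", "legacy": "AV."},
    {"id": "T1", "field": "pole_id", "map": "89703", "legacy": "89702"},
    {"id": "T2", "field": "phase", "map": "A", "legacy": "B"},
]
rules = {"street_type": "legacy", "pole_id": "map"}

resolved, report_lines, unresolved = resolve_by_rules(found, rules)
```

Passing `report=False` corresponds to the "Do Not Report the Resolution" variant; the resolved values are the same, but no audit trail is produced.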

Data Maintenance
The maintenance of data during and after its use in the construction of the integrated geospatial database will be significantly impacted by the implementation of the system. In a well-designed system, the data maintenance processes will be more consistent, faster, and more efficient than their predecessor processes. This gain, however, is only achievable through some pain.

During the database construction and data integration processes, special data maintenance procedures may be required. Alternatively, it may be possible to postpone data maintenance during the freeze period, creating a "backlog" of records posting activities. Additionally, in cases where synchronization of existing datasets prior to integration was not feasible, special data maintenance activities aimed at achieving synchronization after database construction may be required. Optional approaches to addressing this database maintenance requirement include the inclusion of some backlog posting process within the database construction process, and the limited-term use of additional records posting personnel.

The successful long-term use of the geospatial system and its component datasets will require that all data maintenance requirements are addressed and that all data maintenance activities are well understood. Gaining this understanding and developing appropriate procedures will require the detailed assignment of data maintenance responsibilities and the thorough analysis and design of data maintenance workflows.

Conclusion
The integration of disparate datasets is an essential requirement of geospatial system implementation. Many implementers have underestimated the data integration challenge and under-invested in the analysis of data integration issues prior to beginning geospatial database construction. Successfully addressing data integration challenges is a critical sub-task of successful geospatial database construction.

While the challenge is often substantial, the issues are predictable and can be addressed through appropriate consideration and procedure development.