Integration of Legacy, COTS, and Map Data
Discrepancies
Any data integration effort will identify discrepancies between legacy and map data.
Most discrepancies will fall into the following categories:
- Presence
Features exist in the map data but not in the legacy data, or vice versa.
- Attribution
Attribute values and/or formats differ between legacy and map data, e.g.,
AVENUE is stored as "AV." in the legacy data, while the map stores it as "AVE."
- Location
Feature locations may differ between map and legacy data, e.g., legacy
describes a transformer at grid reference S-22 -BD35 and the map data
shows the transformer at S-22 -B035.
- Relationships
Similar to location, feature relationships may differ between the legacy
and map data, e.g., legacy describes a transformer on pole #89702 and the
map data shows the transformer on pole #89703.
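The four categories above can be illustrated with a minimal Python sketch that compares one legacy record against its map counterpart and reports which kinds of discrepancy apply. The `Feature` fields and the `classify_discrepancies` function are hypothetical names chosen for illustration; a real integration would compare whatever attributes the two systems actually share.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Feature:
    """Hypothetical record layout; field names are assumptions for this sketch."""
    feature_id: str
    street_suffix: str  # attribution example, e.g. "AV." vs "AVE."
    grid_ref: str       # location example, e.g. a grid reference
    pole_id: str        # relationship example, e.g. the supporting pole


def classify_discrepancies(legacy: Optional[Feature],
                           mapped: Optional[Feature]) -> list[str]:
    """Return the discrepancy categories found between two versions of a feature."""
    # Presence: the feature exists in only one of the two sources.
    if legacy is None or mapped is None:
        return [] if legacy is mapped else ["presence"]
    found = []
    if legacy.street_suffix != mapped.street_suffix:
        found.append("attribution")
    if legacy.grid_ref != mapped.grid_ref:
        found.append("location")
    if legacy.pole_id != mapped.pole_id:
        found.append("relationship")
    return found
```

Applied to the transformer examples in the list, a record that differs in suffix, grid reference, and pole number would be reported under all three of the last categories, while a record missing from one source would be reported under presence only.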
Maintenance
Once the obstacles listed above have been addressed and the data integrated, the newly
integrated data will require maintenance to ensure that the value of the data is
sustained. In most cases, the historical maintenance processes will have been
substantially affected by the implementation of the new system, and the formerly clear
lines of data ownership will have been blurred by the integration.
COTS Data
The issues associated with the COTS data integration are very similar to the issues associated
with legacy data integration. One distinct difference lies in the maintenance of COTS data. If
COTS data is modified during database construction, then ownership of the COTS data
essentially transfers to the party who updated the data. If the objective is to purchase COTS data
and receive periodic updates from the provider, then no updates should be made to the COTS
data because periodic updates from the provider will result in location and accuracy
discrepancies when compared to the newly constructed GIS data. The decision not to modify the
COTS data, however, may significantly limit the degree of integration possible, or force the
inappropriate modification of map or legacy data.
Addressing Data Integration Issues
Developing an approach to the data integration issues described above requires
considering both the specific form each issue takes within the context of the
integration requirement and the desired integration model.
Synchronization
As previously stated, in almost all cases, the maintenance and update lifecycles of the previously
independent systems were separate. The result of these separate lifecycles is asynchronous
database content. The first analysis of each synchronization issue must be the determination of
whether the asynchronous situation is a problem. In general, synchronization will be required if,
within the context of the system, the valid representation of any feature or object requires data
from the two systems. Alternatively, if absolute synchronization of the database is not required,
but appropriate system utilization requires that "time-related" differences are identifiable, the
database design may require the incorporation of special attribution, and the applications may
require complex "feature state" functionality. If full synchronization is required, the process of
syncing datasets will require diligent effort. The specific synchronization process
cannot be defined without a specific requirement, but in general, the question of whether to
address this requirement prior to, during, or after creation of the geospatial database is always a
valid consideration.
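As one possible reading of the "special attribution" mentioned above, each feature could carry the date on which each source system last updated it, so that time-related differences can be identified without forcing full synchronization. The following Python sketch assumes hypothetical `legacy_as_of` and `map_as_of` dates and a hypothetical `tolerance_days` parameter; it is an illustration, not a prescribed design.

```python
from datetime import date


def sync_state(legacy_as_of: date, map_as_of: date, tolerance_days: int = 0) -> str:
    """Label a feature by comparing the last-update dates of its two sources."""
    # Positive lag means the legacy system was updated more recently.
    lag = (legacy_as_of - map_as_of).days
    if abs(lag) <= tolerance_days:
        return "in_sync"
    return "legacy_newer" if lag > 0 else "map_newer"
```

A label of this kind could be stored with the feature and used by applications to warn users when the two sources describe the feature at different points in time.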
Records Freezing
Orderly construction of a geospatial database almost always requires the "freezing" of source
data. In most cases, freezing of a map set can be achieved. While this freeze of map records is
not typically accomplished without some pain, the "mission critical" role of certain operational
systems may make freezing of a legacy dataset nearly impossible. The viability of freezing each
existing data system must be evaluated, and the resulting conclusion must be factored into the
analysis of the synchronization issue described above, since little value will be gained through a
synchronization process that is followed by unsynchronized freezing. If required, the freezing
process usually involves the creation of a "snapshot" through the copying of the dataset at a
specific time, and the "capture" of events or transactions which would normally have resulted in
the modification of the data. These captured events or transactions will require "posting" after
the relevant records are unfrozen.
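The snapshot, capture, and posting steps described above can be sketched in Python as follows. The `FreezableDataset` class and its method names are hypothetical and stand in for whatever mechanism the operational system provides; the sketch keeps records in an in-memory dictionary purely for illustration.

```python
import copy


class FreezableDataset:
    """Illustrative dataset that defers updates while frozen (hypothetical API)."""

    def __init__(self, records):
        self.records = dict(records)
        self.frozen = False
        self.snapshot = None
        self.pending = []  # transactions captured during the freeze

    def freeze(self):
        # The snapshot is an independent copy taken at a specific time.
        self.snapshot = copy.deepcopy(self.records)
        self.frozen = True
        return self.snapshot

    def update(self, key, value):
        if self.frozen:
            # Capture the transaction instead of modifying the frozen data.
            self.pending.append((key, value))
        else:
            self.records[key] = value

    def unfreeze(self):
        # Post captured transactions in the order they arrived.
        self.frozen = False
        posted = len(self.pending)
        for key, value in self.pending:
            self.records[key] = value
        self.pending.clear()
        return posted
```

In this sketch the snapshot is what would be handed to database construction, while the pending list preserves the edits that must be posted once the freeze ends.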
Integration Match-Keys
As previously described, the method of matching records in each of the systems will need to be
defined. In most cases, common methods for identifying records in the existing data systems are
well established, well known, clearly presented, and unambiguous. In some cases, the keys to
matching of records in separate systems are interpretive, ambiguous, difficult to utilize, or simply
non-existent. As part of the integration requirements analysis, the match-keys and any peripheral
rules associated with each data type must be fully identified.
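As a minimal illustration of a match-key with one peripheral rule, the Python sketch below normalizes identifiers before comparing them, so that formatting differences such as "# 89702" and "#89702" do not prevent a match. The normalization rule (uppercase, letters and digits only) and the function names are assumptions made for this example; actual rules must come from the requirements analysis for each data type.

```python
import re


def normalize_key(raw: str) -> str:
    """Assumed rule: uppercase the key and keep only letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", raw.upper())


def match_records(legacy_keys, map_keys):
    """Pair legacy keys with map keys that share a normalized form."""
    map_index = {normalize_key(k): k for k in map_keys}
    matched, unmatched = {}, []
    for key in legacy_keys:
        hit = map_index.get(normalize_key(key))
        if hit is None:
            unmatched.append(key)  # needs interpretive or manual matching
        else:
            matched[key] = hit
    return matched, unmatched
```

A rule this aggressive can collapse distinct identifiers into the same normalized key, so the unmatched list and any collisions would still need review against the source systems.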