Turning legacy into useable reality
Neil Tansley
Severn Trent Water, Waterworks Road
Edgbaston, Birmingham B16 9DD
Albert Sarvis
Stoner Associates, P.O. Box 86
Carlisle, PA 17013
Introduction
The Severn Trent Water wastewater data migration project can be viewed as a case study in
legacy system migration. The general strategies and methodologies used in this project can
be applied to a variety of legacy migration projects, and the details of this migration are
quite similar to those of other projects of its kind. The objective of this paper is to
provide an overview of the Severn Trent Water data migration project, the challenges to
migration, and the business implications inherent in legacy system migration. Discussion
of the initial stages of the migration project requires a clear description of the legacy and
target systems' data models and a thorough mapping of how data was moved between the
two models. Emphasis has been placed on retaining mission-critical data, system
functionality, and the customized rules found in the legacy system. Following an
explanation of these specifications, the actual translation stages and the validation of the
converted data are described.
Project Background
Severn Trent Water (STW) is the second largest water utility in the United Kingdom
(UK). Its service area covers 22,000 square kilometers in the Midlands of England,
surrounding the City of Birmingham. This massive data conversion project was a key
component of a £32 million (approx. $54 million) GIS implementation project, which
commenced in 1998. The project scope included development and implementation of GIS
applications and enhanced business processes for more than 1,000 of STW's 4,500
employees. The scope of the data conversion included both STW's clean water
distribution network (41,800 km) and its wastewater collection network (52,300 km).
While the clean water data conversion used mostly paper map sources, STW wastewater
mapping was carried out using a proprietary AM/FM package called Thesis, licensed
from Oscar Faber Water (UK). The data in the wastewater mapping system had been
diligently maintained in Thesis since 1993 by 'agents' of STW and managed by local
government organizations. Thesis is a stand-alone, DOS-based GIS that uses dBase files
to hold attribute data. STW and its 'agents' used all modules of the Thesis software
program in daily operation. These modules provided view, query, edit, plot, and digitizing
functionality. Because the entire STW business relied on Thesis and had fully integrated
it into daily work, migration presented a sizable challenge from volumetric, technical,
and business-process standpoints.
Project Challenges
There were many unique challenges to the Thesis migration project. One notable
challenge was the existence of stand-alone Thesis systems originally maintained by 68
individual 'agencies' and seven in-house STW operational sites. Translating from many
separate sources, even though they resided in the same system, required finding the subtle
differences in the way data was maintained. These differences included use of invalid
values, liberal use of a 'remarks' field to store key attributes, and shortcuts in digitizing.
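Inconsistencies of the kind listed above can be surfaced mechanically before translation. The Python sketch below checks each record against valid-value lists, flags populated remarks fields, and tallies the results per source system so that differences between agents stand out. The field names (`ID`, `MATERIAL`, `PURPOSE`, `REMARKS`) and code lists are hypothetical stand-ins, not the actual Thesis schema; digitizing shortcuts would need geometric checks not shown here.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical valid-value lists; field names and codes are illustrative,
# not the actual Thesis domains.
VALID_VALUES = {
    "MATERIAL": {"VC", "CONC", "PVC", "BRICK"},
    "PURPOSE": {"F", "S", "C"},  # e.g. foul, surface, combined
}


@dataclass
class Issue:
    record_id: str
    field_name: str
    value: str
    reason: str


def check_record(record: dict) -> list:
    """Flag values outside their valid-value list and any populated remarks."""
    issues = []
    rid = record.get("ID", "?")
    for name, domain in VALID_VALUES.items():
        value = (record.get(name) or "").strip().upper()
        if value and value not in domain:
            issues.append(Issue(rid, name, value, "not in valid-value list"))
    remarks = (record.get("REMARKS") or "").strip()
    if remarks:
        # Free text here may hide key attributes that need separate parsing.
        issues.append(Issue(rid, "REMARKS", remarks, "remarks populated"))
    return issues


def profile_by_source(records_by_source: dict) -> dict:
    """Tally issue types per source system to expose each agent's habits."""
    return {
        source: Counter(i.reason for r in records for i in check_record(r))
        for source, records in records_by_source.items()
    }
```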
Several non-alphanumeric database fields were another unique aspect of this project. All
coordinate data, and many other graphic description fields, used a form of binary storage
that required specialized coding from the software vendor to decode. Even after decoding,
interpreting the resulting values proved difficult. A detailed discussion of these
challenges is included in the section on data modeling.
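The actual Thesis binary encoding was proprietary and is not described here, so the following is only a sketch of a decode-then-sanity-check pattern under an invented layout: a little-endian 16-bit vertex count followed by signed 32-bit (x, y) integers in millimetres. The scale factor and the expected extent are hypothetical. The idea illustrated is that a plausibility test against a known map extent quickly exposes mis-scaled or mis-ordered values after decoding.

```python
import struct

# ASSUMED layout, for illustration only: uint16 vertex count, then that many
# (x, y) pairs as signed 32-bit integers. Not the real Thesis encoding.
HEADER = struct.Struct("<H")
VERTEX = struct.Struct("<ii")
SCALE = 1000.0  # hypothetical: stored units per metre (millimetres)

# Hypothetical extent (xmin, ymin, xmax, ymax) in metres, used to catch
# mis-scaled or byte-swapped values after decoding.
EXPECTED_EXTENT = (300_000.0, 200_000.0, 500_000.0, 400_000.0)


def decode_coords(blob: bytes) -> list:
    """Decode a packed coordinate field into (x, y) pairs in metres."""
    if len(blob) < HEADER.size:
        raise ValueError("field too short for vertex count")
    (count,) = HEADER.unpack_from(blob, 0)
    expected = HEADER.size + count * VERTEX.size
    if len(blob) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(blob)}")
    coords = []
    for i in range(count):
        x, y = VERTEX.unpack_from(blob, HEADER.size + i * VERTEX.size)
        coords.append((x / SCALE, y / SCALE))
    return coords


def encode_coords(coords) -> bytes:
    """Inverse of decode_coords; included only so the decoder can be tested."""
    parts = [HEADER.pack(len(coords))]
    parts += [VERTEX.pack(round(x * SCALE), round(y * SCALE)) for x, y in coords]
    return b"".join(parts)


def plausible(coords, extent=EXPECTED_EXTENT) -> bool:
    """True if every decoded vertex falls inside the expected map extent."""
    xmin, ymin, xmax, ymax = extent
    return all(xmin <= x <= xmax and ymin <= y <= ymax for x, y in coords)
```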
One of the more common challenges faced in any legacy system migration is the
existence of proprietary, undocumented data models. Thesis data was transferred from
one Thesis machine to another through an exported/imported ASCII file readable only by
Thesis. Additionally, migration of this data was carried out using the raw database files
and not the native transfer files. Crucial first steps included discovering in which tables
the attributes, graphic coordinates, and valid-value lists were stored. Lack of a
documented data model meant that all of these tasks were performed blindly, requiring a
form of reverse engineering. Data model interpretation is discussed later in the paper, in
the sections on Data Model Development and Data Mapping.
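Since Thesis held attribute data in dBase files, one plausible starting point for this reverse engineering is to read each table's header and list its fields, types, and record counts. The sketch below parses the standard dBase III header layout (record count, header length, record length, and 32-byte field descriptors terminated by 0x0D) and returns raw field bytes so that binary fields are not damaged by text decoding. That the Thesis tables follow this standard layout is an assumption, not something established in this paper.

```python
import struct
from dataclasses import dataclass


@dataclass
class DbfField:
    name: str
    type: str      # 'C' character, 'N' numeric, 'D' date, 'L' logical, ...
    length: int
    decimals: int


def read_dbf_schema(data: bytes):
    """Return (record_count, header_len, record_len, fields) from a dBase III-style image."""
    # Header bytes 4-7: record count; 8-9: header length; 10-11: record length.
    count, header_len, record_len = struct.unpack_from("<IHH", data, 4)
    fields = []
    pos = 32  # 32-byte field descriptors follow the 32-byte file header
    while pos < header_len and data[pos] != 0x0D:  # 0x0D ends the descriptor list
        desc = data[pos:pos + 32]
        name = desc[:11].split(b"\x00", 1)[0].decode("ascii")
        fields.append(DbfField(name, chr(desc[11]), desc[16], desc[17]))
        pos += 32
    return count, header_len, record_len, fields


def iter_records(data: bytes):
    """Yield live records as {field: raw bytes}, skipping deleted ones."""
    count, header_len, record_len, fields = read_dbf_schema(data)
    for i in range(count):
        start = header_len + i * record_len
        rec = data[start:start + record_len]
        if rec[:1] == b"*":  # deletion flag set: record marked as deleted
            continue
        row, pos = {}, 1  # byte 0 of every record is the deletion flag
        for f in fields:
            row[f.name] = rec[pos:pos + f.length]
            pos += f.length
        yield row
```

Run across every table in an installation, a listing of this kind can help indicate which tables hold attributes, which hold coordinates, and which look like valid-value lists.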
Data Freeze and Change Tracking
An important business consideration for a legacy system migration is how to handle the
data once it has been handed off to the migration vendor. This data must essentially be
frozen at a point in time. Business does not stop during this period, however, and changes
to the system begin to accumulate. In traditional migration projects, change orders that
arose during migration either caused a tremendous backlog of work or were literally
hand-drawn onto paper plots and set aside until the data was ready for updates.
In an ideal migration project, the ability to carry out a database comparison, or to use
data held in a Records Management System, would provide better tools for tracking
normal changes. By comparing the legacy database to the target database, differences
could be detected and applied to the target system. In many legacy migration projects,
however, this comparison takes more effort than re-creating the data from scratch.
Another method that can be considered is to use date-stamp attributes in the legacy
system to extract changes that occurred after a defined date, and to add those changes to
the target system following migration. Unfortunately, this option rarely exists in true
legacy migrations, and the STW project was no exception.
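Both approaches described above reduce to simple operations once the data is in a comparable form. The sketch below assumes each snapshot has already been keyed by a stable asset ID and its attributes mapped to a common representation, which in legacy migrations is usually the expensive part. The `EDIT_DATE` field is hypothetical, since no such stamp existed in the STW data.

```python
from datetime import date


def diff_by_key(legacy: dict, target: dict):
    """Compare snapshots keyed by asset ID (attributes already mapped to a
    common form). Returns (new_in_legacy, gone_from_legacy, modified)."""
    new_in_legacy = sorted(legacy.keys() - target.keys())
    gone_from_legacy = sorted(target.keys() - legacy.keys())
    modified = sorted(k for k in legacy.keys() & target.keys() if legacy[k] != target[k])
    return new_in_legacy, gone_from_legacy, modified


def changed_since(records: list, freeze: date, stamp_field: str = "EDIT_DATE") -> list:
    """Select records whose (hypothetical) edit date-stamp falls after the freeze.
    Only possible if the legacy system actually stores such a stamp."""
    return [r for r in records if r.get(stamp_field) is not None and r[stamp_field] > freeze]
```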
The STW solution sought to reduce the amount of change tracking necessary by
minimizing the time between the data freeze and delivery of migrated data. Since the real
issue in change tracking is the amount of change that accumulates over the elapsed time
between data collection and migration, arranging a quick turnaround nearly eliminates the
need for it. The Thesis migration project was well suited to this technique because so
many separate systems existed. The files associated with each of the many Thesis users
were relatively small and formed neatly organized batches of data for migration. This
allowed the project team to work through each batch within a few weeks of its data
freeze. More importantly, the small number of change orders generated during this short
period could be held until the data was redelivered in the target system. At the time of
delivery, these changes could be handed over to the new centralized Records
Management Centers for updating.
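The batch-oriented approach can be modelled with a few simple records. In the sketch below, each batch corresponds to one source system with its own freeze and delivery dates, and the change orders to hold are those raised for that source on or after its freeze and before its delivery. The class names, source names, and the inclusive/exclusive boundary choice are illustrative assumptions, not details of the STW project.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Batch:
    source: str        # hypothetical ID of one agent's or site's Thesis system
    frozen: date       # data handed to the migration team
    delivered: date    # migrated data returned in the target system


@dataclass
class ChangeOrder:
    source: str
    raised: date
    description: str


def held_orders(batch: Batch, orders: list) -> list:
    """Orders for this batch's source raised on/after freeze and before delivery.
    These are held, then passed to records management at delivery."""
    return [
        o for o in orders
        if o.source == batch.source and batch.frozen <= o.raised < batch.delivered
    ]


def window_days(batch: Batch) -> int:
    """Length of the freeze-to-delivery window; shorter means fewer held orders."""
    return (batch.delivered - batch.frozen).days
```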