Turning legacy into useable reality
Data Modeling
Logical Data Model Creation
An obvious predecessor to the migration of the Legacy Thesis data was the selection and
development of a target database/GIS in which to migrate the data. STW's objective was
not to simply migrate their Thesis data into a more modern software package but to
improve it. The creation of the target system is referred to as Data Modeling. In the early
stages of the UADMS project, a Logical Data Model (LDM) was developed to encompass
All of the mission-critical wastewater information that was to be contained in the GIS, and
to predict the future business needs relying on that data. Logical data models are built
without regard to any technical constraints that may exist in a specific database/GIS. A
LDM is essentially a representation of all business information and the relationships
found within this data.
A primary goal of the migration is to have the database/GIS product conform to the needs
of the business so that the business does not risk being forced to compromise their
operations based on product limitations. The STW wastewater LDM was initially created
using the Thesis GIS as the base to build from. All critical features, attributes, and their
valid values found in Thesis were carried over to the LDM. Additional assets, attributes,
map features, and their relationships were added to this LDM based on gaps found
between Thesis and the needs of the business. Thesis users, wastewater system experts,
and business stakeholders were all included in this stage of data modeling. Figure 1
shows a small section of an STW wastewater Entity Relationship Diagram (ERD).
Diagrams were created to depict node, line, and area features. In Figure 1, entities that
refer to STW assets are found within boxes; and arrows connecting these entities
symbolize the relationships between these assets. This section of the ERD shows the
relationships between Sewer Pipes and several node entities. These diagrams are very
useful tools for understanding a Logical Data Model.
(Figure 1)
Physical Data Model Creation
While the Logical Data Model (LDM) depicted all the information essential to STW's
wastewater asset management needs, it was just the first step to a final target
database/GIS. The process of migrating the STW wastewater L D M to a Physical Data
Model (PDM) involved two tasks: creating a physical database model and implementing
relationships through the use of a rulebase. First, the actual Oracle database that was to
be used with the UADMS GIS applications was designed by creating tables that
contained the spatial and attribute information for each entity in the LDM. These tables
were designed based on pre-defined LDM specifications such as attribute data types
(character, numeric, etc...), widths, and precision. The tables also contained lists of valid
values for those attributes.
For the majority of STW entities, a one-to-one relationship existed between LDM entities
and PDM tables. In STW's case, there are actually two PDM tables working in
conjunction for each entity; the business table contains the attributes and an SDE table
contains the geographic coordinates. These unique entity types are referred to in the GIS
as Feature Types. A few assets, such as inlets and outlets to a pumping station, were
represented as two unique entities in the LDM but could be joined into one table (Feature
Type) in the PDM due to matching attributes. Distinction between these two features
could be made based on attribute values. These attribute distinctions are referred to in the
GIS as Subtypes. Subtypes can also be found within tables storing only one unique LDM
entity. Gravity Sewers, for example, were all contained in one PDM table but were
broken into multiple subtypes that represented uniqueness in ownership and purpose. (ex.
Public-Foul-Gravity Sewer). Subtyping becomes important in the PDM because the
LDM describes the relationships at the subtype level.
Rulebase Development
Coinciding with the Oracle database, the UADMS PDM has data relationship rules
defined and controlled by a Rulebase. UADMS uses the ArcFM Rulebase, which works
in conjunction with the Oracle database to control many aspects of wastewater data. All
of the data validation checks native to Thesis that were deemed useful were built into the
UADMS PDM either through the database or rulebase. The functionality of the Thesis
validation checks was not only migrated but also vastly improved in UADMS. Thesis
validation included checks for null and invalid attribute values where they were not
already constrained by valid values called code lists.
Connectivity restrictions were built into the Rulebase to control and manage certain
situations. For example, certain node assets can connect to not only a Feature Type like
Gravity Sewer but also to the Subtype Private Foul Gravity Sewer. The rulebase controls
these relationships, allowing connectivity to be established between nodes that have a
pre-defined relationship in the rulebase, and disallowing connectivity between features
that have not been defined. The Rulebase, in fact, goes even farther by restricting the
maximum number of allowable connections between features. The rulebase also defines
many other rules governing such things as display, add routines, date stamping, and
parent/child polygon relationships.
Data Modeling Summary
The Data Modeling effort has been shown to provide a comprehensive migration of
Mission-Critical Legacy data and validation rules. By creating a Logical Data Model,
STW was able to eliminate the business shortfalls of their Legacy system from the new
system, as well as design a system that could be expanded in the future. The addition of a
rulebase to the data modeling procedure has vastly improved the functionality of the
Legacy system and will improve the quality of data that is added in the future.
Data Mapping
With an approved target data model in place, the next stage in migrating the legacy
wastewater GIS was determining how to move the Thesis data into the UADMS PDM. A
methodology called Data Mapping was used to step through all the Thesis dbase tables
and determine what columns were needed for migration, and how and where to fit them
into the UADMS data model. This mapping process can be discussed in two parts:
mapping of attribute data and mapping of graphic data.
The Thesis Data Mapping document is a spreadsheet containing multiple worksheets that
are required to fully map all the tables in Thesis to UADMS. A description of this
document and the individual worksheets follows.