The Data Warehouse
William R. Donaldson
Senior Consultant, Convergent Group, 6200 South Syracuse Way, Englewood, Colorado 80111
Abstract
“A data warehouse is a subject-oriented, integrated, non-volatile, time-variant collection of data
suited to the needs of management.” [Inmon, 1994] This paper will take that definition apart
and describe how the data warehouse can provide decision support information and contribute to
the integrated solution of AM/FM/GIS in the enterprise data environment. The data warehouse is
more than a central copy of system data and less than a cure-all. The data warehouse can be an
effective and efficient means of delivery of the information created and maintained by the
AM/FM/GIS. It can be the solution that makes the difference between success and failure of a
project. The paper will illustrate the technical and economic factors to be examined when
considering a data warehouse in the project architecture. It examines the details of the 12 criteria
defining what the data warehouse is (and is not), explains the five elements of data warehouse
architecture, and concludes with a review of the potential hazards and measures of success.
The Enterprise Environment
“The biggest mistake any company can make is to promise that one of the GIS benefits will be
that it will help reduce personnel.” [Muench, 1996]
In a large utility company, AM/FM/GIS may not be cost effective as a stand-alone application.
This assertion was considered almost heretical a few years ago, but more and more we are seeing
projects falling short of projected benefits. The most commonly seen justification for
implementing an AM/FM/GIS is that it will reduce cost by eliminating people. Planners project
improvements in productivity and related savings to be 30 percent or more for a stand-alone
system. That makes for a pretty attractive return on investment. Yet, when we look at actuals
from systems that have been installed for a year or more, we see few, if any, headcount
reductions.
On the other hand, the case is being made that the information kept in the AM/FM/GIS is
incredibly valuable, not only to facilities engineers but to the entire company, provided it’s
available in a useful form to everybody who needs it. It’s not the AM/FM/GIS application that
has value to the company; it’s the information the system maintains that has value. Or to paraphrase
a catch-phrase from the 1992 presidential campaign, “It’s the data, stupid.” Experts are coming to
realize how the value of the information multiplies many times over when it can be shared across
the company.
There are four architectures that can be used to satisfy the needs of enterprise data:
- Central Database
- Distributed Databases
- Federated Databases
- Data Warehouse
Each of these architectures has advantages and disadvantages that system architects should
thoroughly understand, so that the architectures can be used individually or in combination to
create the optimal company-wide system environment.
Most systems today use the central database architecture, either as a single corporate database,
or multiple, stand-alone databases located in districts. Distributed databases are defined as
multiple, homogeneous databases connected by a network and having data services that direct the
application to the correct location for specific data. Few AM/FM/GIS systems today use this
architecture.
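As a rough illustration of the data services described above, the following Python sketch directs a request to the database that holds a given district's data. The catalog, district names, and host names are hypothetical and are not taken from any particular AM/FM/GIS product.

```python
# Hypothetical catalog: which database node holds each district's data.
# A real data service would maintain this mapping centrally.
CATALOG = {
    "north": "db-north.example.internal",
    "south": "db-south.example.internal",
}


def locate(district: str) -> str:
    """Return the node that holds the given district's facility data."""
    try:
        return CATALOG[district]
    except KeyError:
        # Fail loudly rather than guess at a location.
        raise LookupError(f"no database registered for district {district!r}") from None


# The application asks the data service where to look, then connects there.
node = locate("north")
```

The point of the sketch is that the application never hard-codes a location; it only names the data it wants.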
Federated databases are similar to distributed databases except that they are heterogeneous and
thus more suited to integration with legacy systems; however, the difficulty and cost of integrating
federated databases are enormous.
Pair-wise interfaces between systems and databases do not constitute distributed or federated
databases and are a poor substitute for enterprise data architectures.
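One way to see why pair-wise interfaces scale poorly is to count them. The sketch below rests on our own simplifying assumption, not a figure from any project: every pair of systems that shares data needs its own interface, while a shared data store needs only one interface per system.

```python
def pairwise_interfaces(systems: int) -> int:
    """Interfaces needed if every pair of systems is connected directly."""
    return systems * (systems - 1) // 2


def shared_store_interfaces(systems: int) -> int:
    """Interfaces needed if each system connects once to a shared store."""
    return systems


# Compare the two approaches as the number of systems grows.
for n in (3, 5, 10):
    print(n, pairwise_interfaces(n), shared_store_interfaces(n))
```

Under this assumption, ten systems would need 45 pair-wise interfaces but only 10 connections to a shared store.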
Enterprise data is that which meets the following criteria:
- Data is separate from applications
- Redundancy is managed
- Data is shareable
- Data is independent from supplier products
- Data is stewarded
- Access is open and documented
The data warehouse has the advantage of being relatively easy to steward and to manage
redundancy. It is inherently independent of applications, and the data is shareable. Moreover, the
data warehouse is (relatively) easy to integrate with legacy systems. Perhaps its greatest
advantage, however, is that it can be designed and implemented in stages, with each stage
returning its benefit as soon as it is implemented. Unlike other components of a major system, it
does not have to be complete before it starts returning value. The data warehouse must be
carefully designed, however, as once built it tends to be inflexible.
The Data Warehouse Defined
The definition of a data warehouse seems to be quite simple, yet, as Hackathorn [1993] says,
“there has been much confusion and even controversy over what constitutes a warehouse.”
Hackathorn attempts to define the subject as “a collection of data objects that have been packaged
and inventoried for distribution to a business community.” This definition seems too generic and
ambiguous in describing a warehouse and how it differs from other data stores. For example, his
definition could describe a simple database extract.
We are inclined to ask: what’s the big deal with definitions, so long as it stores my data? In
fact, the data warehouse is as different from other data stores as a database is from a flat file. It’s
critical to understand the differences, at least at the conceptual level, if we are to communicate
requirements and design options among users, designers, and suppliers, and to make good choices
in our project planning.
Bill Inmon, the self-proclaimed “Father of the Data Warehouse,” describes 12 rules of the data
warehouse that pin it down in greater detail and clarity.