The Data Warehouse
Inmon’s Twelve Rules
Rule 1: Warehouse data is logically and physically separate from operations data. The operations
environment contains data that is needed to run the everyday operations of the business. The data
warehouse contains data that is used to support strategic decision making. Warehouse data must
be kept separate to prevent decision support queries from degrading performance of the real-time
operations support systems.
An example of this occurs in the telco industry with loop assignment applications, like the Loop
Facilities Assignment and Control System (LFACS). The application is designed as an operations
system and works well to support loop assignments, but when facilities engineers get into the
application to count cable fills--a decision support function--the performance of the application is
degraded. A warehouse could solve this problem by taking periodic summaries of cable fills that
the engineers could access independently from the real-time system.
Rule 2: Warehouse data is integrated. There must be a single, uniform representation of data
throughout the warehouse, including aspects of element names, measurements, attributes, and
physical definition. Integration is necessary for users to receive a uniform, coherent representation
of the business. When data is kept redundantly in multiple applications, there are inevitably
variations in form and content from one system to another, leaving users to wonder which is
valid. Perhaps they are all valid for the purposes of each specific application, but data that serves
the enterprise must be valid for the whole. There cannot be ambiguities or inconsistencies.
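As a minimal sketch of what integration can involve, the Python below maps rows from two hypothetical source systems onto one uniform warehouse representation. The field names (LEN_FT, length_kft, and so on), the unit conventions, and the two source layouts are assumptions made for illustration; they do not come from any real system.

```python
from dataclasses import dataclass

FEET_PER_KILOFOOT = 1000.0


@dataclass(frozen=True)
class CableRecord:
    """Single, uniform warehouse representation of a cable section."""
    cable_id: str      # upper case, no surrounding whitespace
    length_ft: float   # always feet
    gauge: int         # always a bare AWG number


def from_system_a(row: dict) -> CableRecord:
    # Hypothetical System A: length in feet, gauge as a plain number.
    return CableRecord(
        cable_id=row["CABLE"].strip().upper(),
        length_ft=float(row["LEN_FT"]),
        gauge=int(row["GA"]),
    )


def from_system_b(row: dict) -> CableRecord:
    # Hypothetical System B: length in kilofeet, gauge written like "24AWG".
    return CableRecord(
        cable_id=row["cable_id"].strip().upper(),
        length_ft=float(row["length_kft"]) * FEET_PER_KILOFOOT,
        gauge=int(row["gauge"].upper().replace("AWG", "")),
    )
```

Under these assumptions, the same physical cable described differently by the two systems yields identical CableRecord values, which is the property Rule 2 asks for.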
Rule 3: Warehouse data is historical. This is not to suggest the warehouse is merely an archive of
operations data. Rather, it is selected data that represents a view of the business at points in time.
The warehouse is not intended to duplicate the real-time operations systems. It is not a central
archive or backup of district databases. Such an archive may be a valid requirement of a system,
but that is not the purpose of a data warehouse.
Rule 4: Warehouse data is a “snapshot” view of the business at a particular point in time at which
the data was relevant. As such, the warehouse data will not be updated. The warehouse may
contain many such historical snapshots. Moreover, since warehouse data is intended to be aimed
at decision support, it is unlikely to be a complete replication of operations systems data. Rather,
it will be selected aggregate data taken from operations data.
As an example, the warehouse would not keep a complete inventory of each section of cable in
the network, but it might be used to record a monthly summary of sheath-miles by gauge. It
would not keep a record for each terminal in service, but it might keep a weekly record of how
many of each type of terminal were in service.
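The sheath-miles example can be sketched in Python as follows. This is only an illustration under stated assumptions: the input layout (one dict per cable section with "gauge" and "length_ft" keys) and the function name are hypothetical, and a real extract would read from the operations system rather than an in-memory list.

```python
from collections import defaultdict
from datetime import date

FEET_PER_MILE = 5280.0


def summarize_sheath_miles(sections, snapshot_date):
    """Collapse per-section detail into one snapshot row per gauge.

    `sections` is an iterable of dicts with hypothetical keys
    "gauge" (AWG number) and "length_ft" (section length in feet).
    """
    miles_by_gauge = defaultdict(float)
    for section in sections:
        miles_by_gauge[section["gauge"]] += section["length_ft"] / FEET_PER_MILE

    # Each row carries its snapshot date; rows are written once, not updated.
    return [
        {"snapshot_date": snapshot_date, "gauge": gauge, "sheath_miles": round(miles, 3)}
        for gauge, miles in sorted(miles_by_gauge.items())
    ]


if __name__ == "__main__":
    detail = [
        {"gauge": 24, "length_ft": 5280},
        {"gauge": 24, "length_ft": 2640},
        {"gauge": 26, "length_ft": 10560},
    ]
    for row in summarize_sheath_miles(detail, date(1995, 1, 31)):
        print(row)
```

The warehouse stores only the returned summary rows; the individual sections stay in the operations system.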
Rule 5: Warehouse data is organized by subject along the guidelines of subject data areas without
influence from applications or functions. Users who may not be familiar with the source
application need to be able to browse and locate required data. Users are much more likely to be
able to locate data stored by subject, e.g., “customers” or “products,” than that stored along the
lines of an application or function they are not familiar with.
Restructuring data from its functionally oriented design to one of subject areas for the data
warehouse can be challenging. Most data in the AM/FM/GIS would likely fall into the subject
area of Installed Plant (which may be further divided by the warehouse into plant classifications),
but some data stewarded by the AM/FM/GIS system--for example, customer address--may fit
better under the subject area classification, “Customer.” The data warehouse polling applications
need to be designed to collect the data from the correct source and put it into the correct area.
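One way a polling application might route data is with an explicit map from source fields to subject areas, sketched below in Python. The field names and the map contents are hypothetical; only the subject areas "Installed Plant" and "Customer" and the AM/FM/GIS source come from the discussion above.

```python
# Hypothetical map from (source application, field) to warehouse subject area.
SUBJECT_AREA_MAP = {
    ("AM/FM/GIS", "cable_section"): "Installed Plant",
    ("AM/FM/GIS", "terminal"): "Installed Plant",
    ("AM/FM/GIS", "customer_address"): "Customer",
}


def route_to_subject_areas(source, record):
    """Split one source record into per-subject-area groups of fields."""
    routed = {}
    for field_name, value in record.items():
        area = SUBJECT_AREA_MAP.get((source, field_name))
        if area is None:
            # Refuse unmapped fields rather than guess a subject area.
            raise KeyError(f"no subject area mapped for {source}.{field_name}")
        routed.setdefault(area, {})[field_name] = value
    return routed
```

Failing on unmapped fields is a design choice in this sketch: it forces the subject-area decision to be made deliberately for every field instead of defaulting to the source application's own grouping.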
Rule 6: Source data is operations data. “Under all normal conditions data is not directly entered
or changed in the data warehouse.” [Inmon, 1994] Since the warehouse data represents a
snapshot of selected operations data from a point in time, once a snapshot is taken it is unlikely
that data will ever be changed unless it is discovered the data was not a true representation of the
business at that time, and it was judged worthwhile to correct it to make a valid historical record.
Rule 7: Development life cycle is different for warehouse data. The systems development life cycle
is typically requirements-driven, whereas the warehouse life cycle is data-driven. The warehouse is
typically developed in an iterative fashion with each step building on the previous. Unlike “RAD”
or prototype development of systems, the first warehouse iteration is usually a production
database of a subset of the ultimate warehouse data.
Rule 8: Warehouse data must have a standard structure. A key design issue is that of granularity
of the warehouse data. The data is typically structured in several levels of granularity. For
example, the warehouse may contain current (most recent snapshot) detail data, historical detail
data, lightly summarized data, and highly summarized data. These design criteria will have a large
impact on the volume of data maintained and the performance of the warehouse.
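The relationship between levels of granularity can be illustrated with a small Python sketch that derives lightly and highly summarized data from the same detail rows. Treating "lightly summarized" as weekly totals and "highly summarized" as monthly totals is an assumption made here for illustration, as is the detail layout of (date, count) pairs.

```python
from collections import Counter
from datetime import date


def rollup(detail, key_fn):
    """Sum detail counts into coarser buckets chosen by key_fn."""
    totals = Counter()
    for day, count in detail:
        totals[key_fn(day)] += count
    return dict(totals)


def by_iso_week(day):
    iso = day.isocalendar()
    return (iso[0], iso[1])  # (ISO year, ISO week number)


def by_month(day):
    return (day.year, day.month)


if __name__ == "__main__":
    # Hypothetical detail: terminals placed in service per day.
    detail = [
        (date(2024, 1, 1), 3),
        (date(2024, 1, 2), 2),
        (date(2024, 1, 8), 4),
        (date(2024, 2, 5), 1),
    ]
    print(rollup(detail, by_iso_week))  # lightly summarized
    print(rollup(detail, by_month))     # highly summarized
```

Each coarser level holds fewer rows than the one below it, which is why the choice of levels drives both storage volume and query performance.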
Rule 9: Warehouse data technology is different. Operations systems are designed to support
rapid, real-time transactions. The transaction architecture is usually kept small and the operations
are supported by high-performance processing. A warehouse, on the other hand, typically requires
handling huge volumes of data in large transactions without updates. Processing speed is less
critical.
Rule 10: There must be only one “source of record.” Each data element must have only one
operations systems source in order to maintain high-quality, integrated warehouse data.
Identification of the single source is a critical design criterion. As noted in Rule 2, operations data
is likely to vary in format and content from one system to another. The warehouse designers must
choose which operations system contains the values that will be suitable to the entire enterprise.
That system will be the one and only source of record.
This rule is consistent with the enterprise data mandate that data be stewarded. The single
stewarding application or database will also be the source of record for the warehouse. The
warehouse itself will never be the source of record.
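A minimal Python sketch of enforcing a single source of record follows. The class and method names are hypothetical, and "Billing" is an invented second system used only to show a rejected assignment.

```python
class SourceOfRecordRegistry:
    """Tracks the one operations system allowed to feed each data element."""

    def __init__(self):
        self._sources = {}

    def assign(self, element, system):
        existing = self._sources.get(element)
        if existing is not None and existing != system:
            # A second, different source would violate the one-source rule.
            raise ValueError(
                f"{element!r} already has source of record {existing!r}"
            )
        self._sources[element] = system

    def accepts(self, element, system):
        """True only if `system` is the registered source for `element`."""
        return self._sources.get(element) == system


if __name__ == "__main__":
    registry = SourceOfRecordRegistry()
    registry.assign("customer_address", "AM/FM/GIS")
    print(registry.accepts("customer_address", "AM/FM/GIS"))
    print(registry.accepts("customer_address", "Billing"))
```

A load process could call accepts before writing each element, so that data arriving from any system other than the steward is rejected.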
Rule 11: Warehouse data contains metadata. Metadata, i.e., data about data, is maintained as a
part of the warehouse. This information includes:
- Structure of data in the warehouse
- Keys and attributes
- Data sources and maps to the sources
- Extract history
- Aliases
- Data relationships
- Aggregation algorithms
The data warehouse must be accessible to developers across the enterprise. The simplest way for
all groups to know the content of the warehouse and how to access it is to include the critical
information within the warehouse itself.
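The metadata items listed above could be held in a structure like the following Python sketch. The class names, field names, and example values are hypothetical; the sketch only shows how one catalog entry might carry structure, keys, source mapping, extract history, aliases, relationships, and the aggregation rule together.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ElementMetadata:
    name: str
    data_type: str                 # structure of the data
    keys: List[str]                # keys and attributes
    source_system: str             # data source
    source_field: str              # map to the source
    aggregation: str               # aggregation algorithm, as text
    aliases: List[str] = field(default_factory=list)
    related_elements: List[str] = field(default_factory=list)
    extract_history: List[str] = field(default_factory=list)


class MetadataCatalog:
    def __init__(self):
        self._by_name = {}

    def register(self, entry: ElementMetadata) -> None:
        self._by_name[entry.name] = entry

    def find(self, name_or_alias: str) -> Optional[ElementMetadata]:
        """Look an element up by its name or by any of its aliases."""
        for entry in self._by_name.values():
            if name_or_alias == entry.name or name_or_alias in entry.aliases:
                return entry
        return None
```

Lookup by alias matters here because developers elsewhere in the enterprise may know an element only by the name their own application uses.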
Rule 12: There must be a chargeback structure for warehouse data. It is essential to demonstrate
the value of the warehouse by tracking its usage and charging users accordingly. Tools that are
considered infrastructure, with the cost spread across the enterprise, tend to be overdeveloped or
“gold plated.” A chargeback process tends to limit the warehouse design to its most efficient
functionality.
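As one possible illustration of a chargeback calculation, the Python sketch below splits a monthly warehouse cost across departments in proportion to tracked usage. Proportional allocation, the usage unit (for example, query counts), and the department names are all assumptions; the rule itself does not prescribe a formula.

```python
def allocate_chargeback(monthly_cost, usage_by_department):
    """Split monthly_cost in proportion to each department's usage.

    `usage_by_department` maps a department name to a usage measure,
    such as the number of queries run that month (an assumed unit).
    """
    total_usage = sum(usage_by_department.values())
    if total_usage == 0:
        # No recorded usage: nothing to charge back this period.
        return {dept: 0.0 for dept in usage_by_department}
    return {
        dept: round(monthly_cost * used / total_usage, 2)
        for dept, used in usage_by_department.items()
    }


if __name__ == "__main__":
    print(allocate_chargeback(1000.0, {"Engineering": 30, "Marketing": 10}))
```

Because each share is rounded to cents independently, the shares may differ from the total by a cent or so in some cases; a production scheme would need a rule for assigning that remainder.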
A concise definition that seems to accommodate Inmon’s 12 rules is:
A data warehouse is a separate database of integrated subject area data designed
specifically for use as a management decision support system and made up of snapshots of
historical data derived from operations systems.
The Five Elements
Haderle [1994] describes five elements that are essential in the design of a data warehouse. The
first component of the complete data warehouse system is that of the operations data stores that
may be in any of several operating environments. This is consistent with Inmon’s assertion that
operations data is logically and physically separate from the data warehouse. This data, in “several
operating environments,” describes the legacy data environment precisely.
The second element is the access to a distribution network. The suitable network will support the
delivery of data from the operations systems to the data warehouse, and, consequently, the
information from the data warehouse to potential users across the entire company. In a large
company, this will likely constitute a complex system of local area networks and wide area
networks. The concept of enterprise data requires that any user needing information and having
adequate security access be able to retrieve the information easily and quickly.
The third element of the warehouse is data delivery, the ability to move data from the operational
sources to the data warehouse. This requires not only the network system described above, but a
process and system for extracting and summarizing data from operations systems at the correct
intervals.
A graphical user interface (GUI) front-end with the ability to locate available data is the fourth
element. Here is an essential component for using the data warehouse as enterprise data that
Inmon doesn’t mention. The warehouse must be accessible to all who need to use its data. This is
a weakness of the stand-alone AM/FM/GIS in that the only people who can access it are the users
of the application. Without direct access, if the CEO wants a summary of sheath-miles of cable,
it’s necessary to call the engineer, or someone else with an application workstation, to retrieve
and deliver it. With the warehouse, the CEO can retrieve the information directly.
The fifth and final component, according to Haderle, is that of end-user knowledge tools that
provide decision support functionality. I would argue that the data warehouse is valuable as a
repository of information even without additional knowledge tools. So long as the warehouse is
designed and indexed adequately with a supportive GUI, users will be able to find and compile the
information they need.