
GITA 1997


Advanced Technical Topics


The Data Warehouse


Inmon’s Twelve Rules

Rule 1: Warehouse data is logically and physically separate from operations data. The operations environment contains data that is needed to run the everyday operations of the business. The data warehouse contains data that is used to support strategic decision making. Warehouse data must be kept separate to prevent decision support queries from degrading performance of the real-time operations support systems.

An example of this occurs in the telco industry with loop assignment applications, like the Loop Facilities Assignment and Control System (LFACS). The application is designed as an operations system and works well to support loop assignments, but when facilities engineers get into the application to count cable fills--a decision support function--the performance of the application is degraded. A warehouse could solve this problem by taking periodic summaries of cable fills that the engineers could access independently from the real-time system.

Rule 2: Warehouse data is integrated. There must be a single, uniform representation of data throughout the warehouse, including aspects of element names, measurements, attributes, and physical definition. Integration is necessary for users to receive a uniform, coherent representation of the business. When data is kept redundantly in multiple applications, there are inevitably variations in form and content from one system to another, leaving users to wonder which is valid. Perhaps they are all valid for the purposes of each specific application, but data that serves the enterprise must be valid for the whole. There cannot be ambiguities or inconsistencies.
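As a rough illustration of what integration means in practice, the sketch below maps records from two source systems onto one warehouse representation. Both source layouts, the field names (`len_ft`, `ga`, `miles`), and the choice of miles as the warehouse unit are assumptions made for the example; the paper does not describe any particular record format.

```python
# Hypothetical source layouts; only the idea of one uniform representation
# comes from the text.
FEET_PER_MILE = 5280.0

def from_system_a(rec):
    """Assumed system A: length in feet, gauge as a number."""
    return {
        "cable_id": rec["id"],
        "gauge_awg": int(rec["gauge"]),
        "length_miles": rec["len_ft"] / FEET_PER_MILE,
    }

def from_system_b(rec):
    """Assumed system B: length in miles, gauge as text such as '26AWG'."""
    return {
        "cable_id": rec["cable"],
        "gauge_awg": int(rec["ga"].upper().replace("AWG", "")),
        "length_miles": float(rec["miles"]),
    }

a = from_system_a({"id": "C1", "gauge": 24, "len_ft": 2640})
b = from_system_b({"cable": "C2", "ga": "26awg", "miles": 1.25})
```

Whatever the source, every record reaches the warehouse with the same element names and the same unit of measure, so a user never has to ask which system's convention applies.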

Rule 3: Warehouse data is historical. This is not to suggest the warehouse is merely an archive of operations data. Rather it is selected data that represents a view of the business at points in time.

The warehouse is not intended to duplicate the real-time operations systems. It is not a central archive or backup of district databases. Such an archive may be a valid requirement of a system, but that is not the purpose of a data warehouse.

Rule 4: Warehouse data is a “snapshot” view of the business at a particular point in time at which the data was relevant. As such, the warehouse data will not be updated. The warehouse may contain many such historical snapshots. Moreover, since warehouse data is intended to be aimed at decision support, it is unlikely to be a complete replication of operations systems data. Rather, it will be selected aggregate data taken from operations data.

As an example, the warehouse would not keep a complete inventory of each section of cable in the network, but it might be used to record a monthly summary of sheath-miles by gauge. It would not keep a record for each terminal in service, but it might keep a weekly record of how many of each type of terminal were in service.
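The monthly sheath-mile summary mentioned above could be sketched roughly as follows. The record layout (`gauge`, `sheath_miles`) and the function name are hypothetical; the point is only that the warehouse receives a small, dated aggregate rather than the section-level inventory.

```python
from collections import defaultdict
from datetime import date

def snapshot_sheath_miles(cable_sections, as_of):
    """Aggregate section-level records into one dated row per gauge.

    Returns a tuple of (as_of, gauge, total_miles) rows; a tuple is used so
    the snapshot cannot be modified in place once taken.
    """
    totals = defaultdict(float)
    for section in cable_sections:
        totals[section["gauge"]] += section["sheath_miles"]
    return tuple((as_of, gauge, miles) for gauge, miles in sorted(totals.items()))

# Hypothetical operations data: one record per cable section.
sections = [
    {"gauge": 24, "sheath_miles": 0.5},
    {"gauge": 26, "sheath_miles": 2.0},
    {"gauge": 24, "sheath_miles": 1.5},
]
january = snapshot_sheath_miles(sections, date(1997, 1, 31))
```

Each month's run would add another set of dated rows alongside the earlier ones instead of overwriting them.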

Rule 5: Warehouse data is organized by subject along the guidelines of subject data areas without influence from applications or functions. Users who may not be familiar with the source application need to be able to browse and locate required data. Users are much more likely to be able to locate data stored by subject, e.g., “customers” or “products,” than that stored along the lines of an application or function they are not familiar with.

Restructuring data from its functionally oriented design to one of subject areas for the data warehouse can be challenging. Most data in the AM/FM/GIS would likely fall into the subject area of Installed Plant (which may be further divided by the warehouse into plant classifications), but some data stewarded by the AM/FM/GIS system--for example, customer address--may fit better under the subject area classification, “Customer.” The data warehouse polling applications need to be designed to collect the data from the correct source and put it into the correct area.

Rule 6: Source data is operations data. “Under all normal conditions data is not directly entered or changed in the data warehouse.” [Inmon, 1994] Since the warehouse data represents a snapshot of selected operations data from a point in time, once a snapshot is taken it is unlikely that data will ever be changed unless it is discovered the data was not a true representation of the business at that time, and it was judged worthwhile to correct it to make a valid historical record.

Rule 7: Development life cycle is different for warehouse data. The systems development life cycle is typically requirements-driven, whereas the warehouse life cycle is data-driven. The warehouse is typically developed in an iterative fashion with each step building on the previous. Unlike “RAD” or prototype development of systems, the first warehouse iteration is usually a production database of a subset of the ultimate warehouse data.

Rule 8: Warehouse data must have a standard structure. A key design issue is that of granularity of the warehouse data. The data is typically structured in several levels of granularity. For example, the warehouse may contain current (most recent snapshot) detail data, historical detail data, lightly summarized data, and highly summarized data. These design criteria will have a large impact on the volume of data maintained and the performance of the warehouse.
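The levels of granularity can be pictured as successive roll-ups of the same detail data. In the sketch below, the detail rows (feet of cable placed per day) are an invented example, and monthly and yearly totals stand in for the “lightly summarized” and “highly summarized” levels; the paper does not prescribe specific periods.

```python
from collections import defaultdict
from datetime import date

def roll_up(detail, period_key):
    """Sum (day, value) rows into one total per period."""
    totals = defaultdict(float)
    for day, value in detail:
        totals[period_key(day)] += value
    return dict(totals)

# Hypothetical detail data: feet of cable placed on a given day.
detail = [
    (date(1996, 1, 5), 100.0),
    (date(1996, 1, 20), 50.0),
    (date(1996, 2, 3), 25.0),
]

lightly_summarized = roll_up(detail, lambda d: (d.year, d.month))  # monthly
highly_summarized = roll_up(detail, lambda d: d.year)              # yearly
```

Each coarser level holds fewer rows, which is why the choice of levels drives both the data volume and the query performance.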

Rule 9: Warehouse data technology is different. Operations systems are designed to support rapid, real-time transactions. The transaction architecture is usually kept small and the operations are supported by high-performance processing. A warehouse, on the other hand, typically requires handling huge volumes of data in large transactions without updates. Processing speed is less critical.

Rule 10: There must be only one “source of record.” Each data element must have only one operations systems source in order to maintain high-quality, integrated warehouse data. Identification of the single source is a critical design criterion. As noted in Rule 2, operations data is likely to vary in format and content from one system to another. The warehouse designers must choose which operations system contains the values that will be suitable to the entire enterprise. That system will be the one and only source of record.

This rule is consistent with the enterprise data mandate that data be stewarded. The single stewarding application or database will also be the source of record for the warehouse. The warehouse itself will never be the source of record.
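One way to enforce a single source of record is a registry that refuses a second source for the same element. The class below is a minimal sketch under that assumption; its name, its methods, and the “billing” system are hypothetical and are not taken from the paper.

```python
class SourceOfRecordRegistry:
    """Maps each data element to exactly one operations system."""

    def __init__(self):
        self._sources = {}

    def register(self, element, system):
        existing = self._sources.get(element)
        if existing is not None and existing != system:
            raise ValueError(
                f"{element!r} already has source of record {existing!r}; "
                f"cannot also register {system!r}"
            )
        self._sources[element] = system

    def source_for(self, element):
        return self._sources[element]

registry = SourceOfRecordRegistry()
registry.register("customer_address", "AM/FM/GIS")
try:
    # A second, competing source for the same element is rejected.
    registry.register("customer_address", "billing")
    conflict = None
except ValueError as err:
    conflict = str(err)
```

The extract programs would consult such a registry before polling, so an element is never loaded from two systems.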

Rule 11: Warehouse data contains metadata. Metadata, i.e., data about data, is maintained as a part of the warehouse. This information includes:
  • Structure of data in the warehouse
  • Keys and attributes
  • Data sources and maps to the sources
  • Extract history
  • Aliases
  • Data relationships
  • Aggregation algorithms
The data warehouse must be accessible to developers across the enterprise. The simplest way for all groups to know the content of the warehouse and how to access it is to include the critical information within the warehouse itself.
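The metadata items listed above might be held in a catalog entry along these lines. This is a sketch only: the field names, the sample entry, and the `find_element` helper are assumptions chosen to mirror the list, not a structure defined by Inmon or by this paper.

```python
from dataclasses import dataclass, field

@dataclass
class ElementMetadata:
    name: str
    data_type: str            # structure of the data
    key_fields: tuple         # keys and attributes
    source_system: str        # data source
    source_field: str         # map to the source
    aggregation: str          # aggregation algorithm, described in words
    aliases: tuple = ()
    related_elements: tuple = ()                          # data relationships
    extract_history: list = field(default_factory=list)   # dates of past extracts

def find_element(catalog, term):
    """Look an element up by its name or any alias, ignoring case."""
    term = term.lower()
    for meta in catalog:
        if term == meta.name.lower() or term in (a.lower() for a in meta.aliases):
            return meta
    return None

# Hypothetical catalog entry.
catalog = [
    ElementMetadata(
        name="sheath_miles_by_gauge",
        data_type="float",
        key_fields=("snapshot_month", "gauge"),
        source_system="AM/FM/GIS",
        source_field="cable_section.length",
        aggregation="sum of section lengths per gauge",
        aliases=("cable miles",),
    )
]
hit = find_element(catalog, "Cable Miles")
```

Because the catalog lives with the data, a developer in another group can discover what an element means and where it came from without contacting the warehouse team.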

Rule 12: There must be a chargeback structure for warehouse data. It is essential to demonstrate the value of the warehouse by tracking its usage and charging users accordingly. Tools that are considered infrastructure, with the cost spread across the enterprise, tend to be overdeveloped or “gold plated.” A chargeback process tends to limit the warehouse design to its most efficient functionality.
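A chargeback scheme needs, at minimum, a record of who used the warehouse and a rate to apply. The sketch below counts queries per department and bills a flat per-query rate; the rate, the per-query basis, and the department names are all hypothetical, since the paper does not say how charges should be computed.

```python
from collections import Counter

RATE_CENTS_PER_QUERY = 25  # hypothetical flat rate, in cents

class UsageLedger:
    """Tracks warehouse queries per department and computes charges."""

    def __init__(self):
        self._queries = Counter()

    def record_query(self, department):
        self._queries[department] += 1

    def charges_cents(self):
        return {
            dept: count * RATE_CENTS_PER_QUERY
            for dept, count in self._queries.items()
        }

ledger = UsageLedger()
for dept in ["engineering", "engineering", "marketing"]:
    ledger.record_query(dept)
bill = ledger.charges_cents()
```

Amounts are kept in whole cents to avoid rounding surprises. The same usage counts also show which parts of the warehouse are actually used.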

A concise definition that seems to accommodate Inmon’s 12 rules is: A data warehouse is a separate database of integrated subject area data designed specifically for use as a management decision support system and made up of snapshots of historical data derived from operations systems.

The Five Elements
Haderle [1994] describes five elements that are essential in the design of a data warehouse. The first component of the complete data warehouse system is that of the operations data stores that may be in any of several operating environments. This is consistent with Inmon’s assertion that operations data is logically and physically separate from the data warehouse. The phrase “several operating environments” describes the legacy data environment precisely.

The second element is the access to a distribution network. The suitable network will support the delivery of data from the operations systems to the data warehouse, and, consequently, the information from the data warehouse to potential users across the entire company. In a large company, this will likely constitute a complex system of local area networks and wide area networks. The concept of enterprise data requires that any user needing information and having adequate security access be able to retrieve the information easily and quickly.

The third element of the warehouse is data delivery, the ability to move data from the operational sources to the data warehouse. This requires not only the network system described above, but a process and system for extracting and summarizing data from operations systems at the correct intervals.
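Extracting “at the correct intervals” can be sketched as a delivery pass that checks each source against its own schedule. The source names, the intervals, and the `extract`/`load` callables below are placeholders for illustration; a real delivery system would call the operations systems and the warehouse loader instead.

```python
from datetime import datetime, timedelta

def extract_due(last_extract, now, interval):
    """True if a source has never been extracted or its interval has elapsed."""
    return last_extract is None or now - last_extract >= interval

def run_delivery(sources, now, extract, load):
    """Extract and load every source that is due; return the names delivered."""
    delivered = []
    for name, state in sources.items():
        if extract_due(state["last"], now, state["interval"]):
            load(name, extract(name))
            state["last"] = now
            delivered.append(name)
    return delivered

# Hypothetical schedule: weekly terminal counts, monthly sheath-mile summaries.
warehouse = {}
sources = {
    "terminals_in_service": {"last": datetime(1997, 1, 1), "interval": timedelta(weeks=1)},
    "sheath_miles": {"last": datetime(1997, 1, 1), "interval": timedelta(days=30)},
}
delivered = run_delivery(
    sources,
    datetime(1997, 1, 9),
    extract=lambda name: f"summary of {name}",  # stand-in for a real extract
    load=warehouse.__setitem__,                 # stand-in for a warehouse load
)
```

In this run only the weekly source is due, so the monthly summary is left alone until its own interval has passed.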

A graphical user interface (GUI) front-end with the ability to locate available data is the fourth element. Here is an essential component for using the data warehouse as enterprise data that Inmon doesn’t mention. The warehouse must be accessible to all who need to use its data. This is a weakness of the stand-alone AM/FM/GIS in that the only people who can access it are the users of the application. Without direct access, if the CEO wants a summary of sheath-miles of cable, it’s necessary to call the engineer, or someone else with an application workstation, to retrieve and deliver it. With the warehouse, the CEO can retrieve the information directly.

The fifth and final component, according to Haderle, is that of end-user knowledge tools that provide decision support functionality. I would argue that the data warehouse is valuable as a repository of information even without additional knowledge tools. So long as the warehouse is designed and indexed adequately with a supportive GUI, users will be able to find and compile the information they need.
