Spatially enabling high-end business applications
Data Warehousing: High-end Business Analysis Tool
Data warehousing has quickly taken center stage in the information technology
infrastructures of many organizations. A data warehouse is a collection of data in
support of management's decisions. It is first and foremost, an ana/ytica/ tool
oriented around the major subjects of the enterprise, e.g., customer, vendor, product
and productivity. It focuses on data modeling and database design exclusively, and it
excludes data that is not useful for decision support processing. In contrast, the
operational (OLTP) applications are designed around processes and functions (e.g.,
loans, savings, bank card and trust transactionsfor a financial institution).
Organizations are adopting data warehouse technology in response to an evershifting
competitive landscape. Data warehousing is now a $17billion global market
and growing 10-15 percent annually. According to META Group of Westport,
Connecticut, by early 1996, over 90?40of large companies had implemented or were
planning to develop data warehouses. Hardware, software, and consulting vendors
have quickly developed numerous new products and services that specifically target
this rapidly expanding market segment. In short, successful organizations have
recognized the benefits of deploying data warehousing to address customer
demands and market forces with great speed and dexterity.
Data Warehousing and On-line Transaction Processing (OLTP)
Corporate applications are essentially optimized for OLTP (On-line Transaction
Processing) applications. OLTP transactions are characterized by fast access times,
no end user customization, and multiple simultaneous user access. The majority of
legacy systems are OLTP based and consequently, the data resides in a variety of
data sources. The data sources can be in flat files, hierarchical databases (like IMS),
network structured databases (like IDMS), inverted list databases (like DATACOM/
DB), relational systems (like Oracle) or more commonly a combination of the above.
Businesses need to access the wealth of information that is stored in these data
sources, collate the data and determine trends. Organizations that have been able to
obtain such data are naturally better able to base decisions and strategy on past
historical trends, and in doing so, gain competitive advantage. The problem is data
stored across multiple databases and repositories is inconsistent, incomplete and
optimized for task-dependent OLTP transactions. The data structure itself, does not
lend itself to analysis and data query. In addition businesses cannot allow the
production systems to be exposed to massive queries and statistical computations
that might reduce their performance.
The requirement is simple: collate data from these multiple sources into one logical
database that is optimized for OLAP (On-line Analytical Processing) transactions and
allow knowledgeable data users to access this data. This is what is referred to as a
data warehouse. The notion of data warehousing is not a new idea but is rather the
formalization of lessons learned in Decision Support Systems (DSS) and the growing
demand for historical data to be analyzed and manipulated by end users with little or
no IS involvement.
An important difference between the data warehouse and operational applications is
how they model data. Operational data maintains an ongoing relationship between
two or more tables based on a business rule that is in effect. In contrast, data
warehouse content spans a spectrum of time, maintains many relationships, and
represents many static business rules (and correspondingly, many data
relationships) between two or more tables. In other words, all data in the data
warehouse is "time variant", i.e., accurate as of some moment in time, whereas in
the operational environment data is accurate as of the moment of access. The time
horizon represented for the data warehouse is much longer (which can involve
years) than that for the operational environment (which ranges from the current
values of today to ninety days). Every key structure in the data warehouse contains
an element of time either implicitly or explicitly. Lastly, the data warehouse is
nonvolatile. Data warehouse data is a long series of snapshots, and cannot be
updated once correctly recorded, while record-to-record real-time updates -- inserts,
deletes, and changes -- are done regularly to the operational environment. That is,
once data is loaded into the warehouse from the application-oriented operational
environment (and/or external sources), it does not change, but is merely accessed
there. Therefore, there is no need to be cautious of the update anomaly, an
important factor to consider in operational application systems; nor does data
warehousing require the complex technologies supporting backup and recovery,
transaction and data integrity, and the detection and remedy of deadlock. Data
"updating" in the data warehousing environment consists of periodic mass loading of
data from the operational environment. The simplicity of data management and the
much less rigid response time requirements allow data warehouse designers to take
liberties in optimizing the access of data. De-normalization of the physical data
model is conducted to enhance performance and simplicity, which are more
prominent for data warehouse operations because the amount of data involved is
typically very large.
Table 2 Comparison of OLTP and OLAP activities
| OLTP |
Data Warehouse(OLAP) |
Order Processing Product Fulfillment
Collections
Infrastructure Investments
Accounts Receivable
Accounts Payable
Logistics
Mobile/Fleet Management
Emergency Notification
Workflow Management
Customer service operations |
Marketing campaigns
Pricing/Promotions
Customer Churn Analysis
Profitability
Network Planning
Market Segmentation
Product Marketing
Demographic Analysis
Sales Analysis
Fraud Detection
Customer Info Management
|