Spatial data warehousing "What business intelligence is There?"
Steve Van Ausdall
KEMA Consulting, Inc. (Formerly GeoIT, Inc.)
101 Inverness Drive East, Suite 130
Englewood, CO 80112
(303) 708-9355 x182
E-Mail: Svanausdall@kemaconsulting.com
Abstract
Data warehousing is an exciting information systems paradigm that enables analysis of
operational data from multiple systems across shared dimensions such as customers,
products, time, and employees. Application of these concepts in conjunction with
geospatial systems forges a powerful new dimension. Operational data is full of spatial
attributes such as addresses, coordinates, and administrative and political regions (city,
state, zip / area code, sales region, grid tile, distribution area, etc.) that can be leveraged
to analyze data in relation to geography. Analysts can employ thematic mapping
techniques to produce robust graphical depictions of key enterprise metrics such as sales,
outages, capacity, and utilization. Decision makers can then identify salient trends and
optimize business functions such as planning, marketing, and plant maintenance.
Spatial data warehousing concepts can also be utilized during complex system integration
projects involving consolidation of diverse spatial data sets. The warehouse is designed to
provide immediate business intelligence, while simultaneously aggregating data sets on a
new enterprise object model. Eventually, technologists can build applications against the
warehouse data. This approach provides many short and long term benefits, including
improved operational management, enterprise application integration, reduced capital
spending, increased productivity, and strategic advantage.
Traditional Data Warehousing Concepts
A data warehouse is a collection of data that is subject oriented, integrated, and timevariant.
The data within this environment supports a company’s decision-making needs,
enabling users to perform queries and analyze results.
Before creating a design for a data warehouse, it is critical to clearly understand the
architectural goals, types of users, needs of the users, and how the users will be
interacting with the data warehouse (2001, Browning and Mundy).
Al Kozakiewicz, in his Web article titled A Data Warehousing Primer for Database
Designers, explains, “The heart of any data warehouse is the dimensional schema,
sometimes referred to as a star”. As he illustrates in the sample model below, the
dimension and transform tables surround a central fact table. Each fact table represents a
different business metric to be analyzed, such as a sale, outage, or repair.
Dimension tables contain the “attributes associated with facts and separate these
attributes into logically distinct groupings, such as time, geography, products, customers,
and so forth” (Browning and Mundy). To perform a query, the user will typically select
the dimension rows that meet the desired selection criteria (e.g., Outages reported from
customers in the Northeast during 2001). The intersections (i.e. Cartesian product) of the
rows selected from the dimension tables with the fact table form the result set
(Kozakiewicz).
Kozakiewicz displays the following example as a sample model used to analyze product
sales by time period, region (store) or product line.
An operational metric is contained within each row of a fact table. Facts can be additive
along all dimensions, i.e., values can be summed independent of the dimension selected
to group the facts, and the results make sense. Examples of additive facts are
UnitsConsumed or MaintenanceRequests. Kozakiewicz explains that Semiadditive facts
can only be summed in certain dimensions; however, they can be plotted on a graph to
enable trend analysis.
Another consideration in the schema is the grain of the fact table. A user might choose to
have each row in the fact table represent an individual consumer transaction, daily,
weekly or monthly sales summaries. As a result, the finer the grain of the table, the larger
the warehouse will be. A business analysis should be performed before the design to
determine the level of granularity in order to meet the functional requirements of the
system (Kozakiewicz).