GISdevelopment.net ---> GITA 2003 ---> Mobile

Spatial data warehousing "What business intelligence is There?"

Steve Van Ausdall
KEMA Consulting, Inc. (Formerly GeoIT, Inc.)
101 Inverness Drive East, Suite 130
Englewood, CO 80112
(303) 708-9355 x182
E-Mail: Svanausdall@kemaconsulting.com


Abstract
Data warehousing is an exciting information systems paradigm that enables analysis of operational data from multiple systems across shared dimensions such as customers, products, time, and employees. Application of these concepts in conjunction with geospatial systems forges a powerful new dimension. Operational data is full of spatial attributes such as addresses, coordinates, and administrative and political regions (city, state, zip / area code, sales region, grid tile, distribution area, etc.) that can be leveraged to analyze data in relation to geography. Analysts can employ thematic mapping techniques to produce robust graphical depictions of key enterprise metrics such as sales, outages, capacity, and utilization. Decision makers can then identify salient trends and optimize business functions such as planning, marketing, and plant maintenance. Spatial data warehousing concepts can also be utilized during complex system integration projects involving consolidation of diverse spatial data sets. The warehouse is designed to provide immediate business intelligence, while simultaneously aggregating data sets on a new enterprise object model. Eventually, technologists can build applications against the warehouse data. This approach provides many short and long term benefits, including improved operational management, enterprise application integration, reduced capital spending, increased productivity, and strategic advantage.

Traditional Data Warehousing Concepts
A data warehouse is a collection of data that is subject oriented, integrated, and timevariant. The data within this environment supports a company’s decision-making needs, enabling users to perform queries and analyze results.

Before creating a design for a data warehouse, it is critical to clearly understand the architectural goals, types of users, needs of the users, and how the users will be interacting with the data warehouse (2001, Browning and Mundy). Al Kozakiewicz, in his Web article titled A Data Warehousing Primer for Database Designers, explains, “The heart of any data warehouse is the dimensional schema, sometimes referred to as a star”. As he illustrates in the sample model below, the dimension and transform tables surround a central fact table. Each fact table represents a different business metric to be analyzed, such as a sale, outage, or repair. Dimension tables contain the “attributes associated with facts and separate these attributes into logically distinct groupings, such as time, geography, products, customers, and so forth” (Browning and Mundy). To perform a query, the user will typically select the dimension rows that meet the desired selection criteria (e.g., Outages reported from customers in the Northeast during 2001). The intersections (i.e. Cartesian product) of the rows selected from the dimension tables with the fact table form the result set (Kozakiewicz).

Kozakiewicz displays the following example as a sample model used to analyze product sales by time period, region (store) or product line.


An operational metric is contained within each row of a fact table. Facts can be additive along all dimensions, i.e., values can be summed independent of the dimension selected to group the facts, and the results make sense. Examples of additive facts are UnitsConsumed or MaintenanceRequests. Kozakiewicz explains that Semiadditive facts can only be summed in certain dimensions; however, they can be plotted on a graph to enable trend analysis.

Another consideration in the schema is the grain of the fact table. A user might choose to have each row in the fact table represent an individual consumer transaction, daily, weekly or monthly sales summaries. As a result, the finer the grain of the table, the larger the warehouse will be. A business analysis should be performed before the design to determine the level of granularity in order to meet the functional requirements of the system (Kozakiewicz).

A powerful technique in data warehousing is drill-down and drill-up. This process, from a design standpoint, involves adding or subtracting row headers to results for a more or less refined query. For example, a query is set to for unit sales, grouped by region. In analyzing the results, the user notices that sales in the Northeast are off 20% over the same time period last year. By adding additional attributes in the store dimension (i.e., in this example, metropolitan area markets within regions) the user might discover that the reason for the change is that sales are off in a single metropolitan area.

One way to extract significant query performance is through aggregations. This involves examining the way in which facts are commonly reported and adding tables, which are updated on a periodic basis, that pre-summarize data along one or more dimensions. For example, if the grain of our fact table in the example above was daily, and we know that reporting by month and quarter is done regularly, we might add additional fact tables with the grain reduced to monthly and quarterly totals, respectively (Kozakiewicz).

Spatial Data
While traditional warehouses incorporate the spatial dimension in the form of sales regions, stores, and customers, the relationships between them are not typically exploited. Powerful spatial query techniques found in GIS systems such as finding customers or network equipment within an area can be used to map disparate operational data onto dimensions not previously possible. In addition, thematic mapping techniques can be used to plot metrics on maps, resulting in the ability to easily spot previously unrevealed geographically related trends.

Examples of thematic mapping appear below.


Figure 1 -- Thematic Mapping: Sales Data


Figure 2 -- Thematic Mapping: Population Density

Spatial Data Warehousing for analysis of enterprise metrics

http://www.microsoft.com/presspass/guides/mappoint2002/downloads/pictureanykindofdatamap.tif
http://www.microsoft.com/presspass/guides/mappoint2002/downloads/understandyourbusiness.tif


Traditional data warehousing can be applied to spatial data, simply by tracking operational metrics about the GIS data. Examples would be storing inspections, trouble calls, disconnects, leaks, and other network metrics in fact tables. The GIS can be used to classify each fact into the correct region or regions, using “within polygon” query capabilities. Attributes about the network elements could be stored in “product” type dimension tables, such as transformer, conductor, pipe, or customer. By integrating the functionality of a GIS with the reporting and analysis capabilities of a data warehouse, regions can be more easily changed or updated, and facts can be driven from the application functionality for storage in the warehouse.

Some examples of potential analysis and reporting capabilities are:
  • Plot leaks or outages by region and timeframe thematically using colored regions.
  • Report number of inspections against problems found by inspector zone over time on a chart within each region.
  • Display sales against marketing initiatives on a demographic map.
  • Show customer complaints by region against number of outages.
Another possibility is the ability to combine multiple GIS and other operational system data into a single browser. Middleware can be used to pass messages from source systems to update dynamic information in real-time, resulting in an enterprise asset viewer and reporting tool. Layer controls can be used to show operational states, such as outages, or any other report available in the data warehouse. Thematic mapping capabilities of spatial viewers can be implemented to color regions based on the results of the reports.

Spatial Data Warehousing for System Integration
Spatial Data Warehousing is a powerful system integration paradigm that can pave the way forward for companies facing difficult integration problems involving multiple spatial platforms, and diverse data sets and formats. The warehouse is built alongside existing systems, and implemented and maintained in parallel with existing initiatives so as not to disrupt production. The warehouse is designed to provide immediate business intelligence and traditional warehousing benefits, while at the same time aligning diverse data sets on a common robust enterprise platform. Eventually, applications are enhanced to use the warehouse as the source of record, allowing direct updates with multiple application concurrency control and version management. This approach provides many short and long term benefits, including improved management, integration, performance, automation, stability, and security.

Implementation Specifies
First, a Spatial Data Warehouse (SDW) is created, using one of many GIS server storage products. (Oracle Spatial, ESRI SDE, Autodesk AGDS, etc.) The SDW allows read only access (view / query) to applications, and is not the source of record for any dataset except the operational metrics. Many off the shelf tools can be used to create simple data mover routines for many data formats, and can be customized and enhanced as needed. Unique dataset identifiers are created on the initial load of each dataset, and those IDs are stored as part of the primary key for each row in the SDW. Alternatively, a composite primary key can be built from the GIS key and a dataset id. The dataset id allows each dataset to be refreshed, by deleting each row having the key and importing a new copy of the source data.

Initially, the data movers for each dataset format simply delete all warehoused data for the dataset, and re-import it. For example, each file (dwg, shp, etc.) is a single dataset, as is each continuous database dataset. Refreshing is controlled by an ongoing server scheduler process, or by a manual “as-needed” trigger.

This approach works as long as refreshes are done infrequently, since each full refresh involves moving every data element. As more accurate and up-to-date information is desired, "modified" flags are built into the source apps, so that refresh only has to move modified data, and can be done more frequently. This could also be implemented using triggers, time stamps, or many other similar constructs.

After implementation of the warehouse, business intelligence reporting and viewing applications can be built upon the new SDW dataset to produce enterprise-wide spatially enabled queries and reports, facilitate analysis needed to enter new markets, exit unprofitable markets, and focus R&D on the most profitable products. Eventually, applications are written or enhanced to use the warehouse as the source of record. Concurrency controls are exposed in the integration layer to ensure consistency across concurrent updates by multiple applications. Version management / long transactions are also implemented in the SDW layer to allow proposed changes to flow through a workflow sequence, and to enable automated posting and undo.

A common Reality
Many large companies today are faced with multiple diverse applications and datasets, little or no integration between systems, and little or no ability to query or analyze the data as a whole.


Incremental Steps Forward First, build interfaces from existing systems to a common SDW schema, prioritizing development based on desired integration timeframes and business intelligence (BI) analysis requirements.


The Long Term Goal
Over time, legacy datasets can be converted into the desired system to enable full integration, automation, concurrent update, long transactions, and many other costreduction benefits for desired applications. Enhance existing applications or implement new systems to operate on the new database directly, or through a middleware layer.

Summary
The combination of concepts from data warehousing and spatial databases and GIS systems can produce many exciting and valuable applications. Spatial technologies are becoming easier to use and more interoperable, resulting in faster integration timelines and more dynamic, modular, extensible systems. Benefits can include:
  • Up-to-the-minute status of the entire enterprise, providing the ability to optimize processes and head-off potential problems
  • Ability to report on historical operational metrics across multiple dimensions such as time, region, equipment, customers, employees, and more, providing the ability to identify trends and capitalize on opportunities
  • Short-term business intelligence and analysis capabilities.
  • Long-term system integration benefits including enterprise-wide data model, reduction of number of redundant systems and development staff.
References
Browning, Dave and Mundy, Joy, December 2001, Data Warehouse Design Considerations, http://msdn.microsoft.com/library/enus/ dnsql2k/html/Sql_DWDesign.asp?frame=true

Kozakiewicz, Al, A Data Warehousing Primer for Database Designers, http://www.hourglass.com/dwprimer.htm

Micorsoft, 2001
http://www.microsoft.com/presspass/guides/mappoint2002/downloads/pictureanykindofd atamap.tif
http://www.microsoft.com/presspass/guides/mappoint2002/downloads/understandyourbu siness.tif
Permission to use Documents (such as white papers, press releases, datasheets and FAQs from the Services is granted, provided that the below copyright notice appears in all copies and that both the copyright notice and this permission notice appear

© 2001 Microsoft Corporation, One Microsoft Way, Redmond, Washington 98052-6399 U.S.A. All rights reserved.

© GISdevelopment.net. All rights reserved.