Logo GISdevelopment.net

GISdevelopment > Proceedings > GITA > 1999


GITA 2002 | GITA 2001 | GITA 2000 | GITA 1999 | GITA 1998 | GITA 1997 |  
Sessions

Business Applications

Data Development and Evolution

Data Distribution and Access

Engineering and Design Applications

Enterprise Integration

Enterprise Resource Planning

Exploiting Field and Mobile Technologies

Invited Track

Operations Support

People Issues

System Architecture

User Perspectives

Work Management


GITA 1999


Business Applications


Spatially enabling high-end business applications


Data contained in a data warehouse is integrated in a singular and consistent structure. Considerable effort must be undertaken in coordinating the acquisition, transformation, and consolidation of information from various departmental and external sources. Because data from disparate application platforms is likely to be inconsistent, data scrubbing is required for generating credible and consistent results. This task is non-trival and generally involves the greatest cost. Implementing consistent naming conventions, measurement of variables, encoding structures, physical definition of attributes are challenging organizational issues, but are key to successful warehouse creation. For example, coming to terms with a consistent customer naming and addressing conventions are real challenges where multiple legacy systems are in place. Further, the integration of new data types, such as spatial, image, and time series must be suppoft at the data server tier to support analysis. As data is loaded from an operational environment or some external source, it is transformed and then goes into the current detail level, where it is used in the calculation of the lightly summarized data and the highly summarized data or in some ad hoc queries.

Data Warehouse Query Processing
In the area of query processing, a quick comparison against transaction processing applications is useful. Typical OLTP transactions are predictable and each unit of work is small, accessing or changing just a few rows. The real challenge is in terms of performing hundreds, perhaps thousands, of these well-defined, relatively small units of work concurrently. In contrast, decision support queries usually involve large amounts of data and thus each unit of work is typically much larger. There may not be quite as many concurrent users, although that is starting to change. Further, typical data warehousing queries involve complex operations such as multi-table joins, sorting, and aggregation. Often these operations are set-oriented, operating on groups of records meeting a specified criteria, as opposed to the record-level operations of transaction processing.

Another main difference relates to the organization of the data being accessed. Unlike OLTP systems that employ transaction-centric modeling focused on efficiency, warehouses and data marts typically utilize query-centric dimensional models such as star schemas that facilitate easy visualization and analysis. And finally, unlike transactional systems, where SQL is carefully hand-written and pre-tuned by developers, decision support queries for the most part are not even written by humans. They are dynamically generated by end-user tools, posing some unique challenges for the query optimizer. These differences translate into a need for specialized techniques in the database server, in every aspect of query processing query optimization, access and join methods, and query execution.

Query Processing
The database server needs to provide parallel query technology to enable efficient execution of queries against large volumes of data. The primary benefits of parallelism scale-up and speed-up are basic requirements in warehousing. The database server has to offer a comprehensive set of parallel operators that cover every operation involved in query execution, including table and index scans, sorts, joins, aggregation, and grouping, as well as the creation of summary tables, inserts, updates and deletes. The parallel query technology needs to support all the parallel hardware architectures: Symmetric Multi-Processors (SMP), Clusters, and Massively Parallel Processing (MPP) systems.

Most importantly, all of these query processing technologies have to be integrated. Contrary to what some vendors may claim, there is no one silver bullet for warehouse query processing. That is, meeting the complex, varied query needs of data warehousing is not just a matter of having a single specialized technique such as star query optimization or bitmap indexes. It's not even achieved by implementing a comprehensive set of techniques; that is just the pre-requisite. The real challenge lies in the seamless integration of all the capabilities in query optimization, access and join methods, and query execution that enables them to work together and deliver the full power of the query processing engine.

Data Management
Data Management refers to all of the activities related to getting data into the warehouse, organizing it for efficient access and easy manageability, and maintaining it. In database terminology, it covers operations such as data loading, building indexes, collecting statistics on the data, enforcing constraints, reorganizing tables and indexes, building aggregates or summaries, and data purging. While these activities are not unique to data warehousing, the sheer volume of data places special requirements in terms of scalability and performance.

To effectively meet the needs of data warehousing, the database server has to provide a data management infrastructure that delivers:
  • Capacity to deal with large data volumes. Again, as in query processing, this need translates into a requirement for parallelism in the execution of data management operations.
  • Efficient operations within finite maintenance windows. This need is commonly achieved through fine-grained maintenance operations.
Page 3 of 6
| Previous | Next |

Applications | Technology | Policy | History | News | Tenders | Events | Interviews | Career | Companies | Country Pages | Books | Publications | Education | Glossary | Tutorials | Downloads | Site Map | Subscribe | GIS@development Magazine | Updates | Guest Book