GITA 1997: Advanced Technical Topics

Is Mainstream Database Technology Ready For AM/FM/GIS?


Peter M. Batty
Smallworld Systems, Inc., 5600 Greenwood Plaza Blvd., Englewood, CO 80111


Abstract
Historically, commercial Database Management Systems (DBMSs) have had significant shortcomings for use in GIS. This paper presents a wide-ranging look at the current state of the database industry: how it is starting to address the needs of GIS, where it still falls short of requirements, and what the future will look like.

The major relational database vendors have begun to announce spatial extensions to their products, as well as other more general enhancements of use to GIS. The object-oriented database vendors continue to make headway in the computer industry, and their products have some advantages for GIS. A distributed object model, such as CORBA or COM/OLE, is becoming widely regarded as the way that applications and data will be structured in the future. Work is progressing on various relevant standards, such as SQL3, SQL/MM and OGIS.

This presentation will attempt to make some sense out of all these developments, and explain how they are likely to impact the GIS industry. It will examine how GIS packages will be able to access a choice of multiple spatial data repositories rather than being tied to one specific database or file format, which has traditionally been the case.

Overview
This paper begins by summarizing why one would wish to store spatial data in a “mainstream” Database Management System (DBMS). (Different definitions of a mainstream DBMS will be discussed later, but in general this term is used in this paper to refer to widely used relational DBMS products such as Oracle, Sybase, Informix and DB2.) The history of mainstream DBMSs in GIS, and the reasons why they have not yet had a major impact, are discussed briefly. Recent developments in mainstream database technology which might impact AM/FM/GIS are then reviewed.

In order to assess the likely impact of these developments, a number of criteria for evaluating the suitability of a DBMS for AM/FM are then discussed. These include data modeling capabilities (including the issue of “relational versus object-oriented”), processing architecture (client versus server orientation), transaction handling model, and standard data access. One of the conclusions of this section is that different DBMSs may be suitable for different applications. This is one of the drivers for client AM/FM/GIS applications to be able to work against multiple heterogeneous data servers, which is the topic of the next section.

Finally, some conclusions are drawn from this discussion about the likely direction of AM/FM/GIS database technology over the next few years.

Why use a Mainstream DBMS for AM/FM/GIS?
The attractions of using mainstream DBMSs for GIS have been described in some detail by various people, including this author (see Batty, 1991, and Seaborn, 1992). In summary, the GIS vendor should be able to take advantage of functionality provided by the DBMS vendor and concentrate on developing GIS functionality, while the user can take advantage of existing skills and use common procedures for many database administration tasks, for both GIS and non-GIS data. In particular, functions such as security, backup and recovery are well proven in standard DBMSs and the GIS can take advantage of these. Integration between GIS and non-GIS data should be easier to achieve when using this approach. There is also good functionality available for integration between multiple DBMSs from different vendors in a heterogeneous distributed environment.

So there are apparently strong advantages to using mainstream DBMS technology, and a number of products since the 1980s have used this approach. However, it has not yet achieved widespread acceptance. None of the top five AM/FM vendors in 1995 (based on Daratech figures) used a mainstream DBMS as their primary data repository. Several of them have recently announced support for this sort of approach, but in all cases this is optional and not yet in widespread use. There currently seems to be strong support in the industry for the view that the mainstream DBMS vendors should now become the holders of spatial data for the latest generation of GIS products. However, there have been false dawns before. Doug Seaborn’s 1992 AM/FM paper was entitled “1995: the year GIS disappeared”, and his thesis was that by 1995 mainstream DBMSs would fully accommodate spatial data, and GIS would just be another aspect of mainstream computing. Clearly this has not happened yet. This paper discusses whether the latest round of mainstream DBMS technology will realize this vision, or whether further developments are still required.

What were the problems?
Given the apparent advantages of using a mainstream DBMS which were outlined in the previous section, clearly there must also have been significant drawbacks to this approach, otherwise all the major systems would be using it. This section just introduces these issues, and they are examined in more detail later. Perhaps the most obvious historically was performance. Certainly hardware and software advances have greatly improved the situation. However, the performance issue is quite complex, and there is a strong argument that the current centralized processing architecture of mainstream DBMSs is fundamentally unsuitable for certain important aspects of AM/FM/GIS. The biggest single drawback of standard DBMSs for AM/FM is their lack of capability in the area of handling long transactions and version management. Long transactions are fundamental to planning and design work processes in a utility, but they are fundamentally different from the short transactions which mainstream DBMSs are designed to handle. Vendors who have implemented systems on top of mainstream DBMSs have had to devote significant effort to developing code to handle long transactions on top of the DBMS, even though one of the major benefits of using a DBMS should be that it does all the required transaction handling.

The integration of geographic data into external applications is also not necessarily simplified by storing it in a standard DBMS. Because spatial data types were not explicitly supported by most standard DBMSs, the GIS vendor typically has had to use complex record structures and retrieval algorithms for the geographic data, which meant that it could not be easily read or updated by external applications using native DBMS access capabilities. The external application also needs to understand any special locking mechanisms which are used by the GIS to handle long transactions, since the standard DBMS locking does not handle this. The most obvious short term solution to this problem is for the GIS vendor to provide an API (Application Programming Interface) which can be used by external applications, but that approach can be used equally well whether or not a standard DBMS is used for the GIS data.

Recent Mainstream DBMS Developments
In the past year or so, there have been various mainstream database developments which provide better capabilities for GIS than were previously available. These include better handling of complex data types in general, and in some cases specific spatial extensions. Oracle, Informix and Sybase, amongst others, have all announced this sort of extension.

This paper will not discuss the relative merits of these different vendors, but there is enough commonality among their approaches to make some general observations. In addition, object-oriented DBMS (OODBMS) technology continues to develop, and this offers some attractions for AM/FM/GIS. However, use of OODBMSs is still generally confined to small niches of the overall DBMS market, and most observers would not yet regard these as “mainstream” DBMSs.

In order to evaluate the extent to which mainstream DBMSs are now addressing the problems which have prevented their widespread exploitation in AM/FM/GIS, the following issues are examined in this section:
  • Data modeling
  • Transaction handling
  • Processing architecture
  • Standard data access
Not all of these are obvious issues, and it will be seen that some are being addressed quite well by the mainstream DBMSs, while others are not being addressed at all. These issues are largely independent of each other, but all are extremely important for AM/FM/GIS.

Data modeling
One of the most obvious obstacles to the use of mainstream DBMSs for AM/FM/GIS has been their limited ability to handle complex datatypes, and specifically spatial data types such as coordinates, lines and areas. This covers both the storage of this sort of data, and the ability to query and manipulate it efficiently. In the past this functionality has typically been implemented in mainstream RDBMSs by storing complex information in binary strings, often known as “blobs”. Application code could be written to interpret this information, and also to maintain spatial indexes, typically as some sort of linear quadtree, which would allow efficient spatial queries.
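As an illustration of the linear quadtree idea mentioned above, the following minimal Python sketch encodes a point as a string of quadrant digits and keeps the keys in a sorted list, so that every record in a quadtree cell can be found with an ordinary range scan on the key. The function and class names, the unit-square extent and the digit encoding are all assumptions made for this example; they are not taken from any vendor's implementation.

```python
import bisect


def quadtree_key(x, y, depth, extent=(0.0, 0.0, 1.0, 1.0)):
    """Return a linear quadtree key: one digit (0-3) per level of subdivision."""
    xmin, ymin, xmax, ymax = extent
    digits = []
    for _ in range(depth):
        xmid = (xmin + xmax) / 2.0
        ymid = (ymin + ymax) / 2.0
        quadrant = 0
        if x >= xmid:          # right half of the current cell
            quadrant += 1
            xmin = xmid
        else:
            xmax = xmid
        if y >= ymid:          # upper half of the current cell
            quadrant += 2
            ymin = ymid
        else:
            ymax = ymid
        digits.append(str(quadrant))
    return "".join(digits)


class LinearQuadtreeIndex:
    """Hypothetical index: a sorted list of (key, record_id) pairs."""

    def __init__(self, depth=8):
        self.depth = depth
        self._entries = []

    def insert(self, record_id, x, y):
        key = quadtree_key(x, y, self.depth)
        bisect.insort(self._entries, (key, record_id))

    def query_cell(self, cell_prefix):
        """Return ids of all records whose key starts with cell_prefix."""
        # Keys only contain the digits 0-3, so prefix + "4" sorts after
        # every key that begins with the prefix.
        low = bisect.bisect_left(self._entries, (cell_prefix,))
        high = bisect.bisect_left(self._entries, (cell_prefix + "4",))
        return [record_id for _, record_id in self._entries[low:high]]
```

In a relational setting the same key could be stored in an ordinary indexed column, which is what makes the structure "linear": the spatial search becomes a one-dimensional range scan.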

The major drawbacks of this approach are that it requires significant development effort on the part of the GIS vendor, and that typically the spatial data stored in the database can only be understood by the GIS application. The GIS vendor could provide some sort of API (application programming interface) to allow other applications to query and manipulate the spatial data, but this would still not allow the spatial data to be recognized by tools which just used native database queries.

This is the area in which most of the recent progress by RDBMS vendors has been focused. The major database vendors now offer either the ability to define new complex datatypes and operators, or specific spatial datatypes, or both. The object-oriented database vendors have always provided good functionality in this area. As yet, there is no standardization of the spatial data types and operators supported by the different DBMSs, but there is work going on in this area which is discussed in the section on standard data access.

Transaction management
The so-called long transaction problem has been widely discussed in the GIS literature (see Newell and Easterfield, 1990). The key issue in this area is that the units of work, or (long) transactions, carried out in an AM/FM/GIS system which is being used for design and planning work, are fundamentally different from the (short) transactions which most current database management systems (DBMSs) are designed to handle. Short transactions typically take a few seconds at most. The approach which is generally taken to ensure integrity in a multi-user environment is to lock all the related records involved in a short transaction, and prevent any other user from updating those records for the duration of the transaction. In contrast, a long transaction may last for hours, days or weeks, and locking data in the same way for this length of time is not practical. In fact, locking data at all is often undesirable or impractical during a long transaction. There are also related issues such as managing multiple different designs for a piece of work, a topic which is known as version management.
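The contrast with lock-based short transactions can be made concrete with a small Python sketch. It assumes a deliberately simplified model in which each design version takes a private snapshot of the shared data when it is created, records its own changes without locking anything, and checks for conflicting changes only when it is posted back. The class and method names are hypothetical, and copying the whole snapshot is a simplification for brevity; this is not a description of how any particular AM/FM product implements version management.

```python
class ConflictError(Exception):
    """Raised when a design version clashes with changes posted by others."""


class SharedDataset:
    """The data seen by all users, e.g. the as-built network."""

    def __init__(self, records=None):
        self.records = dict(records or {})

    def branch(self):
        return DesignVersion(self)


class DesignVersion:
    """A long transaction: may live for days, holds no locks."""

    def __init__(self, dataset):
        self._dataset = dataset
        self._base = dict(dataset.records)  # state when the work started
        self._changes = {}                  # record_id -> new value

    def get(self, record_id):
        if record_id in self._changes:
            return self._changes[record_id]
        return self._base.get(record_id)

    def put(self, record_id, value):
        self._changes[record_id] = value

    def conflicts(self):
        """Records changed here AND in the shared dataset since branching."""
        return sorted(
            record_id for record_id in self._changes
            if self._dataset.records.get(record_id) != self._base.get(record_id)
        )

    def post(self):
        """Make this design's changes visible to everyone, if nothing clashes."""
        clashes = self.conflicts()
        if clashes:
            raise ConflictError(clashes)
        self._dataset.records.update(self._changes)
        self._base = dict(self._dataset.records)
        self._changes = {}
```

Two engineers can hold alternative designs for the same main at the same time; neither blocks the other, and the clash only surfaces when the second design is posted, at which point it has to be resolved rather than silently overwritten.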

Since this topic has been discussed in detail elsewhere, it will not be examined any further here. However, the important thing to note is that the ability of a system to handle long or short transactions is essentially independent of other database issues, such as spatial data handling capabilities. A strong argument can be made that long transaction handling is at least as important as spatial data handling for a DBMS which is to be used for AM/FM applications which handle planning and design processes.

This is an area which so far does not seem to have been addressed at all by the major RDBMS vendors, and in most cases they do not have stated plans to do so. Most of the OODBMS vendors have addressed the long transaction problem at least to some extent, although they do not in general have the full sophistication of some of the approaches which have been developed by AM/FM vendors. However, they do at least recognize that the problem exists.

Processing architecture
Another significant aspect of a DBMS for GIS, which is largely independent of the issues discussed in the previous two sections, is the architecture which it uses for processing queries and returning data. In a client-server architecture, the distribution of work can be balanced towards either the client or the server machine.

Mainstream RDBMSs have a very server-oriented processing architecture. Client machines do very little processing, just issuing a simple query request such as an SQL statement, and all the significant processing is done on the database server. This is a good architecture for handling relatively simple queries which return a small amount of data.

One of the most common and compute-intensive operations in an AM/FM/GIS is a screen redraw. A suitable DBMS for AM/FM/GIS should be able to support a redraw as an interactive query against the database - this greatly simplifies the system architecture and application development. However, the query characteristics of a GIS redraw are very different from the relatively simple queries for which mainstream DBMSs are designed. The requirement for a screen redraw is to retrieve hundreds or thousands of records, possibly from many different tables, based on a complex spatial predicate, within a few seconds.

Running frequent queries of this type on a server with many users in a networked environment causes two major performance bottlenecks. The first is processing on the server, since all the processing for every complex query from every user takes place on the same server machine. The second is network traffic, since the large amount of data returned from each query has to travel across the network every time. This leads to very serious limitations in scaling server-oriented DBMSs to many concurrent networked users for AM/FM/GIS applications.

There are alternative query architectures in which most of the processing is done on the client rather than the server, typically making use of cached data on the client. Several of the OODBMSs support this sort of architecture. It is actually much simpler to implement a robust client-oriented query architecture in a long transaction environment: because changes are not immediately propagated to other users, cached data does not have to be invalidated every time another user makes an update. For more details on a client-oriented relational database architecture based on version management see Newell and Batty, 1994. This approach also has a natural extension to distributing data over a wide area network (see Newell, 1993).
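The effect of client-side caching on server load can be sketched in a few lines of Python. The example assumes that the map is divided into tiles and that the server is reached through a single fetch function; both are hypothetical simplifications chosen for illustration, as are all the names used. A repeated redraw of an area that has already been visited is answered entirely from the client's cache.

```python
class CachingClient:
    """Hypothetical client that caches the data it has already drawn."""

    def __init__(self, fetch_tile):
        self._fetch_tile = fetch_tile  # function: tile id -> list of records
        self._cache = {}
        self.server_requests = 0       # counts round trips to the server

    def redraw(self, tiles):
        """Return all records needed to draw the given tiles."""
        records = []
        for tile in tiles:
            if tile not in self._cache:
                self._cache[tile] = self._fetch_tile(tile)
                self.server_requests += 1
            records.extend(self._cache[tile])
        return records

    def refresh(self):
        """Discard cached data, e.g. when the user moves to a newer version."""
        self._cache.clear()
```

Because the user's view of the data only changes at well-defined points in a long transaction environment, the cache needs to be discarded only when the user explicitly refreshes, not whenever another user commits a change.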

Standard data access
A separate issue from being able to store and manipulate spatial data (via the GIS) is the ability to access it in a standard way from other applications. The main existing data access standard is SQL. If the DBMS supports standard access methods then it is easier to integrate the GIS with other systems, and existing application development skills and tools can be exploited. The existing SQL standard (known as SQL 92, or SQL2) does not cover spatial data or other complex datatypes.

Work is currently in progress on SQL3, the next major version of the SQL standard. This will include support for more complex data types and operators, and a subset of the standard known as SQL/MM (Multimedia) will define specific spatial datatypes and operators. A draft version of the SQL3 standard exists, but it will be several years before it is finalized.

Another standards effort is the OpenGIS initiative of the Open GIS Consortium (formerly known as OGIS). This is based on distributed object models such as COM/OLE and CORBA. It is being actively supported by all the major AM/FM/GIS vendors, and it seems likely at the time of writing (November 1996) that the first levels of the standard will be published in 1997. There is a good chance that object-oriented interface standards of this sort will supersede the SQL approach. For up to date information on OpenGIS, see the OGC web page at www.opengis.org.

Heterogeneous DBMS Support
Historically, AM/FM/GIS systems have been closely tied to working against a specific database or file format. If you liked the client functionality of GIS brand X, but preferred the database architecture of GIS brand Y, you had to choose one or the other; you could not have both. A strong trend in the industry at the moment is for client AM/FM/GIS applications to be able to work against multiple different spatial databases or file formats. This trend will be accelerated when the OpenGIS standards have been agreed upon. It will provide the possibility of using different DBMSs for different tasks. For design and planning work a DBMS which handles long transactions well is a requirement. For an application like tracking vehicle locations, all users need to see updates immediately, so a mainstream short transaction DBMS would be appropriate. For applications with very high performance real time requirements such as SCADA or NMS, an OODBMS may be appropriate.
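A client application that works against several spatial data stores needs some common interface through which to query them. The Python sketch below shows one way this might look, with the client asking each source for the features inside a bounding box without knowing how that source stores its data. The interface, class names and table layout are invented for this example and do not represent the OpenGIS specification or any vendor's API; SQLite (via Python's standard sqlite3 module) simply stands in for a relational server.

```python
import sqlite3
from abc import ABC, abstractmethod


class SpatialSource(ABC):
    """Hypothetical common interface over different spatial data stores."""

    @abstractmethod
    def features_in(self, xmin, ymin, xmax, ymax):
        """Return a list of (feature_id, x, y) tuples inside the box."""


class RelationalSource(SpatialSource):
    """Point features held in a relational table named 'features'."""

    def __init__(self, connection):
        self._conn = connection

    def features_in(self, xmin, ymin, xmax, ymax):
        rows = self._conn.execute(
            "SELECT id, x, y FROM features "
            "WHERE x BETWEEN ? AND ? AND y BETWEEN ? AND ? ORDER BY id",
            (xmin, xmax, ymin, ymax),
        )
        return [tuple(row) for row in rows]


class InMemorySource(SpatialSource):
    """Point features held in memory, e.g. live vehicle positions."""

    def __init__(self, features):
        self._features = list(features)

    def features_in(self, xmin, ymin, xmax, ymax):
        return [
            f for f in self._features
            if xmin <= f[1] <= xmax and ymin <= f[2] <= ymax
        ]


def query_all(sources, xmin, ymin, xmax, ymax):
    """Collect features from every source; the caller never sees which is which."""
    found = []
    for source in sources:
        found.extend(source.features_in(xmin, ymin, xmax, ymax))
    return found
```

The design choice that matters here is that the client depends only on the interface, so a source can be swapped for one better suited to a particular task without changing the client.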

There may be cases where multiple DBMSs contain different representations of the same data. For example, there may be at least some aspects of data about a utility network in both a long transaction environment which is used for planning and design, and in a real time environment which is used for network management. In this case there may be a requirement for robust data replication to maintain synchronization between these different representations.

Conclusions
The recent developments in spatial data handling from the major DBMS vendors are a very positive step forward for the AM/FM/GIS industry. Of the four key areas discussed in this paper relating to AM/FM/GIS database technology, these developments have focused primarily on the area of spatial data modeling. Solutions are also in sight in another area, that of standard data access, as the Open GIS Consortium promises to deliver the first stages of the OpenGIS standard in 1997.

However, the major RDBMS vendors have made less progress in the other two areas. These are long transaction handling, which is essential for handling design and planning work processes, and client-oriented processing, which is key to providing scalable AM/FM/GIS performance with many networked users. Several of the OODBMS vendors have addressed these issues to some extent, as have some of the GIS vendors. Without these capabilities, the mainstream RDBMSs cannot directly replace the current state of the art of AM/FM/GIS database technology without significant additional development on top of the basic RDBMS.

It is actually possible to put a layer of software on top of a mainstream RDBMS which provides full long transaction handling capabilities and client-oriented query processing. In this scenario though the RDBMS is not contributing much, just acting as a fairly unintelligent data repository, and the software on top is equivalent in scope to a full DBMS. However, this is a solution for organizations who feel that they really want all their data stored in a mainstream RDBMS. Until the mainstream DBMS vendors address long transaction handling there will be a market for “transaction manager” products which sit above a mainstream DBMS.
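To give a flavour of what such a layer involves, the following Python sketch stores every design version's changes as ordinary rows in a relational table and resolves reads in application code, looking in the design version first and then falling back to a shared base version. The table layout, the function names and the use of the literal version name 'base' are assumptions made for this example, and SQLite (Python's standard sqlite3 module) stands in for a mainstream RDBMS. Note that each write is committed as an ordinary short transaction; the long transaction exists only in the layer above.

```python
import sqlite3


def open_store():
    """Create an in-memory database holding one row per (version, asset)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE asset_versions ("
        " version TEXT NOT NULL,"
        " asset_id TEXT NOT NULL,"
        " description TEXT,"
        " PRIMARY KEY (version, asset_id))"
    )
    return conn


def write(conn, version, asset_id, description):
    """Record a change inside one version; a short transaction to the RDBMS."""
    conn.execute(
        "INSERT OR REPLACE INTO asset_versions VALUES (?, ?, ?)",
        (version, asset_id, description),
    )
    conn.commit()


def read(conn, version, asset_id):
    """Look in the given version first, then fall back to the 'base' version."""
    row = conn.execute(
        "SELECT description FROM asset_versions"
        " WHERE asset_id = ? AND version IN (?, 'base')"
        " ORDER BY version = 'base' LIMIT 1",  # non-base rows sort first
        (asset_id, version),
    ).fetchone()
    return row[0] if row else None
```

Even in this toy form, the version resolution logic lives entirely outside the database, which illustrates why the RDBMS contributes relatively little in this arrangement.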

Another option is to use multiple DBMSs, as discussed in the previous section. Although the mainstream RDBMS vendors argue that you should use one DBMS to solve all data processing problems, and there are valid arguments in favor of this, there are also strong arguments that some data processing problems are sufficiently different, and difficult, that it makes sense to use a different DBMS which is designed specifically to solve that problem. As long as all the DBMSs used conform to standard interfaces, such as ODBC for alphanumeric data, and OpenGIS standards for spatial data, the fact that different DBMSs are being used can be transparent to application developers and users. Technologies like distributed object models and heterogeneous replication make this sort of multi-DBMS approach increasingly practical.

There is something of a parallel here with the general move which the computer industry has seen from a centralized mainframe environment to a networked PC and workstation environment. The latter approach is harder to administer, but this drawback is outweighed by its significant advantages in terms of performance and flexibility. It will be interesting to see whether a similar trend emerges in the database industry over the next few years.

References
  • Batty, P. M., 1991, Why use a standard RDBMS for GIS?: GIS World International GIS Sourcebook.
  • Batty, P. M., 1995, AM/FM Data Modeling for Utilities: Proceedings of AM/FM Conference XVIII, pp. 709-725.
  • Newell, R. G., and Easterfield, M. E., 1990, Version Management - the problem of the long transaction: Proceedings of the Mapping Awareness conference.
  • Newell, R. G., 1993, Distributed database versus fast communications: Proceedings of AM/FM Conference XVI, Orlando.
  • Newell, R. G., and Batty, P. M., 1994, GIS databases are different: Proceedings of AM/FM Conference XVII, Denver, pp. 279-288. (Smallworld technical paper 19).
  • Seaborn, D., 1992, 1995: The Year GIS Disappeared: Proceedings of AM/FM Conference XV, San Antonio.