Is Mainstream Database Technology Ready For AM/FM/GIS?
Peter M. Batty
Smallworld Systems, Inc., 5600 Greenwood Plaza Blvd., Englewood, CO 80111
Abstract
Historically, commercial Database Management Systems (DBMSs) have had significant
shortcomings for use in GIS. This paper presents a wide-ranging look at the current state
of the database industry, how it is starting to address the needs of GIS, where it is still
falling short of requirements, and what the future will look like.
The major relational database vendors have begun to announce spatial extensions to their
products, as well as other more general enhancements of use to GIS. The object-oriented
database vendors continue to make headway in the computer industry, and their products
have some advantages for GIS. A distributed object model, such as CORBA or
COM/OLE, is becoming widely regarded as the way that applications and data will be
structured in the future. Work is progressing on various relevant standards, such as
SQL3, SQL/MM and OGIS.
This presentation will attempt to make some sense out of all these developments, and
explain how they are likely to impact the GIS industry. It will examine how GIS
packages will be able to access a choice of multiple spatial data repositories rather than
being tied to one specific database or file format, which has traditionally been the case.
Overview
This paper begins by summarizing why one would wish to store spatial data in a
“mainstream” Database Management System (DBMS). (Different definitions of a
mainstream DBMS will be discussed later, but in general this term is used in this paper to
refer to widely used relational DBMS products such as Oracle, Sybase, Informix and
DB2). The history of mainstream DBMSs in GIS, and the reasons why they have not had
a major impact yet, are discussed briefly. Recent developments in mainstream database
technology which might impact AM/FM/GIS are then reviewed.
In order to assess the likely impact of these developments, a number of criteria for
evaluating the suitability of a DBMS for AM/FM are then discussed. These include data
modeling capabilities (which includes the issue of “relational versus object-oriented”),
processing architecture (client versus server orientation), transaction handling model, and
standard data access. One of the conclusions of this section is that different DBMSs may
be suitable for different applications, and this is one of the drivers for client AM/FM/GIS
applications to be able to work against multiple heterogeneous data servers, which is the
topic of discussion for the next section.
Finally, some conclusions are drawn from this discussion about the likely direction of
AM/FM/GIS database technology over the next few years.
Why use a Mainstream DBMS for AM/FM/GIS?
The attractions of using mainstream DBMSs for GIS have been described in some detail
by various people, including this author (see Batty, 1991, and Seaborn, 1992). In
summary, the advantages of this approach are that the GIS vendor should be able to take
advantage of capabilities provided by the DBMS vendor and concentrate on developing GIS
functionality, while the user can take advantage of existing skills and use common
procedures for many database administration tasks, for both GIS and non-GIS data. In
particular, functions such as security, backup and recovery are well proven in standard
DBMSs and the GIS can take advantage of these. Integration between GIS and non-GIS
data should be easier to achieve when using this approach. There is good functionality
available for integration between multiple DBMSs from different vendors in a
heterogeneous distributed environment.
So there are apparently strong advantages to using mainstream DBMS technology, and
there have been a number of products since the 1980s which used this approach.
However, this approach has not yet achieved widespread acceptance. None of the top
five AM/FM vendors in 1995 (based on Daratech figures) used a mainstream DBMS as
their primary data repository. Several of them have recently announced support for this
sort of approach, but in all cases this is optional and not in widespread use yet. There
currently seems to be strong support in the industry for the view that the mainstream
DBMS vendors should now become the holders of spatial data for the latest generation of
GIS products. However, there have been false dawns before. Doug Seaborn’s 1992
AM/FM paper was entitled “1995: the year GIS disappeared”, and his thesis was that by
1995 mainstream DBMSs would fully accommodate spatial data, and GIS would just be
another aspect of mainstream computing. Clearly this has not happened yet. This paper
discusses whether the latest round of mainstream DBMS technology will realize this
vision, or whether further developments are still required.
What were the problems?
Given the apparent advantages of using a mainstream DBMS which were outlined in the
previous section, clearly there must also have been significant drawbacks to this
approach, otherwise all the major systems would be using it. This section briefly introduces
these issues, which are examined in more detail later. Perhaps the most obvious problem
historically was performance. Certainly hardware and software advances have greatly
improved the situation. However, the performance issue is quite complex, and there is a
strong argument that the current centralized processing architecture of mainstream
DBMSs is fundamentally unsuitable for certain important aspects of AM/FM/GIS.
The biggest single drawback of standard DBMSs for AM/FM is their lack of capability in
the area of handling long transactions and version management. Long transactions are
fundamental to planning and design work processes in a utility, but they differ
fundamentally from the short transactions which mainstream DBMSs are
designed to handle. Vendors who have implemented systems on top of mainstream
DBMSs have had to devote significant effort to developing code to handle long
transactions on top of the DBMS. One of the major benefits of using a DBMS should be
that it does all the required transaction handling.
The integration of geographic data into external applications is also not necessarily
simplified by storing it in a standard DBMS. Because spatial data types were not
explicitly supported by most standard DBMSs, the GIS vendor typically has had to use
complex record structures and retrieval algorithms for the geographic data, which meant
that it could not be easily read or updated by external applications, using native DBMS
access capabilities. The external application also needs to understand any special locking
mechanisms which are used by the GIS to handle long transactions, since the standard
DBMS locking does not handle this. The most obvious short term solution to this
problem is for the GIS vendor to provide an API (Application Programming Interface)
which can be used by external applications, but that approach can be used equally well
whether or not a standard DBMS is used for the GIS data.
Recent Mainstream DBMS Developments
In the past year or so, there have been various mainstream database developments which
provide better capabilities for GIS than were previously available. These include better
handling of complex data types in general, and in some cases specific spatial extensions.
Oracle, Informix and Sybase, amongst others, have all announced extensions of this sort.
This paper will not discuss the relative merits of these different vendors, but there is
enough commonality among all of these approaches to make some general observations. In
addition, object-oriented DBMS technology continues to develop, and this offers some
attractions for AM/FM/GIS. However, use of OODBMSs is still generally confined to
small niches of the overall DBMS market, and most observers would not yet regard these
as “mainstream” DBMSs.
In order to evaluate the extent to which mainstream DBMSs are now addressing the
problems which have prevented their widespread exploitation in AM/FM/GIS, the
following issues are examined in this section:
- Data modeling
- Transaction handling
- Processing architecture
- Standard data access
Not all of these are obvious issues, and it will be seen that some are being addressed quite
well by the mainstream DBMSs, while others are not being addressed at all. These issues
are largely independent of each other, but all are extremely important for AM/FM/GIS.
Data modeling
One of the most obvious obstacles to the use of mainstream DBMSs for AM/FM/GIS has
been their inability to handle complex data types easily, and specifically spatial data types
such as coordinates, lines and areas. This includes both the storage of this sort of data,
and the ability to query and manipulate it efficiently. In the past this functionality has
typically been implemented in mainstream RDBMSs by storing complex information in
binary strings, often known as “blobs”. Application code could be written to interpret
this information, and also to maintain spatial indexes, typically as some sort of linear
quadtree, which would allow efficient spatial queries.
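As a minimal sketch of the blob-plus-linear-quadtree technique just described, the Python below packs a line's coordinates into a binary string and derives a Z-order (Morton) key for the grid cell that a coordinate falls in. Storing such integer keys in an ordinary indexed column is one common way to build a linear quadtree on top of a standard B-tree index. The byte layout, the grid parameters and the function names (encode_polyline, decode_polyline, morton_key, cell_of) are assumptions for illustration, not the storage format of any particular product.

```python
import struct


def encode_polyline(coords):
    """Pack a list of (x, y) pairs into a binary string ("blob").

    Assumed layout (illustrative only): a 4-byte little-endian point
    count, followed by two 8-byte doubles per point.
    """
    parts = [struct.pack("<I", len(coords))]
    parts.extend(struct.pack("<dd", x, y) for x, y in coords)
    return b"".join(parts)


def decode_polyline(blob):
    """Inverse of encode_polyline: only code that knows the layout can read it."""
    (count,) = struct.unpack_from("<I", blob, 0)
    return [struct.unpack_from("<dd", blob, 4 + 16 * i) for i in range(count)]


def morton_key(col, row, bits=16):
    """Interleave the bits of a grid cell's column and row (Z-order).

    Nearby cells tend to receive nearby keys, so a range scan on an
    ordinary integer index approximates a quadtree search.
    """
    key = 0
    for i in range(bits):
        key |= ((col >> i) & 1) << (2 * i)
        key |= ((row >> i) & 1) << (2 * i + 1)
    return key


def cell_of(x, y, xmin, ymin, cell_size):
    """Map a coordinate to the (col, row) of the grid cell containing it."""
    return int((x - xmin) // cell_size), int((y - ymin) // cell_size)
```

Only code that knows this byte layout can make sense of the stored value, which is the drawback discussed in the next paragraph.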
The major drawbacks of this approach are that it requires significant development effort
on the part of the GIS vendor, and that typically the spatial data stored in the database can
only be understood by the GIS application. The GIS vendor could provide some sort of
API (application programming interface) to allow other applications to query and
manipulate the spatial data, but this would still not allow the spatial data to be recognized
by tools which just used native database queries.
This is the area in which most of the recent progress by RDBMS vendors has been
focused. The major database vendors now offer either the ability to define new complex
datatypes and operators, or specific spatial datatypes, or both. The object-oriented
database vendors have always provided good functionality in this area. As yet, there is no
standardization of the spatial data types and operators supported by the different DBMSs,
but there is work going on in this area which is discussed in the section on standard data
access.
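By way of contrast with the blob approach, the sketch below models in Python what it means for the database itself to understand a spatial datatype: once a type and its operators are registered with the engine, any query can apply them without going through GIS application code. Everything here (Box, overlaps, OPERATORS, select_where) is hypothetical; each vendor's actual mechanism for defining types and operators differs and, as noted above, is not yet standardized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical spatial datatype: an axis-aligned rectangle."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def overlaps(self, other: "Box") -> bool:
        """Spatial operator: true if the rectangles share any point (edges count)."""
        return (self.xmin <= other.xmax and other.xmin <= self.xmax
                and self.ymin <= other.ymax and other.ymin <= self.ymax)


# Hypothetical operator catalog: the "engine" looks operators up by name,
# so a generic query tool needs no knowledge of how Box is implemented.
OPERATORS = {"overlaps": Box.overlaps}


def select_where(rows, column, op_name, value):
    """Return the rows for which `row[column] <op_name> value` holds."""
    op = OPERATORS[op_name]
    return [row for row in rows if op(row[column], value)]


parcels = [
    {"id": 1, "extent": Box(0, 0, 10, 10)},
    {"id": 2, "extent": Box(20, 20, 30, 30)},
]
hits = select_where(parcels, "extent", "overlaps", Box(5, 5, 15, 15))
print([row["id"] for row in hits])  # [1]
```

The query function never inspects the coordinates itself; it relies entirely on the registered operator, which is the property that lets native query tools work with spatial data.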
Transaction management
The so-called long transaction problem has been widely discussed in the GIS literature
(see Newell and Easterfield, 1990). The key issue in this area is that the units of work, or
(long) transactions, carried out in an AM/FM/GIS system which is being used for design
and planning work, are fundamentally different from the (short) transactions which most
current database management systems (DBMSs) are designed to handle. Short
transactions typically take a few seconds at most. The approach which is generally taken
to ensure integrity in a multi-user environment is to lock all the related records involved
in a short transaction, and prevent any other user from updating those records for the
duration of the transaction. In contrast, a long transaction may last for hours, days or
weeks, and locking data in the same way for this length of time is not practical. In fact,
locking data at all is often not desirable or not practical during a long transaction. There
are also related issues such as managing multiple different designs for a piece of work, a
topic which is known as version management.
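To make the contrast with short-transaction locking concrete, here is a minimal Python sketch of one possible long-transaction scheme, assuming optimistic conflict detection: each design works in its own private version, nothing is locked while the version is open, and conflicts are detected only when the version is posted back to the shared (master) data. Two open versions on the same store stand in for alternative designs of the same piece of work. The class and method names, and the choice of detecting conflicts per record at posting time, are assumptions for illustration; real version management systems differ in how they merge changes and resolve conflicts.

```python
class VersionedStore:
    """Shared master data plus a per-record change counter."""

    def __init__(self, records):
        self.master = dict(records)
        self.stamp = {key: 0 for key in records}

    def begin(self):
        """Open a long transaction (a private version); no locks are taken."""
        return LongTransaction(self)


class LongTransaction:
    """Hypothetical long transaction: a snapshot plus a private set of edits."""

    def __init__(self, store):
        self.store = store
        self._snapshot()

    def _snapshot(self):
        # Freeze this version's view of the master data; edits go to a private delta.
        self.base = dict(self.store.master)
        self.base_stamp = dict(self.store.stamp)
        self.changes = {}

    def read(self, key):
        return self.changes.get(key, self.base.get(key))

    def write(self, key, value):
        self.changes[key] = value  # invisible to other users until posted

    def post(self):
        """Check in: return the conflicting keys, or [] if the post succeeded."""
        conflicts = [key for key in self.changes
                     if self.store.stamp.get(key, 0) != self.base_stamp.get(key, 0)]
        if conflicts:
            return conflicts  # left for the designer to resolve; nothing applied
        for key, value in self.changes.items():
            self.store.master[key] = value
            self.store.stamp[key] = self.store.stamp.get(key, 0) + 1
        self._snapshot()  # continue from the newly posted state
        return []


store = VersionedStore({"pole_17": "wood", "pole_18": "wood"})
design_a = store.begin()
design_b = store.begin()
design_a.write("pole_17", "steel")
design_b.write("pole_17", "concrete")
design_b.write("pole_18", "steel")
print(store.master["pole_17"])  # wood (nothing posted yet)
print(design_a.post())          # []
print(design_b.post())          # ['pole_17']
```

Both designs can edit the same pole for as long as they like; the clash is only surfaced when the second design is posted, instead of one user being blocked by a lock for days or weeks.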