Is Mainstream Database Technology Ready For AM/FM/GIS?
Since this topic has been discussed in detail elsewhere, it will not be examined any
further here. However, the important thing to note is that the ability of a system to handle
long or short transactions is essentially independent of other database issues, such as
spatial data handling capabilities. A strong argument can be made that long transaction
handling is at least as important as spatial data handling for a DBMS which is to be used
for AM/FM applications which handle planning and design processes.
This is an area which so far does not seem to have been addressed at all by the major
RDBMS vendors, and in most cases they do not have stated plans to do so. Most of the
OODBMS vendors have addressed the long transaction problem at least to some extent,
although they do not in general have the full sophistication of some of the approaches
which have been developed by AM/FM vendors. However, they do at least recognize
that the problem exists.
Processing architecture
Another significant aspect of a DBMS for GIS, which is largely independent of the issues
discussed in the previous two sections, is the architecture which it uses for processing
queries and returning data. In a client-server architecture, the distribution of work can be
balanced towards either the client or the server machine.
Mainstream RDBMSs have a very server-oriented processing architecture. Client
machines do very little processing, just issuing a simple query request such as an SQL
statement, and all the significant processing is done on the database server. This is a
good architecture for handling relatively simple queries which return a small amount of
data.
One of the most common and compute-intensive operations in an AM/FM/GIS is a screen
redraw. A suitable DBMS for AM/FM/GIS should be able to support a redraw as an
interactive query against the database - this greatly simplifies the system architecture and
application development. However, the query characteristics of a GIS redraw are very
different from the relatively simple queries for which mainstream DBMSs are designed.
The requirement for a screen redraw is to retrieve hundreds or thousands of records,
possibly from many different tables, based on a complex spatial predicate, within a few
seconds.
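The shape of such a redraw query can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names BBox, Feature and redraw_query are hypothetical, the "tables" are plain in-memory lists, and the linear scan stands in for whatever spatial index a real system would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box (hypothetical helper, not from the paper)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "BBox") -> bool:
        # Two boxes intersect when they overlap on both axes (edges count).
        return (self.xmin <= other.xmax and other.xmin <= self.xmax
                and self.ymin <= other.ymax and other.ymin <= self.ymax)


@dataclass(frozen=True)
class Feature:
    table: str
    fid: int
    bounds: BBox


def redraw_query(tables, window):
    """Return every feature, from every table, whose bounds touch the window.

    Assumption: a plain scan of each table; a real DBMS would use a
    spatial index to avoid visiting records outside the window.
    """
    hits = []
    for rows in tables.values():
        hits.extend(f for f in rows if f.bounds.intersects(window))
    return hits


# One screen window, evaluated against several tables in a single request.
tables = {
    "pipes": [Feature("pipes", 1, BBox(2, 2, 8, 3)),
              Feature("pipes", 2, BBox(20, 20, 30, 21))],
    "valves": [Feature("valves", 1, BBox(5, 5, 5, 5))],
}
visible = redraw_query(tables, BBox(0, 0, 10, 10))
```

Even in this toy form the query differs from a typical single-table lookup: one spatial predicate is evaluated against every table, and the result set grows with the number of features on screen.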
Running frequent queries of this type on a server with many users in a networked
environment causes two major performance bottlenecks. The first is processing on the
server, since all the processing for every complex query from every user takes place on
the same server machine. The second is network traffic, since the large amount of data
returned from each query has to travel across the network every time. This leads to very
serious limitations in scaling server-oriented DBMSs to many concurrent networked users
for AM/FM/GIS applications.
There are alternative query architectures in which most of the processing is done on the
client rather than the server, typically making use of cached data on the client. Several of
the OODBMSs support this sort of architecture. It is actually much simpler to implement
a robust client-oriented query architecture in a long transaction environment, because
changes are not immediately propagated to other users, which makes data caching much
simpler. For more details on a client-oriented relational database architecture based on
version management see Newell and Batty, 1994. This approach also has a natural
extension to distributing data over a wide area network (see Newell, 1993).
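The reason caching becomes simpler can be shown with a minimal Python sketch. It is not the version management scheme described in the cited papers: VersionedServer and CachingClient are hypothetical names, and each version is modeled as a full private copy of the data purely to keep the example short.

```python
class VersionedServer:
    """Hypothetical server: each named version is a private copy of the records."""

    def __init__(self, base):
        self.versions = {"top": dict(base)}
        self.fetches = 0  # counts round trips made by clients

    def create_version(self, name, parent="top"):
        # Assumption: copy everything; a real system would share unchanged data.
        self.versions[name] = dict(self.versions[parent])

    def fetch(self, version, key):
        self.fetches += 1
        return self.versions[version][key]

    def write(self, version, key, value):
        self.versions[version][key] = value


class CachingClient:
    """Client that works in one version and keeps what it has read."""

    def __init__(self, server, version):
        self.server = server
        self.version = version
        self.cache = {}

    def read(self, key):
        # Other users write to their own versions, so a cached record in
        # this version cannot be changed behind the client's back.
        if key not in self.cache:
            self.cache[key] = self.server.fetch(self.version, key)
        return self.cache[key]

    def write(self, key, value):
        self.cache[key] = value
        self.server.write(self.version, key, value)


server = VersionedServer({"pipe-1": "100mm"})
server.create_version("alice")
server.create_version("bob")
alice = CachingClient(server, "alice")
bob = CachingClient(server, "bob")

bob.read("pipe-1")            # first read goes to the server
alice.write("pipe-1", "150mm")
bob.read("pipe-1")            # served from bob's cache, still valid
```

In a short transaction environment the second read by bob would have to check with the server, because the write by alice would be visible to him immediately. Here the cache only needs refreshing when a user explicitly merges changes from another version.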
Standard data access
A separate issue from being able to store and manipulate spatial data (via the GIS) is the
ability to access it in a standard way from other applications. The main existing data
access standard is SQL. If the DBMS supports standard access methods then it is easier
to integrate the GIS with other systems, and existing application development skills and
tools can be exploited. The existing SQL standard (known as SQL 92, or SQL2) does not
cover spatial data or other complex datatypes.
Work is currently in progress on SQL3, the next major version of the SQL standard. This
will include support for more complex data types and operators, and a subset of the
standard known as SQL/MM (Multimedia) will define specific spatial datatypes and
operators. A draft version of the SQL3 standard exists, but it will be several years before
it is finalized.
Another standards effort is the OpenGIS initiative of the Open GIS Consortium (formerly
known as OGIS). This is based on distributed object models such as COM/OLE and
CORBA. This is being actively supported by all the major AM/FM/GIS vendors and it
seems likely at the time of writing (November 1996) that the first levels of the standard
will be published in 1997. There is a good chance that the use of this sort of object-oriented
interface standard will supersede the SQL approach. For up-to-date information
on OpenGIS, see the OGC web page at www.opengis.org.
Heterogeneous DBMS Support
Historically, AM/FM/GIS systems have been closely tied to working against a specific
database or file format. If you liked the client functionality of GIS brand X, but preferred
the database architecture of GIS brand Y, you had to choose one or the other; you could
not have both. A strong trend in the industry at the moment is for client AM/FM/GIS
applications to be able to work against multiple different spatial databases or file formats.
This trend will be accelerated when the OpenGIS standards have been agreed upon.
This will provide the possibility of using different DBMSs for different tasks. For design
and planning work a DBMS which handles long transactions well is a requirement. For
an application like tracking vehicle locations, all users need to see updates immediately
so a mainstream short transaction DBMS would be appropriate. For applications with
very high performance real time requirements such as SCADA or NMS, an OODBMS
may be appropriate.
There may be cases where multiple DBMSs contain different representations of the same
data. For example, there may be at least some aspects of data about a utility network in
both a long transaction environment which is used for planning and design, and in a real
time environment which is used for network management. In this case there may be a
requirement for robust data replication to maintain synchronization between these
different representations.
Conclusions
The recent developments in spatial data handling from the major DBMS vendors are a
very positive step forward for the AM/FM/GIS industry. Of the four key areas discussed
in this paper relating to AM/FM/GIS database technology, these developments have
focused primarily on the area of spatial data modeling. Solutions are also in sight in
another area, that of standard data access, as the Open GIS Consortium promises to
deliver the first stages of the OpenGIS standard in 1997.
However, there is less progress on the other two areas by the major RDBMS vendors.
These are long transaction handling, which is essential for handling design and planning
work processes, and client-oriented processing, which is key to providing scaleable
AM/FM/GIS performance with many networked users. Several of the OODBMS vendors
have addressed these issues to some extent, as have some of the GIS vendors. Without
these capabilities, the mainstream RDBMSs cannot directly replace the current state of
the art of AM/FM/GIS database technology without significant additional development
on top of the basic RDBMS.
It is actually possible to put a layer of software on top of a mainstream RDBMS which
provides full long transaction handling capabilities and client-oriented query processing.
In this scenario, though, the RDBMS is not contributing much, just acting as a fairly
unintelligent data repository, and the software on top is equivalent in scope to a full
DBMS. However, this is a solution for organizations who feel that they really want all
their data stored in a mainstream RDBMS. Until the mainstream DBMS vendors address
long transaction handling there will be a market for “transaction manager” products
which sit above a mainstream DBMS.
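A minimal Python sketch of such a layer is given below, using SQLite purely as a stand-in for the mainstream RDBMS. The class LongTransactionLayer, its table layout and its method names are all assumptions made for illustration. The sketch has no conflict detection, and a design version is not isolated from later changes to its parent, both of which a real transaction manager would need to handle.

```python
import sqlite3


class LongTransactionLayer:
    """Hypothetical versioning layer; the RDBMS only stores rows."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE versions (name TEXT PRIMARY KEY, parent TEXT)")
        self.db.execute(
            "CREATE TABLE records (version TEXT, key TEXT, value TEXT, "
            "PRIMARY KEY (version, key))")
        self.db.execute("INSERT INTO versions VALUES ('top', NULL)")

    def _parent(self, version):
        row = self.db.execute(
            "SELECT parent FROM versions WHERE name = ?", (version,)).fetchone()
        return row[0] if row else None

    def create_version(self, name, parent="top"):
        # Starting a long transaction is just registering a new version.
        self.db.execute("INSERT INTO versions VALUES (?, ?)", (name, parent))

    def write(self, version, key, value):
        self.db.execute(
            "INSERT OR REPLACE INTO records VALUES (?, ?, ?)",
            (version, key, value))

    def read(self, version, key):
        # Look in this version first, then fall back through its ancestors.
        while version is not None:
            row = self.db.execute(
                "SELECT value FROM records WHERE version = ? AND key = ?",
                (version, key)).fetchone()
            if row is not None:
                return row[0]
            version = self._parent(version)
        return None

    def post(self, version):
        """Publish a version's changes to its parent (no conflict checks)."""
        parent = self._parent(version)
        if parent is None:
            raise ValueError("cannot post a version that has no parent")
        self.db.execute(
            "INSERT OR REPLACE INTO records "
            "SELECT ?, key, value FROM records WHERE version = ?",
            (parent, version))
        self.db.execute("DELETE FROM records WHERE version = ?", (version,))


layer = LongTransactionLayer()
layer.write("top", "pipe-1", "100mm")
layer.create_version("design-A")
layer.write("design-A", "pipe-1", "150mm")   # invisible to users of "top"
layer.post("design-A")                        # now published to "top"
```

All of the version logic lives in the Python class, while the database only holds rows keyed by version name. This is the sense in which the RDBMS acts as a fairly unintelligent repository in this arrangement.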
Another option is to use multiple DBMSs, as discussed in the previous section. Although
the mainstream RDBMS vendors argue that you should use one DBMS to solve all data
processing problems, and there are valid arguments in favor of this, there are also strong
arguments that some data processing problems are sufficiently different, and difficult, that
it makes sense to use a different DBMS which is designed specifically to solve that
problem. As long as all the DBMSs used conform to standard interfaces, such as ODBC
for alphanumeric data, and OpenGIS standards for spatial data, the fact that different
DBMSs are being used can be transparent to application developers and users.
Technologies like distributed object models and heterogeneous replication make this sort
of multi-DBMS approach increasingly practical.
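The transparency argument can be illustrated with a small Python sketch. The SpatialStore interface below is hypothetical and merely stands in for a standard such as OpenGIS; the two classes are toy stand-ins for a long transaction DBMS and a short transaction DBMS, not real products.

```python
from abc import ABC, abstractmethod


class SpatialStore(ABC):
    """Hypothetical common interface shared by every backend."""

    @abstractmethod
    def features_in(self, xmin, ymin, xmax, ymax):
        """Return the names of the features inside the given window."""


def _inside(point, xmin, ymin, xmax, ymax):
    x, y = point
    return xmin <= x <= xmax and ymin <= y <= ymax


class PlanningStore(SpatialStore):
    """Stand-in for a long transaction DBMS: reads one design version."""

    def __init__(self, versions, current):
        self.versions = versions
        self.current = current

    def features_in(self, xmin, ymin, xmax, ymax):
        data = self.versions[self.current]
        return [n for n, p in data.items() if _inside(p, xmin, ymin, xmax, ymax)]


class TrackingStore(SpatialStore):
    """Stand-in for a short transaction DBMS: always the latest position."""

    def __init__(self):
        self.positions = {}

    def update(self, name, x, y):
        self.positions[name] = (x, y)

    def features_in(self, xmin, ymin, xmax, ymax):
        return [n for n, p in self.positions.items()
                if _inside(p, xmin, ymin, xmax, ymax)]


def visible_labels(stores, window):
    # Application code sees only the interface, never the backend type.
    labels = []
    for store in stores:
        labels.extend(store.features_in(*window))
    return sorted(labels)


planning = PlanningStore({"design-A": {"new-valve": (3, 4)}}, "design-A")
tracking = TrackingStore()
tracking.update("truck-7", 1, 1)
labels = visible_labels([planning, tracking], (0, 0, 10, 10))
```

The function visible_labels never needs to know which kind of DBMS sits behind each store, which is the property that a standard interface is meant to provide.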
There is something of a parallel here with the general move which the computer
industry has seen from a centralized mainframe environment to a networked PC and
workstation environment. The latter approach is harder to administer, but this drawback
is outweighed by its significant advantages in terms of performance and flexibility. It will
be interesting to see whether we see a similar trend in the database industry over the next
few years.
References
- Batty, P.M., 1991, Why use a standard RDBMS for GIS?: GIS World International GIS
Sourcebook.
- Batty, P. M., 1995, AM/FM Data Modeling for Utilities: Proceedings of AM/FM
Conference XVIII, pp. 709-725.
- Newell, R. G., and Easterfield, M.E., 1990, Version Management - the problem of the
long transaction: Proceedings of the Mapping Awareness conference.
- Newell, R. G., 1993, Distributed database versus fast communications: Proceedings of
AM/FM Conference XVI, Orlando, 1993.
- Newell, R. G., and Batty, P.M., 1994, GIS databases are different: Proceedings of
AM/FM Conference XVII, Denver, pp. 279-288. (Smallworld technical paper 19).
- Seaborn, D., 1992, “1995: The Year GIS Disappeared”: Proceedings of AM/FM
Conference XV, San Antonio, 1992.