Challenges of Spatial Databases
Conventional relational databases often lack the technology required to handle spatial data. Unlike traditional database applications, spatial applications require the database to understand more complex data types such as points, lines, and polygons. Operations on these types are also more complex than operations on simple types. Hence new technology is needed to handle spatial data. Egenhofer (1993) identified four main properties of spatial data that set it apart from traditional relational data.
Geometry
Geometry is a main property of any kind of spatial data. Geometry deals with the mathematical properties of an object. These properties include measurement (metric), the relationships of points, lines, angles, surfaces, and solids (topology), and order. A simple geometry is usually constructed from geometric primitives such as points, lines, and areas. Complex geometries are constructed from collections of simple geometries. In addition, there are a number of geometric relationships between two geometries that are very important for dealing with spatial data. For example, a connectivity relation describes how two geometries are connected (on a road map, how one intersection is connected to another intersection). Metric relationships deal with the distances between two geometries. For example, which cities are located within 10 miles of a given road? Geometry is usually represented using a vector data model (where each geometry consists of a set of points) or a raster data model (where each geometry is an image).
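The metric query mentioned above (cities within a given distance of a road) can be illustrated with a small sketch in the vector data model. It assumes planar coordinates already expressed in miles, a road stored as a list of vertices, and cities stored as points. The function names and the sample data are hypothetical and chosen only for illustration.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to the line segment a-b in planar coordinates."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        # Degenerate segment: a and b coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the parameter to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def point_polyline_distance(p, polyline):
    """Smallest distance from p to any segment of the polyline."""
    return min(
        point_segment_distance(p, a, b)
        for a, b in zip(polyline, polyline[1:])
    )

def cities_within(cities, road, max_dist):
    """Names of the cities whose location lies within max_dist of the road."""
    return [
        name for name, loc in cities.items()
        if point_polyline_distance(loc, road) <= max_dist
    ]

# Hypothetical data: a road with one bend and four cities (units: miles).
road = [(0, 0), (20, 0), (20, 30)]
cities = {"A": (5, 8), "B": (35, 10), "C": (22, 40), "D": (10, -10)}
nearby = cities_within(cities, road, 10)
```

This brute-force version checks every city against every road segment. It only shows what the metric relationship computes, not how a spatial database would evaluate it efficiently.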
Distribution of Objects in Space
Spatial objects are usually very irregularly distributed in space. Consider the case where we model all the cities in the United States as spatial objects (points). The distribution of cities on the East Coast is very dense, whereas the distribution of cities in Arizona and Nevada is very sparse. In addition, different spatial objects have widely varying extents. For example, in a road network model that represents roads as lines and cities as polygons, there are some very large objects (a long road like I-95) and some small objects (a small city like Nashua, NH).
Data Volume
Several GIS applications deal with very large databases, on the order of terabytes. For example, remote sensing applications gather terabytes of data from satellites every day. Data warehousing applications and NASA's Earth Observing System are further examples of systems with terabytes of spatial data.
Requirements of a Spatial Database System
Any database system that attempts to deal with spatial applications has to provide the following features:
- A set of spatial data types to represent primitive spatial objects (point, line, area) and complex spatial objects (polygons with holes), together with operations on these types such as intersection and distance.
- The spatial types and their operations should be part of the standard query language used to access and manipulate non-spatial data in the system. For example, in the case of relational database systems, SQL should be extended to support spatial types and operations.
- The system should also provide performance enhancements like indexes to process spatial queries (range and join queries), parallel processing, etc., which are available for non-spatial data.
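To make the indexing requirement concrete, the following is a minimal sketch of a uniform grid index that answers a rectangular range query over points. It is a deliberately simplified stand-in for the index structures a spatial database would actually provide. The class name `GridIndex`, its methods, and the sample points are hypothetical.

```python
from collections import defaultdict

class GridIndex:
    """Uniform grid over the plane; each cell holds the points that fall in it."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _cell(self, x, y):
        # Map a coordinate to the integer index of its grid cell.
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, key, x, y):
        self.cells[self._cell(x, y)].append((key, x, y))

    def range_query(self, xmin, ymin, xmax, ymax):
        """Keys of all points inside the closed rectangle [xmin, xmax] x [ymin, ymax]."""
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        hits = []
        # Visit only the cells that overlap the query rectangle.
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for key, x, y in self.cells.get((cx, cy), ()):
                    # Cells on the border may contain points outside the rectangle.
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(key)
        return hits

# Hypothetical usage with four points.
index = GridIndex(cell_size=10)
for key, x, y in [("p1", 1, 1), ("p2", 15, 15), ("p3", 95, 95), ("p4", 12, 3)]:
    index.insert(key, x, y)
found = sorted(index.range_query(0, 0, 20, 20))
```

A fixed cell size copes poorly with the irregular distribution of spatial objects described earlier: dense regions overfill a few cells while sparse regions leave most cells empty. That is one reason spatial systems typically rely on adaptive index structures rather than a plain grid.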
A Solution: Object Relational Databases
Object-relational database management systems are an attempt to incorporate object-oriented capabilities into a database environment. The new constructs added to the core functionality of traditional relational databases include abstract data types, object identity, and the ability to create operations or procedures through the database programming interface to work on these objects. An example of a project that proposed object-relational (or extended-relational) systems is POSTGRES. Commercial products include the Universal Servers from Oracle, Informix, and IBM. More interestingly, the ANSI standardization committee for the database data language has proposed several extensions to the SQL3 standard (Gardels (1997), OGC (1998)) that incorporate object-oriented features into the SQL language. Any spatial database system should address the following five main areas to support spatial applications: (i) Classification of Space, (ii) Data Model, (iii) Query Language, (iv) Query Processing, and (v) Data Organization and Indexing.
Classification of Space
For modeling different objects in space, the basic elements are point, line, and area. A point represents an object whose only spatial attribute is its location in space (X,Y or X,Y,Z). A point can be used to model a city or a building in a large-scale map. A line represents an object that has location attributes along with an extent. A line can be used to model roads, rivers, or utility lines. A region (or polygon) has location attributes along with an extent and an area. Here a region can be a polygon with holes as long as there is only one contiguous area associated with it. Regions can be used to model county boundaries, state boundaries, etc.
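The three basic elements can be sketched as simple types, which makes the distinction between location, extent, and area explicit. This is only an illustration in planar X,Y coordinates. The class names, the `length` and `area` methods, and the sample geometries are hypothetical, and the region is assumed to have non-overlapping holes that lie entirely inside its exterior ring.

```python
import math
from dataclasses import dataclass, field
from typing import List

@dataclass(frozen=True)
class Point:
    """An object with a location only."""
    x: float
    y: float

@dataclass
class Line:
    """An object with location and extent, stored as an ordered vertex list."""
    vertices: List[Point]

    def length(self) -> float:
        return sum(
            math.hypot(b.x - a.x, b.y - a.y)
            for a, b in zip(self.vertices, self.vertices[1:])
        )

def ring_area(ring: List[Point]) -> float:
    """Unsigned area of a closed ring, computed with the shoelace formula."""
    n = len(ring)
    twice_area = sum(
        ring[i].x * ring[(i + 1) % n].y - ring[(i + 1) % n].x * ring[i].y
        for i in range(n)
    )
    return abs(twice_area) / 2.0

@dataclass
class Region:
    """One contiguous area: an exterior ring minus any holes."""
    exterior: List[Point]
    holes: List[List[Point]] = field(default_factory=list)

    def area(self) -> float:
        return ring_area(self.exterior) - sum(ring_area(h) for h in self.holes)

# Hypothetical examples: a road with one bend and a square county with a lake.
road = Line([Point(0, 0), Point(3, 4), Point(3, 10)])
county = Region(
    exterior=[Point(0, 0), Point(10, 0), Point(10, 10), Point(0, 10)],
    holes=[[Point(4, 4), Point(6, 4), Point(6, 6), Point(4, 6)]],
)
road_length = road.length()
county_area = county.area()
```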
Data Model
In traditional database applications the data types of the attributes are limited. These data types consist of integers, floats, character strings, and dates. Object-relational databases provide a higher level of abstraction for spatial data by incorporating concepts closer to the human perception of space. This is accomplished by incorporating the object-oriented concept of user-defined abstract data types (ADTs). An ADT is a user-defined atomic type together with its associated functions. For example, if land parcels are stored in a database, an ADT would be a combination of the "atomic type" polygon and some associated function, say, adjacent, which may be applied to two land parcels to determine whether they are adjacent.
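The land parcel example can be sketched as a polygon type paired with an `adjacent` function. The sketch makes a strong simplifying assumption: two parcels count as adjacent only if they share a complete boundary edge with identical end vertices, as would be the case if shared boundaries were digitized once. Parcels that touch along only part of an edge are not detected. All names and sample parcels are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

Vertex = Tuple[float, float]

@dataclass
class Polygon:
    """The "atomic type": a simple polygon given by its ring of vertices."""
    vertices: List[Vertex]

    def edges(self) -> FrozenSet[FrozenSet[Vertex]]:
        # Each edge is an unordered pair of vertices, so direction does not matter.
        n = len(self.vertices)
        return frozenset(
            frozenset((self.vertices[i], self.vertices[(i + 1) % n]))
            for i in range(n)
        )

def adjacent(a: Polygon, b: Polygon) -> bool:
    """True if the two polygons share at least one whole boundary edge."""
    return not a.edges().isdisjoint(b.edges())

# Hypothetical parcels: p1 and p2 share an edge; p2 and p3 touch only at a corner.
p1 = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])
p2 = Polygon([(10, 0), (20, 0), (20, 10), (10, 10)])
p3 = Polygon([(20, 10), (30, 10), (30, 20), (20, 20)])
p1_p2 = adjacent(p1, p2)
p2_p3 = adjacent(p2, p3)
```

In an object-relational system the type and its function would be registered with the database so that `adjacent` could appear directly in queries. Here the pairing is only mimicked in application code.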