Talking GIS: A Theoretical Basis
1. Background
People often use qualitative spatial thinking and reasoning in everyday life (Kuipers 1978; Lynch 1960); however, current commercial GISs primarily support quantitative spatial queries and answers, and in general lack direct support for users to make qualitative queries about spatial relations. For example, point-to-point distance in metric units or direction in degrees does not necessarily conform to people's usage of terms and concepts in natural language. People's spatial concepts are often more qualitative in nature, including such concepts as north, near, and far (Hong 1994; Frank 1992). A qualitative GIS model that is capable of answering queries of the form, "Show me the road that goes around the Acadia National Park" will complement the existing models. The incorporation of natural-language spatial relations such as along, goes by, and through will give GIS users a greater choice in formulating queries, depending on the task being performed. By better accommodating human requirements, qualitative models will also contribute toward the greater utilization of GIS technology. Among the potential applications of such a model are road navigation systems, tourist databases, and evidence gathering for security or judicial purposes.
In creating natural-language spatial-query models, a cross-linguistic study will be of value, as it can be used to gauge the versatility of the mathematical formalism utilized, as well as to calibrate the parameters of the mathematical formalism for different cultural and linguistic groups. Although GIS command languages are mostly written in English with the cognitive models of native English speakers, the GIS user community is mostly composed of groups of non-native English speakers (Campari 1994). As GIS database queries are often an expression of spatial relationships, it seems that "user errors will be reduced if the names chosen for such spatial predicates match the common sense meanings of those terms" (Mark and Egenhofer 1994). As such, calibrating the spatial predicates of a particular natural language against a formal mathematical model of spatial relations would give GIS developers a tool for designing GIS databases that are capable of answering spatial queries unambiguously in the particular language concerned.
2. Problem Statement
Formal models currently used in commercial GISs lack the parameters to accommodate the flexibility that natural language has in partitioning space. Thus, it is necessary to identify which parameters play a significant role in the selection of spatial predicates when people describe spatial relations, and which parameters do not influence the meaning of a particular term. Knowledge about these parameters will allow us to develop formal models and to calibrate them to better fit human intuition.
3. Motivation
The primary motivation for this work is the need for a mathematical formalism that is flexible enough to accommodate the natural-language spatial queries that people may ask of a GIS. The cross-linguistic dimension of this problem is particularly challenging, as such a mathematical formalism could be used in the transfer of data sets across cultures while still allowing for the correct determination of the semantics of the spatial relations.
Within the framework of a Global Information Infrastructure (GII), the results from this research can be used in building spatial data transfer standards at a local level. For example, a Spatial Data Transfer Standard for Malaysia requires an efficient link not only among databases within Malaysia, but also to the rest of the GII, as it will allow users:
- to exchange explicitly stored relations to save time and computer resources, as users will receive a set of explicit spatial relationships and need not derive them again from the dataset;
- to carry out consistency evaluation tests; for example, consistency tests can be carried out after the data transfer process to ensure that the interrelationships between the various individual spatial objects have been correctly transferred; and
- to consistently query different systems based on a set of standardized spatial relations.
4. Goal and Hypothesis
Geographic databases typically comprise digital, geometric representations of real-world entities. These entities may be represented as objects that consist of points, lines, and areas, whose relationships are described geometrically, for example by using concepts such as a point being the intersection of two lines or two points being located on opposite sides of a line. Spatial queries in a GIS, however, contain symbolic representations of spatial relations rather than detailed descriptions of the geometry. Examples of such symbolic representations are terms like through or along. People more frequently describe a "road that goes along the park" rather than a "road that coincides with the park's boundary".
To create formalisms of natural-language spatial relations, there is a need to map the symbolic representations of real-world spatial relations onto valid geometric configurations. Such a mapping would likely be a one-to-many relation between the symbols and the geometry, because natural language has a limited set of words to match the infinite number of geometric configurations that can be constructed.
The goal of this research is to develop a formal model that captures the essence of geometric configurations. The essential geometry that is captured should satisfy a particular natural-language query about a spatial relation and also be capable of describing a particular geometric configuration within a database using the best natural-language description. In the first case, all geometric configurations within a database that fit the query description should qualify as answers to the query. In the latter case, a user may be interested in finding the best natural-language description for a particular geometric configuration in a database, and the system should be capable of responding with a prototypical description of that configuration.
Several other models for natural-language spatial relations have been defined based on linguistic, geometrical, and connectionist approaches; however, none sufficiently addresses the linkage to data models used in geographic information systems. The models used by linguists are primarily based upon introspection (Herskovits 1986; Talmy 1983). Although they provide good research directions, their lack of mathematical objectivity precludes their use in information systems. Current GISs are based on continuous measures such as Cartesian coordinates. If natural-language spatial terms were based directly on these continuous measures, then there would be an infinite number of terms, as continuous measures are subject to infinite variability. The direct usage of Cartesian coordinates as a basis for natural-language spatial descriptors is thus inappropriate. However, by discretizing the continuous measures, for example by determining the range of values for particular spatial terms, a mapping from the discretized values to spatial terms, and vice versa, can be made. Such an approach will be beneficial to a GIS, because it can be easily implemented using current query tools such as SQL.
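The discretization idea can be illustrated with a minimal Python sketch. The terms, the threshold values, and the function names below are assumptions made purely for illustration; the abstract does not specify them, and real ranges would have to come from calibration.

```python
from bisect import bisect_right
from typing import Tuple

# Hypothetical upper bounds (in metres) separating the terms; in practice
# these values would come from calibration, not from this sketch.
BOUNDS = [500.0, 5000.0]
TERMS = ["very near", "near", "far"]


def term_for_distance(distance_m: float) -> str:
    """Map a continuous distance onto one discrete spatial term."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    return TERMS[bisect_right(BOUNDS, distance_m)]


def range_for_term(term: str) -> Tuple[float, float]:
    """Inverse mapping: the half-open interval [low, high) covered by a term."""
    i = TERMS.index(term)
    low = 0.0 if i == 0 else BOUNDS[i - 1]
    high = float("inf") if i == len(BOUNDS) else BOUNDS[i]
    return (low, high)
```

Because each term corresponds to a plain numeric interval, the pair returned by `range_for_term` could in principle be turned into a range predicate in a query language such as SQL.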
The connectionist model, which in principle attempts parallel computing using the human brain as a model, is another method that has been used to represent natural-language spatial relations (Regier 1995). This approach involves training the model on a very comprehensive set of geometrical configurations in order to achieve the desired level of results. Other formal models that have been proposed include the 9-intersection model (Egenhofer and Herring 1991), which is a purely topological model; models for cardinal directions (Frank 1992; Peuquet and Ci-Xiang 1987); and models for approximate distances (Hernandez et al. 1995; Hong et al. 1995).
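For readers unfamiliar with the 9-intersection model, it is commonly written as a matrix of the set intersections between the interiors, boundaries, and exteriors of two objects A and B. The notation below is the usual textbook form and is not taken from this abstract; each entry is recorded only as empty or non-empty.

```latex
I_9(A, B) =
\begin{pmatrix}
A^{\circ} \cap B^{\circ} & A^{\circ} \cap \partial B & A^{\circ} \cap B^{-} \\
\partial A \cap B^{\circ} & \partial A \cap \partial B & \partial A \cap B^{-} \\
A^{-} \cap B^{\circ} & A^{-} \cap \partial B & A^{-} \cap B^{-}
\end{pmatrix}
```

Here $A^{\circ}$, $\partial A$, and $A^{-}$ denote the interior, boundary, and exterior of A, respectively.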
Although the exact geometry in a set of configurations showing a road and a park, symbolized by a dark line and a polygon, respectively, may not be identical, people may use the same term to describe several distinct configurations (Figure 1). For example, the three configurations in Figure 1 have both end points of the line outside the polygon and at least some part of the line's interior intersects with the interior of the polygon. In essence, they have the same topology, and the same term through is used to describe all three configurations.

Figure 1. Similar term through
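The topological condition described for through (both end points of the line outside the polygon, and part of the line's interior inside the polygon's interior) can be sketched in Python as follows. This is an approximate, sampling-based check written for illustration only; the function names and the sampling density are assumptions, and points lying exactly on the polygon boundary are not treated specially.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(p: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: count crossings of a ray going right from p."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def goes_through(line: List[Point], polygon: List[Point], samples: int = 200) -> bool:
    """Approximate 'through': end points outside, some interior sample inside."""
    if point_in_polygon(line[0], polygon) or point_in_polygon(line[-1], polygon):
        return False
    for (ax, ay), (bx, by) in zip(line, line[1:]):
        for k in range(1, samples):  # sample strictly inside each segment
            t = k / samples
            if point_in_polygon((ax + t * (bx - ax), ay + t * (by - ay)), polygon):
                return True
    return False
```

For a square park with corners (0, 0) and (4, 4), a road from (-1, 2) to (5, 2) satisfies the check, whereas a road that starts inside the park or never enters it does not.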
On the other hand, different terms may be used to describe configurations even though the topology of these configurations is the same. For example, Figure 2a could be described by the term goes along as in, "The road goes along the park," while Figure 2b may be better referred to by, "The road is outside the park." Hence, topology cannot be the only criterion for determining the appropriate spatial terms. Metric distinctions may also be necessary.

Figure 2. Different terms (a) along and (b) outside
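One way to picture a metric distinction between two topologically identical (disjoint) configurations is to measure how far the road stays from the park boundary. The Python sketch below is only an illustration of that idea: the mean-distance measure, the threshold value, and all names are assumptions, not the parameters defined in this research.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

# Hypothetical threshold, in the same units as the coordinates.
ALONG_MAX_MEAN_DISTANCE = 1.0


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Clamp the projection parameter so the closest point stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def mean_distance_to_boundary(line: List[Point], polygon: List[Point],
                              samples: int = 50) -> float:
    """Average distance from sampled points on the line to the polygon boundary."""
    n = len(polygon)
    edges = [(polygon[i], polygon[(i + 1) % n]) for i in range(n)]
    total, count = 0.0, 0
    for (ax, ay), (bx, by) in zip(line, line[1:]):
        for k in range(samples + 1):
            t = k / samples
            p = (ax + t * (bx - ax), ay + t * (by - ay))
            total += min(point_segment_distance(p, a, b) for a, b in edges)
            count += 1
    return total / count


def describe_disjoint_road(line: List[Point], polygon: List[Point]) -> str:
    """Choose a term for a road that is topologically disjoint from the park."""
    mean = mean_distance_to_boundary(line, polygon)
    return "goes along" if mean <= ALONG_MAX_MEAN_DISTANCE else "is outside"
```

Both roads in such a comparison have the same topology with respect to the park; only the measured distance changes the term that is returned.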
The hypothesis for this research is
"For the same topology, metric influences the choice of natural-language spatial terms." The hypothesis is proved by:
- creating a formal framework based on the categorization of spatial relations as well as the identification of spatial parameters that can be tested;
- calibrating the values of these spatial parameters for specific terms; and
- analyzing the distribution of the metric parameters for the spatial terms.
Using the 9-intersection model (Egenhofer and Herring 1994) as a foundation, a method for describing natural-language spatial relations is formulated. A detailed definition of the parameters and the methodology for calibrating the values of the spatial parameters will be presented at the conference.