GML – User Perspectives

Mr. Don Murray,
President, Safe Software
Suite 2017-7445 132nd Street
Surrey, BC Canada V3W 1J8
Tel: (604) 501-9985
Fax: (604) 501-9965
E-mail: dmurray@safe.com
Historically, the task of moving geographic data from one format to another has been difficult. As a result, users with large data stores have been locked into a single vendor’s format and have been restricted to using one vendor’s analysis and decision support tools. The Geography Markup Language (GML) attempts to alleviate these difficulties by increasing organizations’ ability to share geographic information. GML, which is based on the eXtensible Markup Language (XML), is an open and non-proprietary specification used for the transport and storage of geographic information. This paper presents a configurable XML translation engine that enables translation between XML-based formats and other GIS formats. It then describes how the XML translation engine can be automatically configured to read datasets that are based on any arbitrary GML user application schema.
The XML specification provides a standard way to define markup languages for textual documents; it is a meta-language that lets users define the structural relationships of their documents under strict lexical and syntactic constraints. XML documents are stored as plain text, which has several benefits. Because XML is human-readable, documents can be viewed in a plain text editor, and they are easily transmitted across platforms and over the Internet. In addition, plain text is vendor-neutral, so information stored in XML is not locked into a proprietary binary format. XML thus enables disparate systems to share information easily, and since GML is defined with XML, it inherits all of these benefits.
GML uses the W3C XML Schema Definition Language to define and constrain the contents of its XML documents. The GML v2.0 Specification defines basic conformance requirements that users must meet when developing their own application schemas. Software applications that attempt to process an arbitrary GML user application schema must therefore understand GML and all of the technologies on which GML depends, including W3C XML Schema.
Many free parsers are available for XML, and GIS applications may use them as a basic building block for implementing GML software modules. Most XML parsers provide an option to validate an XML document against a W3C XML Schema document, so GIS applications can use these parsers to read and validate documents that conform to arbitrary GML user application schemas. Note, however, that an application must still interpret the parser’s output in its own local context. It must know what each XML element in the GML dataset means: whether the element represents a feature, a property of a feature, or a feature collection. Validating the dataset against a schema is therefore not enough; the application must also understand how GML uses W3C XML Schema to define a geographic feature and its properties. GML offers great flexibility by letting users define application schemas suited to their own domains; however, this same flexibility makes GML software applications substantially harder to write.
It is relatively straightforward to write a software component that works on one particular GML user application schema. Such a component can even bypass schema processing, since all of the processing logic for that domain can be hard-coded into it. The W3C XML Schema document needs to be examined only if the user also wishes to validate the GML datasets with the XML parser.
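To make the contrast concrete, here is a minimal Python sketch of such a schema-specific reader. It assumes a hypothetical application schema in which `ex:Road` features (namespace `http://example.com/roads`) carry `ex:name` and `ex:lanes` attributes and an `ex:centerLine` line geometry; these names, the sample data, and the helper functions `parse_coordinates` and `read_roads` are illustrative only. Everything the reader knows about the domain is hard-coded, and the schema is never consulted.

```python
import xml.etree.ElementTree as ET

# Namespaces of the hypothetical "roads" application schema and of GML 2.
EX = "{http://example.com/roads}"
GML = "{http://www.opengis.net/gml}"

# Hypothetical sample dataset conforming to the hypothetical schema.
SAMPLE = """<ex:RoadCollection xmlns:ex="http://example.com/roads"
    xmlns:gml="http://www.opengis.net/gml">
  <gml:featureMember>
    <ex:Road>
      <ex:name>Main Street</ex:name>
      <ex:lanes>2</ex:lanes>
      <ex:centerLine>
        <gml:LineString>
          <gml:coordinates>0,0 10,0 10,5</gml:coordinates>
        </gml:LineString>
      </ex:centerLine>
    </ex:Road>
  </gml:featureMember>
</ex:RoadCollection>"""


def parse_coordinates(text):
    """Turn a GML 2 coordinates string such as '0,0 10,0' into (x, y) tuples."""
    return [tuple(float(v) for v in pair.split(",")) for pair in text.split()]


def read_roads(xml_text):
    """Hard-coded reader: it 'knows' that ex:Road is a feature, that ex:name
    and ex:lanes are attributes, and that ex:centerLine is its geometry."""
    root = ET.fromstring(xml_text)
    roads = []
    for road in root.iter(EX + "Road"):
        coords = road.find(f"{EX}centerLine/{GML}LineString/{GML}coordinates")
        roads.append({
            "name": road.findtext(EX + "name"),
            "lanes": int(road.findtext(EX + "lanes")),
            "centerLine": parse_coordinates(coords.text),
        })
    return roads


if __name__ == "__main__":
    print(read_roads(SAMPLE))
```

Because the element names are wired into `read_roads`, any change to the application schema means changing the code, which is why this approach does not extend to arbitrary schemas. It also loads the whole document into memory, which is acceptable only for small datasets.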
It is substantially more difficult to write a software component that works on any arbitrary GML user application schema, because the component is expected to understand any GML dataset. Parsing GML documents is easy, since any of the freely available XML parsers can be used. The difficulty lies in interpreting the XML elements in a geographic context, and then in translating that geographic context into a GIS system’s own local context. GML helps software components interpret XML data by constraining its interpretation to a well-defined geographic context.
Since a GML user application schema is expressed as a W3C XML Schema document, a software component can perform type discovery on the schema to identify which XML elements in the GML dataset represent features, feature properties, and geometric properties. At present, however, most available XML parsers support validating a document against a W3C XML Schema but do not expose the schema itself programmatically through an API. This is an extra obstacle for GML application programmers who need type discovery to work with their datasets, particularly because the W3C XML Schema Recommendation is far from easy to understand.
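Type discovery can be sketched with Python’s standard `xml.etree.ElementTree` by treating the schema document as ordinary XML. The sketch makes strong simplifying assumptions: QName prefixes are compared by local name rather than resolved to namespaces; only named complex types declared in the same schema document are followed (no `xs:import` or `xs:include`, no properties inherited from intermediate base types, no `ref=` declarations); and a property counts as geometric if its type is one of a few GML 2 geometry property types. The function names and the example schema for a hypothetical `Road` feature are illustrative.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# A subset of the GML 2 geometry property types (assumption: extend as needed).
GEOMETRY_PROPERTY_TYPES = {
    "GeometryPropertyType", "PointPropertyType", "LineStringPropertyType",
    "PolygonPropertyType", "MultiPointPropertyType",
    "MultiLineStringPropertyType", "MultiPolygonPropertyType",
}

# Hypothetical user application schema declaring one feature type.
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:gml="http://www.opengis.net/gml"
    xmlns:ex="http://example.com/roads"
    targetNamespace="http://example.com/roads">
  <xs:element name="Road" type="ex:RoadType" substitutionGroup="gml:_Feature"/>
  <xs:complexType name="RoadType">
    <xs:complexContent>
      <xs:extension base="gml:AbstractFeatureType">
        <xs:sequence>
          <xs:element name="name" type="xs:string"/>
          <xs:element name="lanes" type="xs:integer"/>
          <xs:element name="centerLine" type="gml:LineStringPropertyType"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>"""


def local(qname):
    """Drop the prefix of a QName such as 'gml:PointPropertyType'.
    Simplification: prefixes are ignored, not resolved to namespace URIs."""
    return qname.split(":")[-1] if qname else None


def discover_feature_types(xsd_text):
    root = ET.fromstring(xsd_text)
    # For each named complex type: the type it extends and its declared properties.
    types = {}
    for ct in root.findall(XS + "complexType"):
        ext = ct.find(f"{XS}complexContent/{XS}extension")
        base = local(ext.get("base")) if ext is not None else None
        props = [(e.get("name"), local(e.get("type")))
                 for e in ct.iter(XS + "element") if e.get("name")]
        types[ct.get("name")] = (base, props)

    def is_feature_type(name):
        # Walk the extension chain until gml:AbstractFeatureType is reached.
        seen = set()
        while name in types and name not in seen:
            seen.add(name)
            name = types[name][0]
            if name == "AbstractFeatureType":
                return True
        return False

    # Classify the properties of every global element whose type is a feature type.
    features = {}
    for el in root.findall(XS + "element"):
        type_name = local(el.get("type"))
        if is_feature_type(type_name):
            props = types[type_name][1]
            features[el.get("name")] = {
                "geometry": [n for n, t in props if t in GEOMETRY_PROPERTY_TYPES],
                "attributes": [n for n, t in props if t not in GEOMETRY_PROPERTY_TYPES],
            }
    return features


if __name__ == "__main__":
    print(discover_feature_types(SAMPLE_XSD))
```

A complete implementation would need a full W3C XML Schema model, including imports, substitution groups, and type derivation across schema documents, which is the kind of programmatic access the paragraph above notes is usually missing from parsers.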
Another difficulty when working with geographic data in XML, whether the GML software component handles one or many GML user application schemas, is that geographic datasets are inherently large. Software applications use two standard APIs to parse XML documents: DOM and SAX. The DOM specification defines a tree-based approach to navigating an XML document; current DOM parsers build the entire tree in memory, which makes them prohibitively expensive for GIS data. A SAX parser is event-based: it does not construct an internal representation of the XML document itself, but instead invokes callback functions that the application supplies to handle parsing events. A SAX parser keeps memory usage low, but it requires considerably more programming effort. Application programmers should therefore avoid relying on an in-memory representation of the parsed XML document when reading large geographic datasets.
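The event-based style can be sketched with Python’s standard `xml.sax` module. The handler below assumes a GML 2 feature collection in which every child of `gml:featureMember` is a feature, its direct simple-content children are attributes, and a `gml:coordinates` element holds its geometry; the class `FeatureStreamHandler`, the function `stream_features`, and the sample data are hypothetical. Only the feature currently being read is held in memory; each completed feature is passed to a callback and then discarded.

```python
import io
import xml.sax
from xml.sax.handler import feature_namespaces

GML_NS = "http://www.opengis.net/gml"

# Hypothetical sample data: a GML 2 collection of two Road features.
SAMPLE = """<ex:RoadCollection xmlns:ex="http://example.com/roads"
    xmlns:gml="http://www.opengis.net/gml">
  <gml:featureMember>
    <ex:Road>
      <ex:name>Main Street</ex:name>
      <ex:centerLine><gml:LineString>
        <gml:coordinates>0,0 10,0</gml:coordinates>
      </gml:LineString></ex:centerLine>
    </ex:Road>
  </gml:featureMember>
  <gml:featureMember>
    <ex:Road>
      <ex:name>High Street</ex:name>
      <ex:centerLine><gml:LineString>
        <gml:coordinates>0,5 10,5</gml:coordinates>
      </gml:LineString></ex:centerLine>
    </ex:Road>
  </gml:featureMember>
</ex:RoadCollection>"""


class FeatureStreamHandler(xml.sax.ContentHandler):
    """Builds one feature at a time and hands it to a callback."""

    def __init__(self, on_feature):
        super().__init__()
        self.on_feature = on_feature  # called once per completed feature
        self.in_member = False        # inside a gml:featureMember?
        self.feature = None           # the single feature under construction
        self.depth = 0                # element depth below the feature element
        self.text = []

    def startElementNS(self, name, qname, attrs):
        if name == (GML_NS, "featureMember"):
            self.in_member = True
        elif self.in_member and self.feature is None:
            self.feature = {"type": name[1], "attributes": {}, "coordinates": []}
            self.depth = 0
        elif self.feature is not None:
            self.depth += 1
        self.text = []

    def characters(self, content):
        self.text.append(content)

    def endElementNS(self, name, qname):
        text = "".join(self.text).strip()
        if name == (GML_NS, "featureMember"):
            self.in_member = False
        elif self.feature is not None and self.depth == 0:
            # End of the feature element: emit it and drop our reference.
            self.on_feature(self.feature)
            self.feature = None
        elif self.feature is not None:
            if name == (GML_NS, "coordinates"):
                self.feature["coordinates"] = [
                    tuple(float(v) for v in p.split(",")) for p in text.split()]
            elif self.depth == 1 and text:
                self.feature["attributes"][name[1]] = text
            self.depth -= 1
        self.text = []


def stream_features(xml_text, on_feature):
    """Parse with namespace processing on, so handlers get (uri, localname) pairs."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(FeatureStreamHandler(on_feature))
    parser.parse(io.BytesIO(xml_text.encode("utf-8")))


if __name__ == "__main__":
    stream_features(SAMPLE, print)
```

The bookkeeping of `depth`, `in_member`, and accumulated text is a small example of the extra programming effort that SAX demands compared with navigating a DOM tree.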
The next section describes an XML translation engine that enables translation between XML-based formats and other GIS formats. The XML translation engine can be dynamically configured to map XML elements into the engine’s local notion of geographical objects. These objects are neutral representations of geographic features: they are not tied to any particular GIS format, can hold both geographic and non-geographic information, and can be exported through a general-purpose translation hub into diverse GIS systems. The XML translation engine is configured through mapping rules that are themselves expressed in XML. This substantially reduces the effort needed to support new XML-based formats, since new mapping rules can be written in a matter of hours, rather than the weeks needed to code a software component by traditional programming.
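A format-neutral feature can be pictured as a small record holding a feature type, non-geometric attributes, and geometry, which separate writers then export to different targets. The sketch below is a hypothetical illustration of that idea, not the engine’s actual feature model: the class `NeutralFeature` and the writer functions `to_wkt` (producing OGC Well-Known Text) and `to_csv_row` are assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class NeutralFeature:
    """Hypothetical format-neutral feature: a type name, non-geometric
    attributes, and an optional geometry stored as (x, y) vertices."""
    feature_type: str
    attributes: dict = field(default_factory=dict)
    geometry_kind: str = "none"  # "point", "line", or "none" in this sketch
    coordinates: list = field(default_factory=list)


def to_wkt(feature):
    """Hypothetical writer 1: render the geometry as Well-Known Text."""
    pts = ", ".join(f"{x:g} {y:g}" for x, y in feature.coordinates)
    if feature.geometry_kind == "point":
        return f"POINT ({pts})"
    if feature.geometry_kind == "line":
        return f"LINESTRING ({pts})"
    return "GEOMETRYCOLLECTION EMPTY"


def to_csv_row(feature, columns):
    """Hypothetical writer 2: flatten the attributes into a CSV-style row."""
    values = [str(feature.attributes.get(c, "")) for c in columns]
    return ",".join([feature.feature_type] + values)


if __name__ == "__main__":
    road = NeutralFeature("Road", {"name": "Main Street", "lanes": 2},
                          "line", [(0, 0), (10, 0)])
    print(to_wkt(road))                         # LINESTRING (0 0, 10 0)
    print(to_csv_row(road, ["name", "lanes"]))  # Road,Main Street,2
```

In this sketch, supporting another output format means adding another writer; the feature record itself does not change.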
In designing the XML data translation engine and its declarative mapping rules, a stream-based processing model was chosen, because the potential size of geographic data made a tree-based processing model infeasible.
The XML Data Translation Engine
The XML data translation engine is event-driven. It takes two XML documents as input: the XML data document and an xfMap document, which contains the declarative mapping-rules. The xfMap mapping-rules are loaded into the translation engine and react to the data document’s input stream of XML elements. The translation engine may activate, execute, suspend, and deactivate these mapping-rules at key events, which are illustrated later in an example. Although there are several types of xfMap mapping-rules, we consider here only those that construct neutral geographical features from XML elements; these are called “feature mapping-rules”.
The XML data translation engine constructs one feature at a time. The first feature mapping-rule to be activated initiates the construction of a new geographical feature. Feature mapping-rules activated afterwards do not create a new feature; they contribute to the construction of the feature already created. The geographical feature is complete only when the initial feature mapping-rule is deactivated.
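The one-feature-at-a-time lifecycle can be approximated with a SAX handler driven by declarative rules that are themselves loaded from XML. This is a hypothetical sketch, not xfMap: the rule document format (`<feature>`, `<attribute>`, and `<geometry>` elements with an `element` attribute), the `RuleEngine` class, and the `translate` function are invented for illustration, and rules match data elements by local name only. A rule is activated when its element starts and deactivated when it ends; the first feature rule creates the feature, later attribute and geometry rules contribute to it, and deactivating the initial feature rule completes the feature.

```python
import io
import xml.sax
import xml.etree.ElementTree as ET
from xml.sax.handler import feature_namespaces

# Hypothetical rule document; NOT the actual xfMap syntax.
RULES = """<rules>
  <feature element="Road"/>
  <attribute element="name"/>
  <attribute element="lanes"/>
  <geometry element="coordinates"/>
</rules>"""

# Hypothetical data document.
DATA = """<ex:RoadCollection xmlns:ex="http://example.com/roads"
    xmlns:gml="http://www.opengis.net/gml">
  <gml:featureMember>
    <ex:Road>
      <ex:name>Main Street</ex:name>
      <ex:lanes>2</ex:lanes>
      <ex:centerLine><gml:LineString>
        <gml:coordinates>0,0 10,0</gml:coordinates>
      </gml:LineString></ex:centerLine>
    </ex:Road>
  </gml:featureMember>
</ex:RoadCollection>"""


def load_rules(rules_xml):
    """Read the declarative rules into a {local element name: rule kind} table."""
    return {r.get("element"): r.tag for r in ET.fromstring(rules_xml)}


class RuleEngine(xml.sax.ContentHandler):
    def __init__(self, rules, on_feature):
        super().__init__()
        self.rules = rules
        self.on_feature = on_feature
        self.feature = None  # the one feature under construction
        self.active = []     # stack of rules activated by currently open elements
        self.text = []

    def startElementNS(self, name, qname, attrs):
        kind = self.rules.get(name[1])  # activate the rule (if any) for this element
        if kind == "feature":
            if self.feature is None:
                # First feature rule activated: start a new feature.
                self.feature = {"type": name[1], "attributes": {}, "coordinates": []}
                kind = "initial"
            else:
                kind = None  # nested feature rules are not modelled in this sketch
        elif self.feature is None:
            kind = None  # attribute/geometry rules only apply inside a feature
        self.active.append(kind)
        self.text = []

    def characters(self, content):
        self.text.append(content)

    def endElementNS(self, name, qname):
        kind = self.active.pop()  # deactivate the rule for this element
        text = "".join(self.text).strip()
        if kind == "attribute":
            self.feature["attributes"][name[1]] = text
        elif kind == "geometry":
            self.feature["coordinates"] = [
                tuple(float(v) for v in p.split(",")) for p in text.split()]
        elif kind == "initial":
            # Initial feature rule deactivated: the feature is complete.
            self.on_feature(self.feature)
            self.feature = None
        self.text = []


def translate(data_xml, rules_xml, on_feature):
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(RuleEngine(load_rules(rules_xml), on_feature))
    parser.parse(io.BytesIO(data_xml.encode("utf-8")))


if __name__ == "__main__":
    translate(DATA, RULES, print)
```

Supporting a new XML format in this sketch means writing a new rule document rather than new handler code, which mirrors the motivation for declarative mapping rules given earlier.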
The following XML document fragment illustrates the usage of the xfMap feature mapping-rules: