An XML-Driven Data Translation Engine for GML 2.0
Dale Lutz
Safe Software, Suite 2017 - 7445 132nd Street, Surrey, BC, Canada V3W 1J8
Abstract
GML 2.0 allows users to develop their own custom application
schemas. This paper presents an XML-driven data translation engine that allows datasets based
on any arbitrary GML 2.0 user application schema to be translated to and from diverse GIS
formats.
The XML-driven translation engine is configured using XML and is able to
exploit the capabilities of Safe Software’s general-purpose data translation components. It allows
anyone to add support for a new GML 2.0 user application schema in a matter of hours
rather than weeks. Once a GML 2.0 user application schema is supported, its datasets can be
moved to any of the supported GIS formats. Because the engine builds on the general-purpose
data translation components, organizations can also embed GML 2.0 reading and
writing capability into their own software applications and immediately view GML 2.0
user application datasets spatially.
This presentation also discusses two projects that used GML 2.0 with this engine
to solve different problems: one describes how GML 2.0 is used as a data dissemination
platform, and the second describes GML 2.0 as the common language between disparate
systems.
Introduction
Historically, the task of moving geographic data from one format to another has
been difficult. As a result, users with large data stores have been locked into a single vendor’s
format and have been restricted to using one vendor’s analysis and decision support tools. The
Geography Markup Language (GML) attempts to alleviate these difficulties by increasing
organizations’ ability to share geographic information. GML, which is based on the eXtensible
Markup Language (XML), is an open and non-proprietary specification used for the transport
and storage of geographic information. This paper presents a configurable XML translation
engine that enables translation between XML-based formats and other GIS formats. It then
describes how the XML translation engine can be automatically configured to read datasets that
are based on any arbitrary GML user application schema.
The XML specification provides a standard way of defining markup languages
for textual documents; it is a meta-language that allows users to design and format the structural
relationships of their documents using strict lexical and syntactical constraints. XML documents
are stored in plain text, which has several benefits. Because XML is
human-readable, a plain text editor can be used to view documents; XML is also easily transmitted
across platforms and over the Internet. In addition, plain text is vendor-neutral, so information
that is stored in XML is not locked into a proprietary binary format. XML thus enables disparate
systems to share information easily and, since GML is defined with XML, it inherits all of
XML’s benefits.
GML uses the W3C XML Schema Definition Language to define and constrain
the contents of its XML documents. The GML v2.0 Specification defines some basic
conformance requirements for users to develop their own application schemas. Software
applications attempting to process any arbitrary GML user application schema must understand
GML and all of the technologies upon which GML depends, including the W3C XML Schema.
Many free parsers are available for XML. GIS applications may use these parsers
as a basic building block for implementing GML software modules. Most XML
parsers offer the option of validating an XML document against a W3C XML Schema document.
GIS applications can use these parsers to read and validate documents for arbitrary GML user
application schemas. Note, however, that software applications must still interpret the output
of the XML parsers in their own local context. The software application must
know what each XML element in the GML dataset means: whether the element refers to a
feature, a property of a feature, or a feature collection. It is not enough for the GIS application
software to use the XML parser to validate the dataset according to a schema: the application
must also understand how GML uses the W3C XML Schema to define a geographic feature and its
properties. GML introduces an extraordinary flexibility by letting users define their own
application schemas suitable for their own domains; however, this same flexibility also presents
a substantial difficulty for writing GML software applications.
It is more or less trivial to write a software component that works on a particular
GML user application schema. The GML software component can even bypass the schema
processing, since all the processing logic for that particular domain can be hard-coded into the
software component. The W3C XML Schema document only needs to be examined if, in
addition to processing, the user wishes to perform validation on the GML datasets with the XML
parser.
It is substantially more difficult to write a software component that works on any
arbitrary GML user application schema because the component would be expected to understand
any GML dataset. Reading GML documents into a system is trivial since users can employ any
of the free XML parsers available. The difficulty comes in interpreting the XML elements into a
geographic context and then in interpreting that geographic context into a GIS system’s own
local context. GML helps software components interpret XML data by constraining the
interpretation of the data into a well-defined geographic context.
Since a GML user application schema is expressed through a W3C XML Schema
document, a software component can perform type discovery on the schema to identify which
XML elements from the GML dataset represent a feature, a feature’s properties, and a feature’s
geometric properties. Currently, most available XML parsers can validate a document against a
W3C XML Schema but do not expose the schema programmatically
through an API. This presents an extra obstacle for GML software application programmers who
need type discovery in order to work with their GML datasets: they must process the schema
themselves, and the W3C XML Schema Recommendation is not an easy specification to understand.
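To make the idea of type discovery concrete, the following is a minimal Python sketch; it is not part of the engine described in this paper. It assumes that feature elements are declared in the application schema with a substitutionGroup of gml:_Feature or gml:_FeatureCollection, and it inspects only top-level element declarations, so it does not follow chains of type derivation or indirect substitution groups. The sample schema, the ex namespace, and the function name discover_feature_elements are hypothetical.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical application schema, reduced to its global element declarations.
SAMPLE_SCHEMA = """<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:gml="http://www.opengis.net/gml"
        xmlns:ex="http://example.com/ex"
        targetNamespace="http://example.com/ex">
  <element name="Road" type="ex:RoadType" substitutionGroup="gml:_Feature"/>
  <element name="RoadNetwork" type="ex:RoadNetworkType"
           substitutionGroup="gml:_FeatureCollection"/>
  <element name="comment" type="string"/>
</schema>"""


def discover_feature_elements(schema_text):
    """Classify global element declarations by their substitution group.

    Simplification: only the local part of the substitutionGroup value is
    compared, on the assumption that its prefix is bound to the GML namespace.
    """
    root = ET.fromstring(schema_text)
    found = {"feature": [], "collection": []}
    for decl in root.findall(f"{{{XSD_NS}}}element"):
        group = decl.get("substitutionGroup", "")
        local_name = group.split(":")[-1]
        if local_name == "_Feature":
            found["feature"].append(decl.get("name"))
        elif local_name == "_FeatureCollection":
            found["collection"].append(decl.get("name"))
    return found


result = discover_feature_elements(SAMPLE_SCHEMA)
```

A complete implementation would also need to resolve namespace prefixes, follow imported and included schema documents, and examine type definitions to find each feature's properties and geometric properties.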
Another difficulty when working with geographical data in XML, whether the
GML software component handles one GML user application schema or many,
is that geographical datasets are inherently large. Two standard APIs are
used by software applications to parse XML documents: DOM and SAX. The DOM
specification defines a tree-based approach to navigating an XML document. Currently, a DOM
parser builds an in-memory tree of the document, which makes its memory cost prohibitive for
large GIS datasets. A SAX parser is event-based: it does not itself construct an internal
representation of the XML document, but instead invokes callback functions supplied by the
software application to handle parsing events. Using a SAX parser does not tax memory, but it
does require considerably more programming effort. The application programmer should avoid
building an in-memory representation of the parsed XML document when large geographic datasets are being read.
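The event-based style can be illustrated with a short Python sketch that uses the standard library's xml.sax module. It only tallies element names as their start tags arrive, so memory use does not grow with the size of the document. The class name ElementCounter, the function count_elements, and the sample data are hypothetical and are not taken from the engine described in this paper.

```python
import io
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Tallies element names without building a document tree."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Invoked by the parser once per start tag; only the tally is retained.
        self.counts[name] = self.counts.get(name, 0) + 1


def count_elements(xml_bytes):
    handler = ElementCounter()
    # The parser reads from the stream incrementally and fires callbacks.
    xml.sax.parse(io.BytesIO(xml_bytes), handler)
    return handler.counts


SAMPLE = b"""<roads>
  <road><name>Main St</name></road>
  <road><name>King Rd</name></road>
</roads>"""

counts = count_elements(SAMPLE)
```

The extra programming effort mentioned above shows up as soon as the handler has to remember context between callbacks, for example which feature an attribute value belongs to.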
The next section describes an XML translation engine that enables translation
between XML-based formats and other GIS formats. The XML translation engine can be
dynamically configured to map XML elements into the engine’s local notion of geographical
objects. These objects are neutral representations of geographic features: they are not tied to any
particular GIS format, can hold geographic and non-geographic information, and can be exported
through a general-purpose translation hub into diverse GIS systems. The XML translation engine
is configured through mapping rules that are themselves expressed in XML. This substantially
reduces the amount of effort needed for a user to support new XML-based formats, since users
can write new mapping rules in a matter of hours, rather than taking weeks to code software
components using traditional programming.
In designing the XML data translation engine and its declarative mapping rules, a
stream-based processing model was chosen, since the potential size of geographical data made
the tree-based processing model unfeasible.
The XML Data Translation Engine
The XML data translation engine is event-driven. It takes two input XML
documents: the XML data document and an xfMap document, which contains its declarative
mapping-rules. The xfMap mapping-rules are loaded into the translation engine and react to the
data document’s input stream of XML elements. These mapping-rules may be activated,
executed, suspended and deactivated by the translation engine during key events. These key
events will be illustrated later in an example. Although there are different types of xfMap
mapping-rules, we’ll only consider mapping-rules that construct neutral geographical features
from the XML elements. These are called “feature mapping-rules”.
The XML data translation engine constructs one feature at a time. The first feature
mapping-rule that is activated initiates the construction of a new geographical feature.
Subsequent feature mapping-rules that are activated do not create a new geographical feature;
they contribute to the construction of the feature that was already created. The geographical feature is
complete only once the initial feature mapping-rule has been deactivated.
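The one-feature-at-a-time behaviour can be approximated by a small Python sketch built on xml.sax. It is an illustration of the activation and deactivation idea only; it does not use xfMap syntax and is not the engine's implementation. The start tag of a configured element plays the role of the initial feature mapping-rule being activated, nested elements add attributes to the feature under construction, and the matching end tag plays the role of deactivation, at which point the finished feature is handed to a callback. The names FeatureBuilder, feature_element, attribute_elements, and on_feature are hypothetical, and nested features of the same element name are not handled.

```python
import xml.sax


class FeatureBuilder(xml.sax.ContentHandler):
    """Builds one neutral feature (a dict) at a time from a SAX event stream."""

    def __init__(self, feature_element, attribute_elements, on_feature):
        super().__init__()
        self.feature_element = feature_element
        self.attribute_elements = set(attribute_elements)
        self.on_feature = on_feature   # called with each completed feature
        self.current = None            # feature under construction, if any
        self.text = []

    def startElement(self, name, attrs):
        # "Activation": the first matching start tag creates a new feature.
        if name == self.feature_element and self.current is None:
            self.current = {"feature_type": name}
        self.text = []

    def characters(self, content):
        self.text.append(content)

    def endElement(self, name):
        if self.current is None:
            return
        if name in self.attribute_elements:
            # Subsequent rules add to the feature that already exists.
            self.current[name] = "".join(self.text).strip()
        elif name == self.feature_element:
            # "Deactivation": the feature is complete and is passed on.
            self.on_feature(self.current)
            self.current = None


SAMPLE = b"""<roads>
  <road><name>Main St</name><lanes>2</lanes></road>
  <road><name>King Rd</name><lanes>4</lanes></road>
</roads>"""

features = []
xml.sax.parseString(SAMPLE, FeatureBuilder("road", ["name", "lanes"], features.append))
```

Passing each completed feature to a callback, rather than collecting the whole dataset first, keeps the sketch consistent with the stream-based processing model.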
The following XML document fragment illustrates the usage of the xfMap feature
mapping-rules:
If we want a geographic feature to be constructed on the
element, then
a feature mapping-rule must activate when the XML translation engine reads the
element’s start tag. The following feature mapping-rule will activate when the
element’s start tag is read:
Fragment2