GISdevelopment.net ---> GITA 2003 ---> Innovative Technologies

An XML-Driven Data Translation Engine for GML 2.0

Dale Lutz
Safe Software, Suite 2017 - 7445 132nd Street, Surrey, BC, Canada V3W 1J8


Abstract
GML 2.0 allows users to develop their own custom GML 2.0 application schemas. This paper presents an XML-driven data translation engine that allows datasets based on any arbitrary GML 2.0 user application schema to be translated to and from diverse GIS formats.

The XML-driven translation engine is configured using XML and is able to exploit the capabilities of Safe Software’s general-purpose data translation components. It allows new GML 2.0-based support to be added by anyone, enabling new GML 2.0 user application schemas to be supported in time that can be measured in hours rather than weeks. Once a GML 2.0 user application schema is supported, its datasets can be moved to any of the supported GIS formats. Building on the general-purpose data translation components brings two further benefits: organizations can embed GML 2.0 reading and writing capability into their own software applications, and they can immediately view GML 2.0 user application datasets spatially.

This presentation also discusses two projects that used GML 2.0 with this engine to solve different problems: one describes how GML 2.0 is used as a data dissemination platform, and the second describes GML 2.0 as the common language between disparate systems.

Introduction
Historically, the task of moving geographic data from one format to another has been difficult. As a result, users with large data stores have been locked into a single vendor’s format and have been restricted to using one vendor’s analysis and decision support tools. The Geography Markup Language (GML) attempts to alleviate these difficulties by increasing organizations’ ability to share geographic information. GML, which is based on the eXtensible Markup Language (XML), is an open and non-proprietary specification used for the transport and storage of geographic information. This paper presents a configurable XML translation engine that enables translation between XML-based formats and other GIS formats. It then describes how the XML translation engine can be automatically configured to read datasets that are based on any arbitrary GML user application schema.

The XML specification provides a standard way for defining markup languages for textual documents; it is a meta-language that allows users to design and format the structural relationships of their documents using strict lexical and syntactical constraints. XML documents are stored in plain text, which introduces numerous beneficial consequences. Because XML is human-readable, a plain text editor can be used to view documents; it is also easily transmitted across platforms and over the Internet. In addition, plain text is vendor-neutral, so information that is stored in XML is not locked into a proprietary binary format. XML thus enables disparate systems to share information easily – and, since GML is defined with XML, it inherits all of XML’s benefits.

GML uses the W3C XML Schema Definition Language to define and constrain the contents of its XML documents. The GML v2.0 Specification defines some basic conformance requirements for users to develop their own application schemas. Software applications attempting to process any arbitrary GML user application schema must understand GML and all of the technologies upon which GML depends, including the W3C XML Schema. Many free parsers are available for XML. These parsers may be used by GIS applications as a base building block for implementing GML software modules. Most XML parsers provide the option for validating an XML document through a W3C Schema document. GIS applications can utilize these parsers to read and validate documents for arbitrary GML user application schemas. It must be noted that software applications must still interpret the output from the XML parsers into their own local meaningful context. The software application must know what each XML element in the GML dataset means, whether the element refers to a feature, a property of a feature, or a feature collection. It is not enough for the GIS application software to use the XML parser to validate the dataset according to a schema: the application must also understand how GML uses the W3C Schema to define a geographic feature and its properties. GML introduces an extraordinary flexibility by letting users define their own application schemas suitable for their own domains; however, this same flexibility also presents a substantial difficulty for writing GML software applications.

It is more or less trivial to write a software component that works on a particular GML user application schema. The GML software component can even bypass the schema processing, since all the processing logic for that particular domain can be hard-coded into the software component. The W3C XML Schema document only needs to be examined if, in addition to processing, the user wishes to perform validation on the GML datasets with the XML parser.

It is substantially more difficult to write a software component that works on any arbitrary GML user application schema because the component would be expected to understand any GML dataset. Reading GML documents into a system is trivial since users can employ any of the free XML parsers available. The difficulty comes in interpreting the XML elements into a geographic context and then in interpreting that geographic context into a GIS system’s own local context. GML helps software components interpret XML data by constraining the interpretation of the data into a well-defined geographic context.

Since a GML user application schema is expressed through a W3C XML Schema document, a software component can perform type discovery on the schema to identify which XML elements from the GML dataset represent a feature, a feature’s properties, and a feature’s geometric properties. Currently most of the XML parsers available support validation with the W3C XML Schema on a document but do not expose the W3C XML Schema programmatically through an API. This presents an extra obstacle for GML software application programmers who need type discovery in order to work with their GML datasets, because admittedly, the W3C XML Schema Recommendation is in no way an easy specification to understand.

Another difficulty when working with geographical data in XML, whether the GML software component handles one or many GML user application schemas, is that geographical datasets are inherently large. Software applications use two standard APIs to parse XML documents: DOM and SAX. The DOM specification defines a tree-based approach to navigating an XML document. Currently, a DOM parser builds the document as an in-memory tree, which makes it prohibitively expensive to use on large GIS datasets. A SAX parser is event-based: it does not itself construct an internal representation of the XML document, but instead invokes callback functions that the application supplies to handle events. Using a SAX parser keeps memory usage low, but it requires considerably more programming effort. The application programmer should avoid an in-memory representation of the parsed XML document when large geographic datasets are being read.

The next section describes an XML translation engine that enables translation between XML-based formats and other GIS formats. The XML translation engine can be dynamically configured to map XML elements into the engine’s local notion of geographical objects. These objects are neutral representations of geographic features: they are not tied to any particular GIS format, can hold geographic and non-geographic information, and can be exported through a general-purpose translation hub into diverse GIS systems. The XML translation engine is configured through mapping rules that are themselves expressed in XML. This substantially reduces the effort needed to support new XML-based formats, since users can write new mapping rules in a matter of hours rather than spending weeks coding software components with traditional programming.

In designing the XML data translation engine and its declarative mapping rules, a stream-based processing model was chosen, since the potential size of geographical data made the tree-based processing model unfeasible.
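To make the stream-based model concrete, the following is a minimal sketch in Python using the standard library's SAX interface. It is not the engine's implementation; the sample document and the `ElementCounter` and `count_elements` names are invented for illustration. The handler only reacts to start-tag events and keeps a small tally, so memory use does not grow with the size of the document.

```python
import io
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Counts elements by name as the parser streams through the document."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called once per start tag; nothing is retained except the tally.
        self.counts[name] = self.counts.get(name, 0) + 1


def count_elements(xml_bytes):
    """Stream-parse the given bytes and return {element name: occurrences}."""
    handler = ElementCounter()
    xml.sax.parse(io.BytesIO(xml_bytes), handler)
    return handler.counts


# Invented sample data; a real dataset would be read from a file stream.
SAMPLE = b"<roads><road><name>A</name></road><road><name>B</name></road></roads>"

if __name__ == "__main__":
    print(count_elements(SAMPLE))
```

A tree-based parser would instead hold every element of the document in memory before the application could inspect any of them.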

The XML Data Translation Engine
The XML data translation engine is event-driven. It takes two input XML documents: the XML data document and an xfMap document, which contains its declarative mapping-rules. The xfMap mapping-rules are loaded into the translation engine and react to the data document’s input stream of XML elements. These mapping-rules may be activated, executed, suspended and deactivated by the translation engine during key events. These key events will be illustrated later in an example. Although there are different types of xfMap mapping-rules, we’ll only consider mapping-rules that construct neutral geographical features from the XML elements. These are called “feature mapping-rules”.

The XML data translation engine constructs one feature at a time. The first feature mapping-rule that is activated initiates the construction of a new geographical feature. Subsequent feature mapping-rules that are activated do not create a new geographical feature; they contribute to the construction of the feature that was already created. The geographical feature is complete only after the initial feature mapping-rule is deactivated. The following XML document fragment illustrates the usage of the xfMap feature mapping-rules:



If we want a geographic feature to be constructed on the element, then a feature mapping-rule must activate when the XML translation engine reads the element’s start tag. The following feature mapping-rule will activate when the element’s start tag is read:

Fragment2



The above feature mapping-rule deactivates when the element’s end tag is read. The geographic feature that is constructed is vacuous: it has no feature-type, attributes, or geometry. The following is a textual representation of the vacuous feature (it is an actual log of the geographic feature from the data translation hub):

+++++++++++++++++++++++++++++++++++++++++++++++++++++
Feature Type: `'
Attribute(string): `xml_type' has value `xml_no_geom'
Geometry Type: Unknown (0)
=====================================================

The feature-type and attributes of a geographic feature may be constructed by adding a



The xfMap’s element allows extraction of information from the input data document. Because the input data document is read in a streaming manner, the element can only locate and extract information from a sub-tree of the data document whose root element start tag caused the activation of the mapping-rule. In the case above, the two elements can only extract information from Fragment1’s and child elements. The xfMap’s element sets the feature-type for the geographic feature. In Fragment3, the element under the element pulls in the content of Fragment1’s element and sets this content as the feature-type of the geographic feature. The feature-type for the geographic feature is set to “City”. The attributes of the geographic feature are set by the xfMap’s element. An element can contain one or more . Each has a and a . The feature mapping-rule in Fragment3 specifies one attribute for the geographic feature. The name of this attribute, “featureCode”, is set by the element; it is the string value specified by its “expr” attribute. The value of the “featureCode” attribute is set to be the content of Fragment1’s element, “1234”. The textual representation of the geographic feature constructed by the feature mapping-rule in Fragment3 is:

+++++++++++++++++++++++++++++++++++++++++++++++++++++
Feature Type: `City'
Attribute(string): `featureCode' has value `1234'
Attribute(string): `xml_type' has value `xml_no_geom'
Geometry Type: Unknown (0)
=====================================================
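The activation and extraction behaviour described above can be approximated in a few lines of Python. This is only a sketch under stated assumptions: the input element names (`dataset`, `feature`, `kind`, `code`) are invented stand-ins for the paper's fragments, and the rule is written as a plain Python dictionary, not in real xfMap syntax. The rule activates on a matching start tag, collects text from the sub-tree rooted at that element, and emits one feature when the matching end tag deactivates it. The sketch assumes matched elements are not nested inside one another.

```python
import io
import xml.sax

# Hypothetical input document; element names are invented for illustration.
DATA = b"""<dataset>
  <feature><kind>City</kind><code>1234</code></feature>
  <feature><kind>Town</kind><code>5678</code></feature>
</dataset>"""

# Hypothetical rule expressed as a dict (NOT real xfMap syntax).
RULE = {
    "match": "feature",                     # start tag that activates the rule
    "feature_type_from": "kind",            # child whose text becomes the feature-type
    "attributes": {"featureCode": "code"},  # attribute name -> source child element
}


class FeatureBuilder(xml.sax.ContentHandler):
    """Builds one feature at a time and hands it to `emit` when complete."""

    def __init__(self, rule, emit):
        super().__init__()
        self.rule = rule
        self.emit = emit
        self.active = False   # True while inside the matched element
        self.children = {}    # text of child elements seen in the active sub-tree
        self.current = None   # name of the child element being read, if any
        self.text = []

    def startElement(self, name, attrs):
        if not self.active and name == self.rule["match"]:
            self.active = True            # activation: start a new feature
            self.children = {}
        elif self.active:
            self.current = name
            self.text = []

    def characters(self, content):
        if self.active and self.current is not None:
            self.text.append(content)

    def endElement(self, name):
        if not self.active:
            return
        if name == self.rule["match"]:
            # Deactivation: the feature is complete and can be emitted.
            self.emit({
                "feature_type": self.children.get(self.rule["feature_type_from"], ""),
                "attributes": {
                    attr: self.children.get(child, "")
                    for attr, child in self.rule["attributes"].items()
                },
            })
            self.active = False
        elif name == self.current:
            self.children[name] = "".join(self.text).strip()
            self.current = None


def build_features(xml_bytes, rule):
    """Convenience wrapper that collects emitted features into a list."""
    features = []
    xml.sax.parse(io.BytesIO(xml_bytes), FeatureBuilder(rule, features.append))
    return features


if __name__ == "__main__":
    for feature in build_features(DATA, RULE):
        print(feature)
```

Because each feature is handed to `emit` as soon as its end tag is read, a caller could forward features downstream one at a time instead of collecting them in a list.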

The geographic feature still lacks geometry. The geometry for a feature can be constructed by adding an xfMap’s element to a feature mapping-rule:



The following is a textual representation of the geographic feature constructed by Fragment4’s feature mapping-rule on Fragment1’s element. Notice that a two-dimensional point geometry feature with coordinates (10,0) is constructed:

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Feature Type: `City'
Attribute(string): `featureCode' has value `1234'
Attribute(string): `fme_geometry' has value `fme_point'
Attribute(string): `xml_type' has value `xml_point'
Geometry Type: Point (1)
Number of Coordinates: 1
Coordinate Dimension: 2
Coordinate System: `'
(10,0)
====================================================================

In Fragment4, the element in the feature mapping-rule directs the XML translation engine to construct a point-geometry for the geographic feature by using the “xml-point” geometry builder. The XML translation engine contains several predefined geometry builders that are format-neutral. These geometry builders are capable of constructing point, line, area, and aggregate geometries. In addition, the XML translation engine can be easily extended with new format-specific geometry builders as the need arises. Every geometry builder receives the information that it needs from the xfMap’s element’s elements. The “xml-point” geometry builder requires a element whose “name” attribute must be “data-string”, and its value must be the coordinate string sequence to parse. The “xml-point” geometry builder also accepts other optional elements, which can be used to specify the coordinate string sequence’s dimension, the character(s) that separate each coordinate, the character(s) that separate each axis of the coordinate, the order of the axes of the coordinates (for example, x, y, z, or y, x, z, etc…), and the decimal character for each coordinate (for example, “.”, or “,”). The names and values for the elements are geometry builder-dependent.

The and elements are called “expression elements”. Several of the elements in the above feature mapping-rules had expression elements as their children; these included the xfMap’s , , and elements. An xfMap element that accepts an expression element actually accepts a sequence of them. The element in Fragment4 has a sequence of expression elements that consists of an , a and an element; the “xml-point” geometry builder will receive the string “10.0,0.0” as the value of its “data-string” data parameter. There are several expression elements that were not illustrated in the above mapping-rule fragments: , , and .
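As an illustration of what a geometry builder does with its coordinate string, the following Python sketch parses a string such as “10.0,0.0” into coordinate tuples. It is not the “xml-point” builder itself: the function name and all parameter names (`dimension`, `coord_sep`, `axis_sep`, `axis_order`, `decimal`) are hypothetical, chosen to mirror the options described above (dimension, coordinate separator, axis separator, axis order, and decimal character).

```python
def parse_coordinates(data_string, dimension=2, coord_sep=" ",
                      axis_sep=",", axis_order="xy", decimal="."):
    """Parse a coordinate string into a list of (x, y[, z]) tuples.

    All parameter names are illustrative assumptions, not xfMap names.
    """
    canonical = "xyz"[:dimension]
    if sorted(axis_order) != sorted(canonical):
        raise ValueError("axis_order must be a permutation of %r" % canonical)
    if decimal == axis_sep or decimal == coord_sep:
        raise ValueError("decimal character must differ from the separators")

    coordinates = []
    for token in data_string.split(coord_sep):
        token = token.strip()
        if not token:
            continue  # tolerate repeated separators
        parts = token.split(axis_sep)
        if len(parts) != dimension:
            raise ValueError("expected %d axes in %r" % (dimension, token))
        values = [float(part.replace(decimal, ".")) for part in parts]
        by_axis = dict(zip(axis_order, values))
        # Always return axes in x, y, z order regardless of the input order.
        coordinates.append(tuple(by_axis[axis] for axis in canonical))
    return coordinates


if __name__ == "__main__":
    print(parse_coordinates("10.0,0.0"))
    print(parse_coordinates("49.1,-122.8", axis_order="yx"))
    print(parse_coordinates("0,5;1,5 2,0;3,0", axis_sep=";", decimal=","))
```

Normalising every coordinate to x, y, z order at parse time keeps the rest of a pipeline independent of how a particular source format writes its coordinates.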

The XML translation engine uses the xfMap’s feature mapping-rules to map the XML elements into format-neutral geographic features. This allows for flexibility, since the XML translation engine is not hard-coded for any particular XML format. The engine allows different types of XML-based documents to be processed into geographical features without the need to write a new software module in a traditional programming language. Any GML dataset based on an arbitrary GML user application schema can be processed once an appropriate xfMap is written.

The XML Translation Engine


The XML translation engine facilitates the reading of datasets based on arbitrary GML user application schemas, since it can interpret arbitrary XML elements given an appropriate xfMap. The xfMap can either be written manually or generated automatically by examining the GML user application schema documents. The GML 2.0 Recommendation provides rules and guidelines for users to define their own application schemas with the W3C XML Schema. These rules and guidelines make it possible for a software module to perform type discovery on a GML user application schema. One can determine whether an XML element represents a feature, a feature’s properties, or a feature’s geometric properties by following the type hierarchy for that element in the GML user application schema. When an author creates a GML user application schema, each of its new feature types, feature collection types, geometry types, and geometry property types must ultimately derive from one of the base types provided by GML. By walking through the type hierarchy, application software can discover what each element in the GML document is supposed to mean. One can envision a software module that reads any arbitrary GML user application schema dataset by examining the type hierarchy of the schema and automatically generating an xfMap that maps GML features into the XML translation engine’s geographic features. The GML user application schema constrains the interpretation of the XML elements from a GML document into a well-defined geographic context. This well-defined GML geographic context can be transformed with an xfMap by the XML translation engine into the engine’s neutral geographic features.
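The idea of walking the type hierarchy can be sketched with Python's standard XML library. This is a simplified illustration, not the type discovery module described in this paper. The example schema, its `ex` namespace, and the type names `CityType` and `PlaceType` are invented. The sketch assumes the `gml` prefix is bound to the GML namespace, that local types can be looked up by the part of the name after the prefix, and that the schema uses the 2001 XML Schema namespace. The GML base type names in `GML_ROLES` should be checked against the GML 2.0 schemas before being relied upon.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"  # assumed schema namespace

# Invented application schema used only for this illustration.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:gml="http://www.opengis.net/gml" xmlns:ex="http://example.com/ex"
    targetNamespace="http://example.com/ex">
  <xs:element name="City" type="ex:CityType"/>
  <xs:element name="Label" type="xs:string"/>
  <xs:complexType name="CityType">
    <xs:complexContent>
      <xs:extension base="ex:PlaceType"/>
    </xs:complexContent>
  </xs:complexType>
  <xs:complexType name="PlaceType">
    <xs:complexContent>
      <xs:extension base="gml:AbstractFeatureType"/>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>"""

# Base types assumed to mark each role; verify against the GML 2.0 schemas.
GML_ROLES = {
    "gml:AbstractFeatureType": "feature",
    "gml:AbstractFeatureCollectionType": "feature collection",
    "gml:AbstractGeometryType": "geometry",
    "gml:GeometryPropertyType": "geometry property",
}


def classify_elements(schema_text):
    """Map each global element name to the GML role its type derives from."""
    root = ET.fromstring(schema_text)

    # Record the declared base of every named complex type.
    base_of = {}
    for ctype in root.findall(XS + "complexType"):
        derivation = ctype.find(f"{XS}complexContent/{XS}extension")
        if derivation is None:
            derivation = ctype.find(f"{XS}complexContent/{XS}restriction")
        if derivation is not None:
            base_of[ctype.get("name")] = derivation.get("base")

    roles = {}
    for element in root.findall(XS + "element"):
        qname = element.get("type")
        seen = set()
        # Follow base types upward until a known GML base type is reached.
        while qname is not None and qname not in GML_ROLES and qname not in seen:
            seen.add(qname)
            qname = base_of.get(qname.split(":")[-1])
        roles[element.get("name")] = GML_ROLES.get(qname, "unknown")
    return roles


if __name__ == "__main__":
    print(classify_elements(SCHEMA))
```

A production implementation would also need to resolve namespace prefixes properly, follow imported and included schema documents, and handle anonymous types and substitution groups, none of which this sketch attempts.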

The GML Translation Engine


Implementing the XML and GML translation engine

The XML translation engine leverages the free parsing utilities that are available in the industry. It uses both common XML parsing APIs, namely the DOM and SAX APIs. The DOM API is used to read in the xfMap mapping-rules, while the SAX API is used to read the actual input XML data document in a streaming manner. As a consequence, the XML translation engine can handle arbitrarily large XML documents. The GML translation engine allows the processing of GML datasets that are based on arbitrary user application schemas. We’ve implemented an API that provides access to a W3C XML Schema document because most of the free cross-platform XML parsers available do not provide such an API. This API allowed us to perform type discovery on the W3C XML Schema document to discover what each of the elements in a GML document represents. Type discovery allows the GML translation engine to automatically generate an xfMap document that is used in conjunction with the XML translation engine behind the scenes.

The XML translation engine is coupled to a general-purpose data translation hub; this hub can take the format-neutral geographic features output by the XML translation engine and translate them into a large array of GIS formats. The hub is capable of performing a large variety of sophisticated transformations on geographic features. These include topology, geometry, attribute, and coordinate system transformations. The hub also provides an API for all of the transformations that the XML translation engine exposes through the xfMap’s “group mapping-rules”. Briefly, the group mapping-rules allow further processing of the topology, attributes, and geometry for the geographic features constructed by the feature mapping-rules. For example, the feature-mapping-rules may be used to map the XML elements into geographic primitives, while the group-mapping rules may specify geometric operations to construct the topology of the dataset from the geographic primitives.

The data translation hub to which the XML translation engine is coupled is Safe Software’s Feature Manipulation Engine (FME). The XML translation engine utilizes two of Safe Software’s APIs: the Plug-in Builder API and the FME Objects API. The Plug-in Builder API is used to enable XML to be translated into any of the FME-supported formats. The FME Objects API is used extensively in the implementation of the group-objects that group mapping-rules create. The FME Objects API allows the group-objects to access all of the FME feature factories and feature functions. The GML translation engine also uses the FME Objects API to embed the XML translation engine to read GML documents. Using the FME APIs brings two additional benefits: the FME Universal Viewer can readily view the XML-based formats, and other software applications can embed the functionality of the XML and GML translation engines into their systems.

Conclusion
The XML translation engine replaces complex programming code with xfMap mapping-rules for the interpretation of XML elements. The XML translation engine can be used to read datasets from any arbitrary GML user application schema. This process can be automated when a type discovery module is used to analyze the schema for the automatic generation of xfMap documents. The XML translation engine enables new XML-based formats to be supported in time that can be measured in hours rather than in weeks. The GML translation engine enables GML datasets to be supported instantaneously.

References
Bray, T., Paoli, J., Maler, E., and Sperberg-McQueen, C.M., 2000, Extensible Markup Language (XML) 1.0, Second Edition, W3C Recommendation, http://www.w3.org/TR/REC-xml.

Cox, S., Cuthbert, A., Lake, R., and Martell, R., Geography Markup Language (GML) 2.0, Open GIS Consortium Inc., http://www.opengis.net/gml/01-029/GML2.html.

Safe Software Inc., XML Reader: xfMap, Feature Manipulation Engine (FME) Readers and Writers, http://ftp.spatialdirect.com/fme/2002sr1/docs/ReadersWriters2002SR

Thompson, H.S., Beech, D., Maloney, M., and Mendelsohn, N., 2001, XML Schema Part 1: Structures, W3C Recommendation, http://www.w3.org/TR/xmlschema-1/.
