GML – User Perspectives
The XML Translation Engine
The XML translation engine can read datasets based on arbitrary GML user application schemas, because it can interpret arbitrary XML elements when given an appropriate xfMap. The xfMap can either be written manually or generated automatically by examining the GML user application schema documents. The GML 2.0 Recommendation provides rules and guidelines for users to define their own application schemas with W3C XML Schema. These rules and guidelines make it possible for a software module to perform type discovery on a GML user application schema: by following the type hierarchy for an XML element in the schema, the module can determine whether the element represents a feature, a feature’s property, or a feature’s geometric property. When an author creates a GML user application schema, each new feature type, feature collection type, geometry type, and geometry property type must ultimately derive from one of the base types provided by GML. By walking the type hierarchy, application software can therefore discover what each element in the GML document is supposed to mean.
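The type discovery described above can be illustrated with a minimal Python sketch. It is not the engine's implementation. The schema fragment, the "ex" namespace, and the helper names (build_base_index, discover_role, classify_elements) are hypothetical, the table of GML base types is a simplified subset, and the sketch compares prefixed type names as plain strings rather than resolving namespaces properly.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical application schema fragment, invented for illustration only.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:gml="http://www.opengis.net/gml"
           xmlns:ex="http://example.com/ex">
  <xs:element name="Road" type="ex:RoadType"/>
  <xs:element name="Highway" type="ex:HighwayType"/>
  <xs:element name="centreLine" type="ex:CentreLineType"/>
  <xs:complexType name="RoadType">
    <xs:complexContent>
      <xs:extension base="gml:AbstractFeatureType"/>
    </xs:complexContent>
  </xs:complexType>
  <xs:complexType name="HighwayType">
    <xs:complexContent>
      <xs:extension base="ex:RoadType"/>
    </xs:complexContent>
  </xs:complexType>
  <xs:complexType name="CentreLineType">
    <xs:complexContent>
      <xs:restriction base="gml:GeometryPropertyType"/>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>"""

# Simplified subset of GML base types and the role each one implies.
GML_ROLES = {
    "gml:AbstractFeatureType": "feature",
    "gml:AbstractFeatureCollectionType": "feature collection",
    "gml:AbstractGeometryType": "geometry",
    "gml:GeometryPropertyType": "geometry property",
}


def build_base_index(schema_root, prefix="ex"):
    """Map each named complex type to the type it extends or restricts."""
    bases = {}
    for ctype in schema_root.iter(XS + "complexType"):
        name = ctype.get("name")
        if name is None:
            continue
        for tag in ("extension", "restriction"):
            derived = ctype.find(f"{XS}complexContent/{XS}{tag}")
            if derived is not None:
                bases[f"{prefix}:{name}"] = derived.get("base")
    return bases


def discover_role(type_name, bases):
    """Walk up the derivation chain until a known GML base type is reached."""
    seen = set()
    while type_name is not None and type_name not in seen:
        if type_name in GML_ROLES:
            return GML_ROLES[type_name]
        seen.add(type_name)  # guards against a cyclic (invalid) schema
        type_name = bases.get(type_name)
    return "unknown"


def classify_elements(schema_text):
    """Return the discovered role of every top-level element declaration."""
    root = ET.fromstring(schema_text)
    bases = build_base_index(root)
    return {el.get("name"): discover_role(el.get("type"), bases)
            for el in root.findall(XS + "element")}


roles = classify_elements(SCHEMA)
```

In this sketch Highway is classified as a feature even though it derives from a GML base type only indirectly, through RoadType, which is the point of walking the hierarchy rather than inspecting the immediate base type alone.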
One can envision a software module that reads any GML user application schema dataset by examining the schema’s type hierarchy and automatically generating an xfMap that maps GML features into the XML translation engine’s geographic features.
The GML user application schema constrains the interpretation of the XML elements in a GML document to a well-defined geographic context. Given an xfMap, the XML translation engine can transform this well-defined geographic context into the engine’s neutral geographic features.
The GML Translation Engine
Implementing The XML And GML Translation Engines
The XML translation engine leverages freely available XML parsing utilities. It uses both of the common XML parsing APIs, DOM and SAX. The DOM API is used to read the xfMap mapping-rules, while the SAX API is used to read the actual input XML data document in a streaming manner. Because the input document is never held in memory in its entirety, the XML translation engine can handle arbitrarily large XML documents.
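The division of labour between the two APIs can be sketched in Python with the standard xml.dom.minidom and xml.sax modules. This is only an illustration under stated assumptions: the rules document below is a hypothetical stand-in and does not reflect the real xfMap syntax, the data document is invented, and the handler only copes with flat child elements inside each feature.

```python
import io
import xml.dom.minidom
import xml.sax

# Hypothetical mapping rules; NOT the real xfMap syntax.
RULES_XML = """<rules>
  <feature element="Road" type="road"/>
  <feature element="River" type="river"/>
</rules>"""

# Invented input data; in practice this would be a large file on disk.
DATA_XML = b"""<data>
  <Road><name>A1</name></Road>
  <River><name>Thames</name></River>
  <Comment>ignored</Comment>
  <Road><name>M25</name></Road>
</data>"""


def load_rules(rules_text):
    """The rules are small, so read them whole with DOM and index them."""
    dom = xml.dom.minidom.parseString(rules_text)
    return {node.getAttribute("element"): node.getAttribute("type")
            for node in dom.getElementsByTagName("feature")}


class FeatureHandler(xml.sax.ContentHandler):
    """Builds one feature at a time and hands it to a callback."""

    def __init__(self, rules, emit):
        super().__init__()
        self.rules = rules
        self.emit = emit
        self.current = None          # feature under construction, if any
        self.feature_element = None  # element name that opened it
        self.field = None            # child element currently being read
        self.text = []

    def startElement(self, name, attrs):
        if self.current is None:
            if name in self.rules:
                self.current = {"type": self.rules[name], "attributes": {}}
                self.feature_element = name
        else:
            self.field = name
            self.text = []

    def characters(self, content):
        if self.field is not None:
            self.text.append(content)

    def endElement(self, name):
        if self.current is None:
            return
        if name == self.field:
            self.current["attributes"][name] = "".join(self.text).strip()
            self.field = None
        elif name == self.feature_element:
            self.emit(self.current)  # feature is complete; release it
            self.current = None


def translate(rules_text, data_stream, emit):
    """Stream the data with SAX, emitting features as they complete."""
    xml.sax.parse(data_stream, FeatureHandler(load_rules(rules_text), emit))


features = []
translate(RULES_XML, io.BytesIO(DATA_XML), features.append)
```

The example collects the emitted features in a list only so that they can be inspected. A real consumer would process each feature inside the callback and discard it, so that memory use stays independent of the size of the input document.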
The GML translation engine allows the processing of GML datasets that are based on arbitrary user application schemas. We have implemented an API that provides access to a W3C XML Schema document, because most of the free cross-platform XML parsers available do not provide such an API. This API allows us to perform type discovery on the W3C XML Schema document to determine what each of the elements in a GML document represents. Type discovery in turn allows the GML translation engine to automatically generate an xfMap document, which is used behind the scenes in conjunction with the XML translation engine.
The XML translation engine is coupled to a general-purpose data translation hub; this hub can take the format-neutral geographic features output by the XML translation engine and translate them into a large array of GIS formats. The hub is capable of performing a large variety of sophisticated transformations on geographic features. These include topology, geometry, attribute, and coordinate system transformations. The hub also provides an API for all of the transformations that the XML translation engine exposes through the xfMap’s “group mapping-rules”. Briefly, the group mapping-rules allow further processing of the topology, attributes, and geometry of the geographic features constructed by the feature mapping-rules. For example, the feature mapping-rules may be used to map the XML elements into geographic primitives, while the group mapping-rules may specify geometric operations to construct the topology of the dataset from those geographic primitives.
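The two-stage idea in the example above, primitives first and topology second, can be sketched in Python as follows. This is purely illustrative and assumes nothing about the hub's actual API: feature_stage and group_stage are hypothetical names, the input records are invented, and the "group" operation shown is just one simple geometric operation, chaining lines that share endpoints into a closed ring.

```python
def feature_stage(elements):
    """Hypothetical feature rule: map each parsed element to a line primitive."""
    return [{"id": e["id"], "line": (tuple(e["start"]), tuple(e["end"]))}
            for e in elements]


def group_stage(primitives):
    """Hypothetical group rule: chain lines sharing endpoints into one ring."""
    remaining = list(primitives)
    first = remaining.pop(0)
    ring = [first["line"][0], first["line"][1]]
    while remaining:
        tail = ring[-1]
        for i, prim in enumerate(remaining):
            a, b = prim["line"]
            if a == tail:        # line continues the chain as stored
                ring.append(b)
                break
            if b == tail:        # line continues the chain when reversed
                ring.append(a)
                break
        else:
            raise ValueError("lines do not form a single chain")
        remaining.pop(i)
    if ring[0] != ring[-1]:
        raise ValueError("chain is not closed")
    return {"type": "polygon", "ring": ring}


# Invented input: three edges of a triangle, unordered and not all
# pointing in the same direction.
elements = [
    {"id": "e1", "start": [0, 0], "end": [4, 0]},
    {"id": "e2", "start": [0, 3], "end": [0, 0]},
    {"id": "e3", "start": [4, 0], "end": [0, 3]},
]

polygon = group_stage(feature_stage(elements))
```

Keeping the two stages separate means the element-to-primitive mapping does not need to know how the primitives will later be assembled.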
GML Lessons Learned
We have used the above technology in a number of XML/GML projects, and some of those projects have gone much more smoothly than others. This has given us insight into GML’s strengths and weaknesses when applied to real-world problems. Some of the lessons below may seem obvious, but they are stated nevertheless.
Firstly, because GML is a means of disseminating data, it is an interchange format, not a database. The purpose of an interchange format is to move data into a database or other system where the data is to be processed and used. GML is not a format that lends itself to doing any sort of GIS analysis.
For instance, Ordnance Survey uses GML to distribute nationwide data (OS MasterMap). GML enables them to model and distribute their data in a standards-based, vendor-neutral manner and allows users to receive updates frequently. Ordnance Survey is a prime example of how GML can be used effectively.
Secondly, parsing GML is expensive: GML files can be an order of magnitude slower to process than binary files. Users often complain that GML files are big, heavy, and slow to read. While this is true, the purpose of GML is to provide a standards-based interchange format. Because GML is based on XML, it has all the benefits of XML, such as a standard set of tools available for developing an application. On the other hand, it also has the disadvantages of XML. Specifically, GML and XML are not designed to be an efficient means of storing data.
Finally, to use GML, users must have access to external documents that describe the structure and the meaning of the data. A big difference exists between being able to read data and being able to use data intelligently. Expecting an application to be written that is going to read and restructure data intelligently without any human intervention is unrealistic.
GML is a very rich and flexible language for expressing data. Unlike the highly restrictive formats that the GIS industry has historically been used to, such as SHAPE and MIF, GML enables users to represent data in an almost infinite number of ways. However, this richness is no free lunch. For an application to ingest or use data in a schema it has never seen before, the most effective approach is for a human user to define the mapping to the schema of the destination system. Fortunately, work is being done now to define industry application schemas. Work is also being done on “profiles” within the OGC. A profile is a set of rules that constrain how GML can be used, with the aim of increasing data sharing by lowering the effort required to read data.