XML & GIS Application Integration
David G Cottrell
Geographic Information Technology, Inc.
101 Inverness Drive East, Suite 130
Englewood, CO 80112
What is XML?
Markup languages were originally created as a means of conveying metadata. These
metadata "tags" are used to describe the data that they delimit. The first such language
accepted as a standard was Standard Generalized Markup Language (SGML). The
International Organization for Standardization (ISO) adopted SGML as ISO 8879 in 1986.
SGML was intended to be very flexible so that it could be customized to describe data for
any application. That flexibility also made SGML very complex, and it was never widely adopted.
Hypertext Markup Language (HTML) was created as a means of describing the visual
representation of data. Anyone who has ever used a Web browser has seen an example
of the visual representation that HTML tags create. For example, the following markup
tags
<b>Hypertext Markup Language</b>
create the following visual representation within a browser:
Hypertext Markup Language (rendered in bold type)
Further, HTML is designed specifically for use within the context of the World Wide Web.
Design of Extensible Markup Language (XML) was begun in 1996 by the World Wide Web
Consortium (W3C; http://www.w3.org). XML was designed to have the flexibility of
SGML and the widespread acceptance of HTML (Martin, D., et al., 2000). XML is a
self-describing markup language that describes hierarchical trees of data. Tags that serve
as descriptors for the data delimit each data element within the tree. Each tag can also
contain more metadata in the form of attributes (see Figure 1). The structure of the XML
document is defined within a Document Type Definition or DTD. A DTD is a separate
document following Extended Backus-Naur Form syntax (see Figure 2).

Figure 1
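As a rough illustration of the tree-of-elements-with-attributes structure described above, the following Python sketch parses a small document and lists each child element with its attributes. The document is hypothetical: the root name "vehicles" and the attribute name "wheels" are assumptions made for illustration, and only "vehicle" and "color" come from the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical document. The root name "vehicles" and the attribute name
# "wheels" are assumptions; only "vehicle" and "color" appear in the text.
DOC = """<?xml version="1.0"?>
<vehicles>
  <vehicle wheels="2" color="red"/>
  <vehicle wheels="4" color="blue"/>
</vehicles>"""


def describe(xml_text):
    """Return a (tag, attributes) pair for each child of the root element."""
    root = ET.fromstring(xml_text)
    return [(child.tag, dict(child.attrib)) for child in root]


if __name__ == "__main__":
    for tag, attrs in describe(DOC):
        print(tag, attrs)
```

Each tag names the data it delimits, and the attributes carry additional metadata about that element.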
The following is an example of the DTD for the XML in Figure 1.

Figure 2
The <!ELEMENT> tag defines the elements that are allowed within the XML document.
These tags also define the hierarchical nature of the document. In the above example, the first
"element" tag defines the name of the element and states that one or more child elements
called "vehicle" are allowed. The second "element" definition defines the "vehicle" tag
and states that it is "EMPTY", meaning there is no data associated with it. There are
attributes associated with it, which are defined by the <!ATTLIST> tag. The first attribute
is enumerated and can contain any of the listed values, with a default value of 4 if the
attribute is not included in the <vehicle> tag. The attribute "color" is required in every
instance of the <vehicle> tag. One has to be careful when defining attributes in this way.
Even though the "color" attribute is required, we have not constrained the text that can
actually appear in the value. The value "Charlie's Angels" would be considered valid by
a validating parser because the only requirement is that it be some text.
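The rules just described can be sketched as hand-written checks in Python. This is not a DTD validator, only an approximation of the three constraints on the "vehicle" element. The attribute name "wheels" and its allowed values are assumptions, since the text gives only the default of 4.

```python
import xml.etree.ElementTree as ET

# Assumed enumeration; the text states only that the default value is 4.
ALLOWED_WHEELS = {"2", "3", "4"}
DEFAULT_WHEELS = "4"


def check_vehicle(elem):
    """Return a list of rule violations for one <vehicle> element."""
    problems = []
    # Enumerated attribute: fall back to the default when it is absent.
    wheels = elem.get("wheels", DEFAULT_WHEELS)
    if wheels not in ALLOWED_WHEELS:
        problems.append("wheels=%r is not an allowed value" % wheels)
    # Required attribute: it must be present, but any text is accepted.
    if "color" not in elem.attrib:
        problems.append("required attribute 'color' is missing")
    # EMPTY element: no text and no child elements.
    if elem.text or len(elem):
        problems.append("vehicle is declared EMPTY but has content")
    return problems


if __name__ == "__main__":
    odd = ET.fromstring('<vehicle color="Charlie\'s Angels"/>')
    print(check_vehicle(odd))  # no violations: any text satisfies "required"
```

The check on "color" tests only for presence, which is why an implausible value passes. Constraining the value itself would require an enumerated attribute like the first one.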
A valid XML document must conform to the DTD it is using, whereas a well-formed
document simply complies with the basic XML syntax rules (see
http://www.w3.org/XML).
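The distinction can be seen with a non-validating parser, which checks well-formedness only. The sketch below uses Python's standard xml.etree.ElementTree module, and the element names in the usage lines are hypothetical. Checking validity against a DTD would need a validating parser, which this module does not provide.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, regardless of any DTD."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    # Well-formed, although a DTD requiring <vehicle> children would reject it.
    print(is_well_formed("<vehicles><boat/></vehicles>"))
    # Not well-formed: the <vehicle> start tag is never closed.
    print(is_well_formed("<vehicles><vehicle></vehicles>"))
```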
Why XML?
Enterprise Application Integration or EAI has become a very popular acronym of late and
can mean many different things depending upon the needs of the enterprise. Regardless
of the architecture chosen, it is the nature of EAI to integrate disparate systems. In order
for integration to be scalable and cost effective, the messages sent between systems must
be as decoupled from the applications as possible while still maintaining context. XML
provides the means for accomplishing this.
XML is well suited for the transfer of data between heterogeneous applications and
platforms. Many applications that are now being integrated with GIS are legacy
applications. These pieces of software often reside on mainframe systems or other
platforms that are different from the one that supports the GIS. This problem diminishes
when using XML as a data transfer vehicle. XML is based upon the Universal Character
Set (UCS), or Unicode, and documents are plain text files that can be stored in any of
several character encodings, such as UTF-8 or UTF-16. This characteristic of XML makes it
platform independent (Martin, D., et al., 2000, pp. 55-57).
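A minimal sketch of this encoding independence, using Python's standard library: the same element is serialized in three different encodings and parsed back, and the attribute value survives each round trip. The element, the attribute value and the function name are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Illustrative value containing a non-ASCII character.
COLOR = "bleu fonc\u00e9"


def roundtrip(encoding):
    """Serialize a small element in the given encoding and parse it back."""
    root = ET.Element("vehicle", color=COLOR)
    # tostring() returns bytes and writes an XML declaration naming the encoding.
    data = ET.tostring(root, encoding=encoding)
    return ET.fromstring(data).get("color")


if __name__ == "__main__":
    for enc in ("utf-8", "utf-16", "iso-8859-1"):
        print(enc, roundtrip(enc) == COLOR)
```

The receiving application does not need to know in advance which encoding the sender used, because the document declares it.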
XML also lends itself well to standardization. There are many Document Type
Definitions that define industry-specific "vocabularies". The following are examples
of popular DTDs. This list is not intended to be exhaustive but should provide a good
starting point for standardization and development of an enterprise vocabulary.
One of the most interesting initiatives proposed by Microsoft is the BizTalk Framework,
which is "a set of guidelines for how to publish schemas in XML and how to use XML
messages to easily integrate software programs together in order to build rich new
solutions" (http://www.biztalk.org). This initiative has support from companies like
SAP, CommerceOne, Boeing, and BP/Amoco. Through the BizTalk Web site "you can
locate, manage, learn about, share information about and publish XML, XSL and
information models and business processes supported by applications that support the
BizTalk Framework" (http://www.biztalk.org). Essentially, the initiative is attempting to
drive the design and adoption of new XML technologies.
Another attractive element of XML is that developers have access to many free tools for
parsing and creating XML documents (see http://www.xml.org). These tools are
available for many different languages including Java, C++, Perl and Visual Basic.