GISdevelopment.net ---> GITA 2001 ---> System Architecture

XML & GIS Application Integration

David G Cottrell
Geographic Information Technology, Inc.
101 Inverness Drive East, Suite 130
Englewood, CO 80112


What is XML?
Markup languages were originally created as a means of conveying metadata. These metadata "tags" describe the data that they delimit. The first such language accepted as a standard was Standard Generalized Markup Language (SGML), which the International Organization for Standardization (ISO) adopted as ISO 8879 in 1986. SGML was intended to be flexible enough to be customized to describe data for any application. That flexibility also made SGML very complex, and it was never widely adopted.

Hypertext Markup Language (HTML) was created as a means of describing the visual representation of data. Anyone who has ever used a Web browser has seen an example of the visual representation that HTML tags create. For example, the following markup tags

<b>Hypertext Markup Language</b>

create the following visual representation within a browser, where the text is displayed in bold:

Hypertext Markup Language

Further, HTML is designed specifically for use within the context of the World Wide Web.

Design of the Extensible Markup Language (XML) was begun in 1996 by the World Wide Web Consortium (W3C; http://www.w3.org). XML was designed to have the "flexibility of SGML and the widespread acceptance of HTML" (Martin, D., et al., 2000). XML is a self-describing markup language that describes hierarchical trees of data. Each data element within the tree is delimited by tags that serve as descriptors for the data. Each tag can also carry further metadata in the form of attributes (see Figure 1). The structure of an XML document is defined within a Document Type Definition, or DTD. A DTD is a separate document whose syntax follows Extended Backus-Naur Form (see Figure 2).


Figure 1

The following is an example of the DTD for the XML in Figure 1.


Figure 2

The <!ELEMENT> declarations define the elements that are allowed within the XML document, as well as the hierarchical structure of the document. In the above example, the first "element" declaration defines the name of the element and states that one or more child elements called "vehicle" are allowed. The second "element" declaration defines the "vehicle" tag and states that it is "EMPTY", meaning there is no data associated with it. It does have attributes, which are defined by the <!ATTLIST> declaration. The first attribute is enumerated and can contain any of the listed values, with a default value of 4 if the attribute is not included in the <vehicle> tag. The attribute "color" is required in every instance of the <vehicle> tag. One has to be careful when defining attributes in this way. Even though the "color" attribute is required, nothing constrains the text that can actually appear in its value. The value "Charlie's Angels" would be considered valid by a validating parser because the only requirement is that it be some text. A valid XML document must conform to the DTD it uses, whereas a well-formed document simply complies with basic XML syntax rules (see http://www.w3.org/XML).
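
The distinction between well-formed and valid documents can be illustrated with a short Python sketch. Because the figures are not reproduced here, the document below is a hypothetical stand-in: the root element name `vehicles` and the attribute name `wheels` are assumptions, and only `vehicle` and `color` come from the description above. The standard-library parser used here checks well-formedness only and does not validate against a DTD, so the sketch also shows that an unconstrained "color" value passes through untouched.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; "vehicles" and "wheels" are assumed names.
WELL_FORMED = """<vehicles>
  <vehicle wheels="4" color="red"/>
  <vehicle wheels="2" color="Charlie's Angels"/>
</vehicles>"""

# The closing tag of the root element is missing, so this is not well-formed.
NOT_WELL_FORMED = """<vehicles>
  <vehicle wheels="4" color="red"/>
"""


def is_well_formed(text):
    """Return True if the text parses as XML (syntax only, no DTD check)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def vehicle_colors(text):
    """Return the color attribute of every vehicle element, in document order."""
    root = ET.fromstring(text)
    return [vehicle.get("color") for vehicle in root.findall("vehicle")]


if __name__ == "__main__":
    print(is_well_formed(WELL_FORMED))
    print(is_well_formed(NOT_WELL_FORMED))
    print(vehicle_colors(WELL_FORMED))
```

Constraining the "color" value would require either an enumerated attribute in the DTD or a check in application code.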

Why XML?
Enterprise Application Integration or EAI has become a very popular acronym of late and can mean many different things depending upon the needs of the enterprise. Regardless of the architecture chosen, it is the nature of EAI to integrate disparate systems. In order for integration to be scalable and cost effective, the messages sent between systems must be as decoupled from the applications as possible while still maintaining context. XML provides the means for accomplishing this.

XML is well suited for the transfer of data between heterogeneous applications and platforms. Many applications that are now being integrated with GIS are legacy applications. These pieces of software often reside on mainframe systems or other platforms that are different from the one that supports the GIS. This problem diminishes when XML is used as the data transfer vehicle. XML is based upon the Universal Character Set (UCS), or Unicode, and documents are plain text files that may use any of many different character encodings. This characteristic of XML makes it platform independent (Martin, D., et al., 2000, pp. 55-57).

XML also lends itself well to standardization. There are many Document Type Definitions, which define industry-specific "vocabularies". The following are examples of popular DTDs. This list is not intended to be exhaustive but should provide a good starting point for standardization and development of an enterprise vocabulary. One of the most interesting initiatives proposed by Microsoft is the BizTalk Framework, which is "a set of guidelines for how to publish schemas in XML and how to use XML messages to easily integrate software programs together in order to build rich new solutions" (http://www.biztalk.org). This initiative has support from companies like SAP, CommerceOne, Boeing, and BP/Amoco. Through the BizTalk Web site "you can locate, manage, learn about, share information about and publish XML, XSL and information models and business processes supported by applications that support the BizTalk Framework" (http://www.biztalk.org). Essentially, the initiative is attempting to drive the design and adoption of new XML technologies.

Another attractive element of XML is that developers have access to many free tools for parsing and creating XML documents (see http://www.xml.org). These tools are available for many different languages including Java, C++, Perl and Visual Basic.

Methodology
The initial step in any application integration project is to model the flow and use of the information. Integration methodologies range from simple point-to-point scenarios to more progressive EAI software products that employ "Business Bus" architectures (see J. C. Lutz, EAI Journal, March 2000, pp. 64-73 for more details on EAI architectures). Within any given architecture there is the actual passage of information from one application to another. XML provides a platform- and application-independent means of transferring the data.

Information Modeling
Information modeling is an important part of EAI. An information model gives data meaning through precise definitions that can be easily communicated to users. Data must be defined in their static and dynamic states. The messages passed between applications must also be defined. Several methodologies for creating a model, used either alone or in combination, are outlined in Professional XML (Martin, D., et al., 2000, pp. 115-117).
  • Workflow models: Focus on the flow of information between units of work.
  • Data flow models: Like workflow models, but with more focus on the information than on the business process.
  • Object models: Useful as a design tool to model the different "players" in a system.
  • Use cases: Model how certain tasks are accomplished within the GIS, in external systems, and in the workplace.
  • Object interaction diagrams: Analyze the interactions and exchange of information between objects.
Data Representation
Once the information model is complete, the data that will be passed between applications may be represented in terms of XML. This can be accomplished by object modeling whereby the elements in the XML document represent objects. The elements or objects could also represent records in a database. In the following example the parent object is a work order containing GIS objects that have been placed within a highly simplified design.


Figure 3

In order to keep the electrical model generic, the GIS objects have been abstracted into device and conductor objects. Agreed-upon attribute values define the type of device or conductor that is being placed. The structure of the work order can be specified in a Document Type Definition (DTD) to facilitate validation by the XML parser, which will be discussed later. Within the DTD each element is defined along with the permissible child elements and allowable attributes for each element.
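
Since the figure is not reproduced here, the following Python sketch suggests what such a work order might look like when built programmatically. The element names `workorder`, `device`, and `conductor`, the attribute names `id` and `type`, and all of the values are hypothetical; the paper only states that devices and conductors are generic objects distinguished by agreed-upon attribute values.

```python
import xml.etree.ElementTree as ET


def build_work_order(order_id, device_types, conductor_types):
    """Build a work order element holding generic device and conductor children.

    All element and attribute names here are illustrative assumptions.
    """
    order = ET.Element("workorder", {"id": order_id})
    for device_type in device_types:
        # The "type" attribute carries the agreed-upon value that tells
        # the receiving application which kind of device was placed.
        ET.SubElement(order, "device", {"type": device_type})
    for conductor_type in conductor_types:
        ET.SubElement(order, "conductor", {"type": conductor_type})
    return order


if __name__ == "__main__":
    order = build_work_order(
        "WO-1001", ["transformer", "fuse"], ["primary_overhead"]
    )
    print(ET.tostring(order, encoding="unicode"))
```

Keeping the element names generic means that a new kind of device needs only a new agreed-upon attribute value, not a change to the document structure.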

<!-- Example DTD for Work Order -->

Figure 4

If an organization has standardized upon XML, there may be existing DTDs that satisfy the needs of a given application, or at least provide a good starting point for DTD design. "During the design phase, a software developer can look at a DTD and know that as long as the application he [/she] builds will output documents that conform to that DTD, other applications can process those documents" (Dick, K., 2000). DTD design is an effort that is best not performed in a vacuum. The work can be very time intensive up front, so collaboration within the enterprise is advised.

In the event that it is necessary to combine multiple vocabularies, each perhaps defined by its own DTD, developers can create XML documents that use "Namespaces". Namespace syntax provides a mechanism for mixing element names from several vocabularies within a single XML document. A prefix placed on each element tag identifies the vocabulary to which the element belongs. Namespaces will not be covered in more detail here, but more information can be found in Professional XML (Martin, D., et al., 2000, pp. 237-292) and on the Web at http://www.w3.org/TR/2000/PR-DOM-Level-2-Core-20000927/core.html .
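
As a minimal sketch of how prefixed elements look to a program, the following Python example parses a document that mixes two vocabularies. The namespace URIs, prefixes, and element names are all hypothetical. A namespace URI acts only as a unique identifier for a vocabulary; the parser does not retrieve anything from it.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; they only need to be unique identifiers.
WORK_NS = "http://example.com/ns/workorder"
GIS_NS = "http://example.com/ns/gis"

DOCUMENT = f"""<wo:workorder xmlns:wo="{WORK_NS}" xmlns:gis="{GIS_NS}">
  <wo:status>open</wo:status>
  <gis:device type="transformer"/>
</wo:workorder>"""


def parse_mixed_document(text):
    """Return (root tag, status text, device type) from the mixed document."""
    root = ET.fromstring(text)
    # ElementTree expands each prefix to "{namespace-uri}localname".
    device = root.find(f"{{{GIS_NS}}}device")
    # A prefix map lets queries use short prefixes instead of full URIs.
    status = root.find("wo:status", {"wo": WORK_NS})
    return root.tag, status.text, device.get("type")


if __name__ == "__main__":
    print(parse_mixed_document(DOCUMENT))
```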

DTDs do have shortcomings. The syntax used to define a DTD is not XML syntax and requires additional learning. A DTD also cannot specify data type constraints for the data contained in the XML document. It may specify that text must be used within a certain tag but cannot constrain the length of the string, for example. An alternative to DTDs that is designed to overcome these shortcomings is XML Schemas.

XML Schemas are not yet an officially recognized part of W3C's XML standard, but are promising "Candidates for Recommendation" because they offer more flexibility in XML design. Some parsers and organizations have already adopted preliminary implementations of XML Schemas. Schemas are created using XML syntax and also support the concepts of inheritance and strict data typing. Schemas are covered in more detail in Professional XML (Martin, D., et al., 2000, pp. 237-292) and on the Web at http://www.w3.org/XML/Schema

Data Processing
There are a number of parsing tools available to the programmer, written in a variety of languages (see http://www.xml.org). These tools are based upon one of two models for representing XML programmatically: the Simple API for XML (SAX) and the Document Object Model (DOM).

The SAX model is event based. As the XML document is parsed, an event is raised each time a tag or attribute is encountered. This is a "one-pass" model that does not allow for much flexibility in document navigation but is very memory efficient. More information on the SAX model can be found in Professional XML (Martin, D., et al., 2000, pp. 185-235).
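
A minimal sketch of the event-based model, using the SAX implementation in Python's standard library. The handler class, the function name, and the sample element names are hypothetical. In this API the attributes of an element arrive together with its start-tag event rather than as separate events.

```python
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Counts elements by tag name as the parser streams through the document."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called once per start tag; nothing is retained except the tally,
        # which is why memory use stays flat regardless of document size.
        self.counts[name] = self.counts.get(name, 0) + 1


def count_elements(xml_text):
    """Parse the text in a single pass and return {tag name: occurrences}."""
    handler = ElementCounter()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.counts


if __name__ == "__main__":
    sample = "<workorder><device/><device/><conductor/></workorder>"
    print(count_elements(sample))
```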

The DOM is a very flexible model that loads the entire XML document into memory. For extremely large documents this can be a resource issue, but for the majority of applications it is preferable because any section of the document can be accessed at any point in the application. The DOM is an object-oriented model for handling XML documents, and DOM parsing tools are available for numerous programming languages. The objects generally represented by the model are elements, attributes, and nodes. The hierarchical tree of data in the XML document can be traversed recursively to retrieve data. Implementers can also access data by tag name or retrieve a list of nodes that are children of an element. This process can also be reversed to create XML documents. The real flexibility of the DOM is that documents can be treated as a collection of objects and navigated as such, instead of having to use traditional text parsing methods, which can be problematic. Additionally, text parsing usually relies upon some sort of inflexible data dictionary. The DOM allows for run-time discovery of data representation based upon the document structure and the DTD.
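
The three kinds of DOM access described above (lookup by tag name, recursive traversal, and document creation) can be sketched with the `xml.dom.minidom` module from Python's standard library. The sample document and the function names are hypothetical.

```python
from xml.dom import minidom

# Hypothetical work order; element and attribute names are assumptions.
DOCUMENT = """<workorder id="WO-1001">
  <device type="transformer"/>
  <device type="fuse"/>
  <conductor type="primary_overhead"/>
</workorder>"""


def device_types(dom):
    """Access by tag name: return the type attribute of every device."""
    return [d.getAttribute("type") for d in dom.getElementsByTagName("device")]


def walk(element, depth=0):
    """Recursive traversal: return (depth, tag name) for each element."""
    rows = [(depth, element.tagName)]
    for child in element.childNodes:
        # Skip whitespace text nodes; only descend into elements.
        if child.nodeType == child.ELEMENT_NODE:
            rows.extend(walk(child, depth + 1))
    return rows


def add_device(dom, device_type):
    """Reverse direction: create a new element and attach it to the tree."""
    device = dom.createElement("device")
    device.setAttribute("type", device_type)
    dom.documentElement.appendChild(device)
    return device


if __name__ == "__main__":
    dom = minidom.parseString(DOCUMENT)
    print(device_types(dom))
    print(walk(dom.documentElement))
    add_device(dom, "switch")
    print(dom.documentElement.toxml())
```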

Data validation must occur both at the message (XML document) level and at the application level. At the application level, the business rules for the application must be enforced. This is up to the application programmer. At the document level, the data must be validated and checked for well-formedness. This can be accomplished using a validating parser. Many parsers support validation using DTDs, and parsers that support validation using Schemas are appearing all the time.
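
The two levels can be sketched in Python as follows. This is an illustration under stated assumptions, not a full implementation: the standard-library parser used here checks well-formedness only, so DTD validation would need a validating parser, and the business rule (a fixed set of permitted device types) is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical business rule: only these device types may be placed.
ALLOWED_DEVICE_TYPES = {"transformer", "fuse", "switch"}


def validate_message(xml_text):
    """Return a list of problems; an empty list means both levels passed."""
    # Document level: the message must at least be well-formed XML.
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]

    # Application level: enforce the business rules on the parsed data.
    problems = []
    for device in root.iter("device"):
        device_type = device.get("type")
        if device_type not in ALLOWED_DEVICE_TYPES:
            problems.append(f"unknown device type: {device_type!r}")
    return problems


if __name__ == "__main__":
    print(validate_message('<workorder><device type="fuse"/></workorder>'))
    print(validate_message('<workorder><device type="toaster"/></workorder>'))
    print(validate_message("<workorder>"))
```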

Example
In this example an Outage Management System (OMS) within the GIS must communicate with a Customer Information System (CIS) and a call reception system, or Voice Response Unit (VRU). The communication required is the receipt of trouble calls into the OMS and the closeout of trouble calls in the CIS when the outage is restored. All communication is handled through the Business Bus architecture. In this case we assume the Bus operates on a publish-and-subscribe methodology, whereby applications can publish messages and subscribe to receive only those they are interested in.


Figure 5

In this scenario the organization has decided that the process flow will be as follows:
  • The call is received by the VRU.
  • The VRU takes the call data and creates a message that is sent to the Business Bus. In this case the message will be in XML format.
  • The Business Bus has been configured such that the OMS/GIS application has subscribed to receive the message. The message is sent to the OMS/GIS application.
  • An adaptor has been developed that is bolted on to the OMS/GIS application. The adaptor is really an XML parser that is able to communicate with the Business Bus.
  • The parser can decipher and build XML documents, and make some high-level business process decisions. Based upon these decisions, specific low-level processing within the GIS can be triggered.
  • When the call has been resolved, the adaptor application constructs a new XML document and places it on the Business Bus.
  • The CIS, which has subscribed to receive these messages, retrieves the XML document.
  • The necessary updates are made in the CIS to the customer information.
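
The process flow above can be sketched in Python with a toy in-memory bus. Everything here is a hypothetical illustration: the class names, topic names, and message formats are assumptions, and a real EAI product would take the place of the `BusinessBus` class.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


class BusinessBus:
    """Toy in-memory publish/subscribe bus (stand-in for an EAI product)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, xml_text):
        # Deliver the message only to applications subscribed to this topic.
        for callback in self._subscribers[topic]:
            callback(xml_text)


class OmsAdaptor:
    """Parses trouble calls from the bus and publishes a closeout on restore."""

    def __init__(self, bus):
        self.bus = bus
        self.open_calls = {}
        bus.subscribe("trouble_call", self.on_trouble_call)

    def on_trouble_call(self, xml_text):
        call = ET.fromstring(xml_text)
        self.open_calls[call.get("id")] = call.get("customer")

    def restore(self, call_id):
        # Build a new XML document and place it on the bus.
        customer = self.open_calls.pop(call_id)
        closeout = ET.Element("closeout", {"id": call_id, "customer": customer})
        self.bus.publish("closeout", ET.tostring(closeout, encoding="unicode"))


class Cis:
    """Records which customers have had their trouble calls closed out."""

    def __init__(self, bus):
        self.closed_customers = []
        bus.subscribe("closeout", self.on_closeout)

    def on_closeout(self, xml_text):
        self.closed_customers.append(ET.fromstring(xml_text).get("customer"))


if __name__ == "__main__":
    bus = BusinessBus()
    oms = OmsAdaptor(bus)
    cis = Cis(bus)
    # The VRU publishes a trouble call; later the outage is restored.
    bus.publish("trouble_call", '<troublecall id="T1" customer="C42"/>')
    oms.restore("T1")
    print(cis.closed_customers)
```

Neither the OMS adaptor nor the CIS refers to the other directly; each knows only the bus and the message format, which is the decoupling the architecture is meant to provide.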
The actual XML parser application could be built as part of the OMS/GIS application or it could reside externally. How this is constructed is dependent upon several factors. First of all, what languages does the OMS/GIS application natively support? If it is a proprietary language unique to the OMS or GIS, the developer has the choice either to create their own parser in that language or to create one with an existing toolkit and integrate it with the OMS/GIS application.

Since the intent is to reduce programming effort, it is advisable to use existing toolkits to create XML parsing applications. XML parser libraries have been written for many of the mainstream programming languages.

Conclusions
"Since the advent of client/server technology, application integration has become far more complex. Repositories for data and metadata are now on disparate hardware platforms in many data formats" (Morgenthal, J. P., 2000). In this environment the applications are asynchronous and decoupled such that "it is not enough to have a common syntax for messages"; there must also be a context and an understanding of the data being exchanged (Morgenthal, J. P., 2000). Use of XML for application integration provides data context and understanding through the self-describing nature of the data. The elimination of tightly coupled applications and architectures increases the scalability of enterprise applications. The net effect is reduced long-term development cost. As stated by J. C. Lutz (March, 2000), "While point-to-point solutions may make sense in limited cases, the approach is dangerous and costly in the medium- and long-term for many integration solutions. Bottom-line costs to the business are the inability to rapidly change and extend operations. Such costs inexorably lead to lost business and missed opportunities."

References
  • Dick, Kevin (March, 2000). "The Grammar of XML", EAI Journal, March 2000, p. 36.
  • Lutz, Jeffery C. (March, 2000). "EAI Architecture Patterns", EAI Journal, March 2000, p. 64.
  • Martin, D., Birbeck, M., Kay, M., Loesgen, B., Pinnock, J., Livingstone, S., Stark, P., Williams, K., Anderson, R., Mohr, S., Baliles, D., Peat, B., & Ozu, N. (2000). Professional XML, Wrox Press Ltd., Arden House, Acock's Green, Birmingham B27 6BH, UK.
  • Morgenthal, J. P. (March, 2000). "XML and The New Integration Frontier", EAI Journal, pp. 26-30.