GISdevelopment.net ---> GITA 2001 ---> System Architecture

Using middleware for GIS integration and factors for evaluating technologies

David K. Eason, Consultant
Geographic Information Technology, Inc.
101 Inverness Drive East, Suite 130
Englewood, CO 80112
Voice: (303) 708-9355 x142
Fax: (303) 708-9356
email:deason@geoit.com, eason@acm.org

Gary L. Powell, Senior Consultant
Geographic Information Technology, Inc.
5185 West Del Rio Street, Chandler, AZ 85226-1944
Voice/Fax: (480) 940-4434
email:gpowell@geoit.com


Abstract
This paper defines "middleware," reviews examples of major middleware technologies, and describes factors for comparing them. Middleware enables components to be integrated with each other, for example by adding GIS capabilities to enterprise information systems such as Outage Management, Customer Information, and Enterprise Resource Planning. Middleware provides the glue that transparently connects these systems, reducing development costs and enabling components to change without affecting other components. Examples of middleware technologies include DCOM, CORBA, Java, RPC/DCE, MQS, and XML. Each of these technologies has different capabilities and considerations for use. The factors used to compare them here can also be used to evaluate most other middleware technologies.

Introduction
  1. Motivation for Middleware

    Geographic Information Systems have evolved from "islands of automation" to systems that need to be integrated with a wide variety of other enterprise information systems. These include work management, outage management, customer information, mobile dispatch, and enterprise resource planning systems. A significant portion of the system development part of most GIS implementations now involves establishing interfaces with these other systems. A functional architecture diagram (Figure 1) illustrates this situation.


    Figure 1. Example GIS Functional Architecture Diagram

    With few systems, developers would build custom one-to-one interfaces. With many systems, this approach becomes untenable, since each application must be altered with every change made to the overall workflow or business structure. Using a middleware-based approach simplifies integrating large numbers of components; changes are generally required only to the middleware itself, leaving the applications untouched. Middleware thus offers a more effective and efficient approach to enterprise application integration (EAI).
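
    The scaling argument above can be made concrete with a little arithmetic. The following is a minimal sketch, assuming that the one-to-one approach needs one custom interface per pair of systems and that the middleware approach needs one adapter per system. Both assumptions are simplifications introduced here for illustration, not figures from any particular project.

```python
def point_to_point_interfaces(n_systems: int) -> int:
    """Custom interfaces needed when every pair of systems is linked directly."""
    return n_systems * (n_systems - 1) // 2


def middleware_adapters(n_systems: int) -> int:
    """Adapters needed when each system talks only to the middleware layer."""
    return n_systems


if __name__ == "__main__":
    # Print: number of systems, pairwise interfaces, middleware adapters.
    for n in (3, 6, 12):
        print(n, point_to_point_interfaces(n), middleware_adapters(n))
```

    For a GIS plus the five enterprise systems named earlier (six systems in all), this gives 15 pairwise interfaces against 6 adapters, and the gap widens quickly as systems are added.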

    Middleware has often been selected as an integration tool for adding functionality to legacy systems. For example, a company may decide that its mainframe customer information system needs to be connected with a GIS to support new customer service capabilities. Middleware helps this process. Middleware is also the method of choice for developing very large systems from scratch. It allows asynchronous, evolutionary development of the different components; the system can begin useful operation without waiting for a single component to implement a non-essential feature.

  2. Definition of Middleware

    The purpose of middleware is to simplify the process of connecting applications or components, including those found on different platforms across networks. Middleware has been characterized in many ways: as the glue that holds components together, as an insulator that protects business logic developers from complex network protocols, and as a facilitator for reducing conflicts between platforms. It is a layer of software that can support multiple communication protocols, multiple programming languages, and multiple computer platforms.

    In the most general sense, middleware encompasses all the technologies that support interactions between components. In contrast to single, monolithic applications that attempt to do almost everything, middleware-connected systems comprise many components that offer discrete capabilities, often running on different computers with different operating systems. Choosing and constructing appropriate middleware solutions eases component integration and enhances scalability and maintainability. Integration, scalability, and maintainability are three important aspects of successful information system implementations.

  3. Non-Technical Challenges

    We will focus here on technical characteristics of middleware solutions. Many challenges that face middleware developers, however, are not technical. Because legacy systems are often part of the equation, learning how to interface with them can be a significant challenge. Politics often plays a part as well, as different business units within a company perceive interfaces with other components as a risk to their system's stability, data integrity, or security. A solution that is perfectly sound technically may be politically untenable. Component interfaces may change over time, giving middleware developers the sense that their architectures are built on sand. These non-technical issues must be given careful consideration to achieve success in all phases of a system's life cycle.

  4. Selected Middleware and Paper Structure

    Middleware encompasses many different concepts. It can be a programming language that provides developers with a means to build remote interfaces. It can be a method of specifying the nature of the data being shared between components. Finally, it can be a shrink-wrapped product that forces component independence by completely decoupling the interfaces between components, using an unchangeable, third-party interface scheme and appropriate application programming interfaces (APIs) for using that scheme. More than one middleware technology is often used concurrently in order to meet complex system goals.

    Since it is not possible to review all available middleware technologies in this paper, only a few representative products and methods will be discussed. Each will be discussed in terms of performance, platforms, development ease, reliability, scalability, reusability, synchronous vs. asynchronous, and open vs. proprietary. These concepts are extensible to all other middleware technologies.

    Many factors contribute to a company's choice of middleware technologies. Some factors, such as choice of platform, may be predetermined. Others, such as development ease and cost, depend on in-house talent and budgetary constraints. Some of these factors are discrete ("to deploy a Unix computer or not?") while others imply a continuum ("how reliable is this product?"). The non-discrete factors are evaluated below according to the authors' experience and independent trade journal research. Accompanying each factor is a discussion of how the different middleware technologies fill that space. These details will help managers and developers decide how to proceed with their middleware implementations.
Overview of some middleware technologies
Some common characteristics of all middleware technologies provide a context for understanding those technologies. Since communication is the heart of middleware, communication methods are discussed. Then we review the way that communication is implemented - whether by a programming language or other means. Finally, we introduce one possible form of each communication's contents: XML.
  1. Communication Methods

    Components can communicate through (at least) one of four means: sockets, remote procedure calls, remote method invocation, and messages. Sockets have been used for decades, and are the lowest-level means of connecting one computer to another (Comer & Stevens, 1996). A client and server must encode and decode data through a socket stream, so a great deal of programming effort goes into protocol specification. Socket programming also requires dealing explicitly with such complexities as multithreading, deadlock, synchronization, and network problems. A middleware technology will be easier to implement to the degree that these details are hidden.
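
    To illustrate the kind of protocol work that sockets leave to the developer, the following minimal sketch frames each message with a 4-byte length prefix so that the receiver knows where one message ends and the next begins. The framing scheme, the function names, and the sample payload are assumptions made for illustration; a connected local socket pair stands in for a real network connection.

```python
import socket
import struct


def send_message(sock: socket.socket, payload: bytes) -> None:
    """Send one message: a 4-byte big-endian length, then the payload."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exactly(sock: socket.socket, n_bytes: int) -> bytes:
    """Read exactly n_bytes; a stream socket may deliver data in pieces."""
    chunks = []
    remaining = n_bytes
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_message(sock: socket.socket) -> bytes:
    """Receive one length-prefixed message."""
    (length,) = struct.unpack("!I", recv_exactly(sock, 4))
    return recv_exactly(sock, length)


if __name__ == "__main__":
    # socketpair() gives two connected endpoints without any network setup.
    client, server = socket.socketpair()
    with client, server:
        send_message(client, "feeder=F12;status=OPEN".encode("utf-8"))
        print(recv_message(server).decode("utf-8"))
```

    Even this small example must decide on byte order, message boundaries, and what to do when the peer disconnects, and it says nothing yet about threading or retries. These are the details that higher-level middleware aims to hide.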

    Remote procedure calls (RPC) refer to the ability of function A to call upon function B as if B were local, even if it is not. An ideal RPC product would completely hide all socket-level details, effectively "wrapping" the RPC calls in functions or methods. Remote method invocations (RMI) are analogous to RPCs, but apply to objects (java.sun.com, 1999). An object, which is an instance of a class, has both data and methods (which are functions that access the data).
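
    The "wrapping" idea can be sketched in a few lines. In the example below, a client-side stub turns an ordinary-looking method call into an encoded request, and a server-side dispatcher decodes it and calls the real function. This is an illustration under stated assumptions, not the API of any RPC product: the class names, the JSON encoding, and the sample function are hypothetical, and the network round trip is replaced by a direct in-process call.

```python
import json
from typing import Any, Callable, Dict


class Dispatcher:
    """Server side: decode a request, call the named function, encode the reply."""

    def __init__(self) -> None:
        self._functions: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, func: Callable[..., Any]) -> None:
        self._functions[name] = func

    def handle(self, request: bytes) -> bytes:
        call = json.loads(request.decode("utf-8"))
        result = self._functions[call["method"]](*call["args"])
        return json.dumps({"result": result}).encode("utf-8")


class Stub:
    """Client side: make a remote function look like a local method."""

    def __init__(self, transport: Callable[[bytes], bytes]) -> None:
        self._transport = transport  # stand-in for the network round trip

    def __getattr__(self, method: str) -> Callable[..., Any]:
        # Called only for names not found on the stub itself.
        def remote_call(*args: Any) -> Any:
            request = json.dumps({"method": method, "args": list(args)})
            reply = self._transport(request.encode("utf-8"))
            return json.loads(reply.decode("utf-8"))["result"]

        return remote_call


def customers_on_feeder(feeder_id: str) -> int:
    """Hypothetical server-side function B, with made-up data."""
    return {"F12": 482, "F13": 97}.get(feeder_id, 0)


if __name__ == "__main__":
    dispatcher = Dispatcher()
    dispatcher.register("customers_on_feeder", customers_on_feeder)
    gis = Stub(dispatcher.handle)
    # The caller writes an ordinary method call; encoding is hidden in the stub.
    print(gis.customers_on_feeder("F12"))
```

    The caller never sees the request format. Swapping the in-process transport for a socket would not change the calling code, which is the property the paragraph above describes.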

    Message exchange is the last major communication method used by middleware technologies. While RPCs and RMIs focus on calling remote functions or objects, message exchange only concerns data transfer between components. Message oriented middleware (MOM) works much like electronic mail, using store-and-forward queuing provided by a shrink-wrapped product that runs separately from the components themselves. The receiving component "knows" what to do with a message once it is received, and the sending component "knows" how to package that data so the receiver understands it. Two MOM products are IBM's MQS (Message Queuing Series) and MSMQ (Microsoft Message Queuing). The theory behind both products is nearly identical, but MQS works on many more platforms than MSMQ (which is for Windows computers only [Lewis, 2000]), so we only discuss MQS.

  2. Programming Languages and Interface Standards

    Various programming languages, and middleware technologies that wrap languages, make good choices for a middleware solution. A few dozen languages advertise themselves as middleware enablers, and Java is the dominant one. Distributed object systems such as Microsoft's DCOM (Distributed Component Object Model [Sessions, 1998]) and OMG's CORBA (Object Management Group's Common Object Request Broker Architecture [Siegel, 2000; corba.org, 2000]) provide standard interfaces for applications to register, discover, and use components*. CORBA stubs "wrap" programs written in other languages with a pseudo-language called IDL (interface definition language), while DCOM provides components with a Microsoft-specific interface definition that subsequently allows other consumer programs to use COM objects. Both DCOM and CORBA protect developers from socket details but don't inherently offer error recovery.

  3. Extensible Markup Language (XML)

    The Standard Generalized Markup Language (SGML) and its derivatives - such as the eXtensible Markup Language (XML) and Hypertext Markup Language (HTML) - provide a means for different computers to understand, parse, or format text streams based on standardized "tags" that are embedded in the text. XML is particularly notable for its flexibility: developers can create their own tags and define their own validity rules for them, while the XML specification itself fixes what counts as well-formed (Walsh, 1998). XML is discussed below because of its extensibility and usefulness for providing self-describing data to components regardless of the other middleware technologies being used.
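
    As an illustration of self-describing data, the sketch below builds a small XML message and reads it back by tag name rather than by position. The tag names (OutageReport, Feeder, CustomersOut) and the values are hypothetical and are not taken from any published schema.

```python
import xml.etree.ElementTree as ET


def build_outage_message(feeder_id: str, customers_out: int) -> str:
    """Serialize an outage report as XML text (tag names are illustrative)."""
    root = ET.Element("OutageReport")
    ET.SubElement(root, "Feeder").text = feeder_id
    ET.SubElement(root, "CustomersOut").text = str(customers_out)
    return ET.tostring(root, encoding="unicode")


def read_outage_message(xml_text: str) -> dict:
    """Parse the message by tag name, independent of element order."""
    root = ET.fromstring(xml_text)
    return {
        "feeder": root.findtext("Feeder"),
        "customers_out": int(root.findtext("CustomersOut")),
    }


if __name__ == "__main__":
    text = build_outage_message("F12", 482)
    print(text)
    print(read_outage_message(text))
```

    Because each value travels with its tag, the receiving component does not depend on which transport (sockets, RPC, or a message queue) delivered the text.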
Middleware factors to consider
The communication means, implementation, and context summarized above are implemented using many different middleware technologies. These technologies vary with respect to several important factors that managers and developers must consider when choosing a technology. The definition and scope of each factor is provided first, followed by a brief discussion of how examples of middleware technologies rank compared with each other on that factor.

* One definition of a component: a unit of software with a public, contractual interface and a hidden implementation. Components are often incapable of doing useful work by themselves, in which case they are relied upon by "master" applications that use components' services to do their job. However, components can also be larger grained, complete applications or systems.
  1. Performance

    The fastest programs have no communication with remote systems and are already compiled into assembly language native to the computer's platform. In contrast, a completely abstract middleware solution may run on many different computers in different nations on different platforms. Nothing may be precompiled, everything has long network latencies, and locating applications requires lengthy run-time delays (to look up host IP addresses, interface details, etc.). Achieving a balance between these extremes - local and tightly bound components vs. remote and loosely bound - is the key to performance and several other factors.

    RPC based solutions have the best performance because they are closest to the fast model described above. CORBA follows closely after that, since components can be natively compiled; only communications between components require abstractions. (Each component with public interfaces has a corresponding stub, written in IDL, that tells other components how to interact with it. Locating the component itself also requires a lookup function, i.e., a directory service. Both of these capabilities require run-time work and communications between systems.)

    DCOM and Java are slower, all things being equal, than CORBA or RPCs. DCOM wraps functionality in separate programs (DLLs) and Java has an entire virtual machine operating between the code and assembly language, which slows execution speed of all code. However, implementation decisions have a dramatic effect on speed, and so this ranking of performance will vary. Message passing systems such as MQS are selected for asynchronicity rather than speed, though they may perform very fast given favorable network and CPU configurations and load.

  2. Platforms

    The types of computers used by a company are typically predetermined by existing equipment or dictated by budgetary constraints. Most middleware technologies distinguish between only three basic platforms: Windows, Unix, and IBM mainframe. Java provides a "write once, test everywhere, run most places" paradigm. Since Java compiles into bytecode that is then interpreted by each machine's native Java Virtual Machine, the same bytecode is (theoretically) executable without change on any supported computer - which includes Unix and Windows computers. CORBA enables component interactions between nearly any systems, since it provides an IDL that abstracts the interface from its underlying implementation regardless of language, operating system, or hardware. MQS supports over 35 different platforms, including mainframe systems (unlike Java), and so provides a key means of data marshalling with many legacy systems. DCOM runs specifically on Microsoft operating systems.

  3. Development Ease

    Most middleware products provide their own APIs. The granularity and availability of API functions and the number of additional, non-API layers required to implement the technology determine development ease.

    Pure RPC is the most difficult, as it involves direct coding of all details (often in C), with no APIs provided. Distributed Computing Environment (DCE) standards provide specifications for forming a common infrastructure for the development of distributed systems. DCE is a layer above RPC, simplifying RPC work. Neither RPC nor DCE is a product that can be purchased - merely a description of the type of work done by developers (low level for DCE, and extremely low-level for RPC).

    CORBA is somewhat less difficult than either: it allows developers to tie together programs that are already written in many different languages. But implementing CORBA requires additional work to support added abstractions. These abstractions include the IDL and directory services (so components can find each other), among others. DCOM may be easier than CORBA, since its components and interfaces are more tightly integrated (owing to its diminished cross-platform capabilities). Java RMI may be the easiest, as it provides extensive API support for virtually all common middleware functionality, and the virtual machine concept enables true platform independence.

  4. Reliability

    Reliability refers to how well the code behaves as expected. Mission-critical systems, such as missile defense, will have higher reliability requirements than other systems, such as customer service inquiries into the status of a repair request. The highest reliability requires exponentially more development expense and effort; generally speaking, reliability and development ease are inversely related.

    RPC/DCE should be the most reliable, given enough programming time, testing, and expense, since it is the only solution that allows developers complete control over all aspects of an application. (However, this does not account for the complexity of the final product, which may be impossible to manage at such low levels of coding.) CORBA could be next in line for reliability, depending on the quality of the components that it connects. Java RMI and DCOM are generally considered the least reliable of these solutions under those conditions, though this matter is open to debate.

    MQS theoretically provides the highest possible reliability, but this is only for the message container (the queue manager and its queues) and for transmission of messages between queue managers - not for assuring the transmission of messages from applications to the queue managers or for assuring that the messages are processed once retrieved by an application. It's up to the MQS application logic - using APIs provided by IBM - to confirm completion of these tasks, so MQS cannot have its reliability compared to these other products.*
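
    The division of responsibility described above, in which the queue keeps the message safe but the application must confirm that processing completed, can be sketched with a toy in-memory queue. The class and method names below are hypothetical and are not the MQS API; the sketch only shows the general acknowledgement pattern, in which a message is forgotten only after the consumer reports success.

```python
from collections import deque
from typing import Callable, Deque, Optional


class AckQueue:
    """Toy in-memory queue; a message is removed only once it is acknowledged."""

    def __init__(self) -> None:
        self._ready: Deque[str] = deque()
        self._in_flight: Optional[str] = None

    def put(self, message: str) -> None:
        self._ready.append(message)

    def get(self) -> Optional[str]:
        """Hand out the next message but keep it until ack() or nack()."""
        if self._in_flight is None and self._ready:
            self._in_flight = self._ready.popleft()
        return self._in_flight

    def ack(self) -> None:
        """Processing succeeded: forget the message."""
        self._in_flight = None

    def nack(self) -> None:
        """Processing failed: return the message to the front for redelivery."""
        if self._in_flight is not None:
            self._ready.appendleft(self._in_flight)
            self._in_flight = None

    def pending(self) -> int:
        return len(self._ready) + (1 if self._in_flight is not None else 0)


def consume_one(queue: AckQueue, handler: Callable[[str], None]) -> bool:
    """Process one message; acknowledge only if the handler completes."""
    message = queue.get()
    if message is None:
        return False
    try:
        handler(message)
    except Exception:
        queue.nack()  # application logic decides the message was not processed
        return False
    queue.ack()
    return True
```

    If the handler raises an error, the message stays queued and is delivered again on the next attempt. Without this confirmation step in the application, a reliable queue alone would not guarantee that a retrieved message was ever acted upon.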

  5. Scalability

    The scalability of a product refers to how well it will likely support existing and future systems with little additional work.

    CORBA is the most scalable. Indeed, its reason for existence is scalability. It uses a supporting language called the Interface Definition Language (IDL), which allows a component written in any language (for which an IDL exists) to be connected with any other such language. MQS also offers extraordinary scalability, as it is supported on almost any platform. Note that writing MQS applications still requires facility in languages that have MQS APIs (including C, C++, Java, and COBOL, among others). Java virtual machines come standard with all modern web browsers, and Java RMI is simply another class with its own methods for making remote calls. However, applets are typically regarded as lightweight components, not enterprise-sized components. Direct RPC implementations are the least scalable of all middleware options, providing no inherent future growth capabilities beyond those provided by the developer's custom-created interfaces. XML is a special case concerning the scalability factor: once all the tags and validity rules have been defined, it can be used anywhere by any organization in the same vertical market. Many standard XML specifications have already been developed, including a Geography Markup Language (GML [Lake et al., 1999]).

* Paradoxically, the reliability order given above (RPC/DCE, then CORBA, then Java and DCOM) would be inverted if it were possible to give the "same amount of development time" to each solution; Java and DCOM would be the most reliable, followed by CORBA, then RPC.

  6. Reusability

    Highly reusable code does not need to be rewritten for implementation in other projects. Generally, the later the binding occurs between components (i.e., the more loosely components are integrated at compile time and run time), the more reusable the components. RPC code, generally speaking, is not reusable; everything is tightly bound at compile time. DCOM objects, although not bound at run time, are almost as proprietary as RPC, since the objects are typically not generic enough for reuse, and they can't be inherited unless the source code is available (and it usually is not). Java RMI has no compile-time bindings; it is highly reusable. Java reusability is also significantly enhanced by "introspection" (the capability to probe into a component and discover the interface and its methods). In addition, Java supports inheritance, so additional classes can be based on existing classes.

    CORBA systems are only as reusable as their individual components; if one "component" is an Outage Management System (OMS) for an electric utility, developers may be able to connect and disconnect it from other components in the same company but not in other companies (due to proprietary data models). On the other hand, an OMS graphical user interface may transfer to other companies. Like CORBA, MQS also does not fit into the "reusable" hierarchy, but for a different reason: it is an independently operating product that works in isolation from all meaningful system components. The MQS applications themselves, which put and get messages from queues, are highly specific to the components they connect (much as an IDL interface is specific to a CORBA component).

    XML is infinitely reusable to the extent that the organization anticipated all the tags needed by the components being added or changed. Thus tag creation and production rules should focus on generic business processes rather than component interface needs, while still fulfilling the latter.
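
    The introspection capability credited to Java in this section can be illustrated in any language that exposes run-time type information. The sketch below uses Python's standard inspect module rather than Java's reflection API, purely for brevity; the component class and its methods are hypothetical.

```python
import inspect


class OutageComponent:
    """Hypothetical component whose interface is discovered at run time."""

    def report_outage(self, feeder_id: str, customers_out: int) -> None:
        pass

    def restore(self, feeder_id: str) -> None:
        pass

    def _internal_helper(self) -> None:
        pass


def public_interface(component: object) -> dict:
    """Map each public method name to the names of its parameters."""
    interface = {}
    for name, member in inspect.getmembers(component, inspect.ismethod):
        if name.startswith("_"):
            continue  # skip private helpers and special methods
        interface[name] = list(inspect.signature(member).parameters)
    return interface


if __name__ == "__main__":
    print(public_interface(OutageComponent()))
```

    A caller that has never seen the component's source can use this kind of probe to learn which operations are offered and what arguments they take, which is what makes late-bound components easier to reuse.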

  7. Synchronous vs. Asynchronous

    Asynchronous systems are those which do not require real-time interfaces between components. For example, in an outage management system, it is not required that all SCADA events be immediately communicated to trigger dispatched crews and GIS GUI imagery; a delay caused by factors such as network latency or a rebooting server is acceptable. Similarly, if a customer service unit of a company wishes to generate a mailing list of all customers connected to a certain substation's circuit breaker, this can be done on a CPU time-available basis rather than in real time. Message passing systems such as MQS are, by definition, asynchronous. Although synchronicity can be enforced in MQS to some extent, it is not the intended purpose of the product.* All other technologies mentioned in this article are synchronous to the extent that they are single threaded. A particular CORBA implementation, for example, typically runs more than one thread, often in different processes, and to that extent it could be called "asynchronous."

  8. Open vs. Proprietary

    Proprietary products cost money and tend to support fewer platforms (with some notable exceptions, such as MQS). The code is not released to the public, so it is generally only supported and extended by the company that designed it. Open products are usually free, support more platforms, have the code released publicly, and are supported by many user groups and other organizations. Some open products have published standards emerging from a single, recognized authority.

    Java is a classic example of an open middleware product. Java development environments and tutorials can be downloaded for free (see java.sun.com). The CORBA standards can also be downloaded for free (www.omg.org), although specific ORBs (Object Request Brokers) to implement the CORBA standards must be purchased. XML and XML parsers for various languages are freely available for download (www.xml.com). In contrast, MQSeries is a good example of a proprietary product. A typical license for a large enterprise will easily cost over $100K (or $1 million for several systems). IBM has MQS professionals available to help companies implement the technology, while no such official support exists for open source products.

* The Message Passing Interface (MPI) is also an asynchronous messaging system, but efficient MPI programs pass messages with "puts" and "gets" timed so closely - with blocking commands to force synchronization - that synchronicity is assured. The world's first teraflop "supercomputer" (a trillion floating point operations per second) was not a single computer at all, but a collection of 4,536 dual-processor personal computers connected together to form a parallel computer. This computer completed a large matrix calculation using MPI API calls. Note also that this message passing system used special network hardware (in December, 1996) to enable node-to-node bidirectional communication of 800MB/sec. Thus "remote" nodes seemed a lot less remote!

  9. Factors Not Included

    Security, fault tolerance, and load balancing deserve a discussion of their own. Security refers to encryption, authentication, and verification. Fault tolerant systems gracefully handle failures in components, networks, or servers. Systems capable of load balancing will dynamically shift processing load to less-active computers. These are important factors for some enterprise systems and require implementing multiple technologies to effect a complete solution.
Conclusion
An ideal middleware solution will achieve a balance among all of the above factors given real-world constraints such as legacy systems, available platforms, development talent, and budgetary resources. The hardest question to answer in any large development project is, "How much money should we spend to save money in the future?" Realistically, middleware is often used to cobble together new and old technologies in order to service immediate needs as cheaply as possible. Perhaps the best implementation one can hope for fills these obligations while also investing just a little more time and money to build systems that will be scalable to the company's future needs.

Future competitiveness of GIS-consulting companies and their supporting business partners may depend on the facility with which their products can be componentized and connected to legacy systems. Equally important is the capacity of such businesses to suggest appropriate middleware technologies that enable these connections while anticipating inevitable component changes and additions.

References