System Architecture
Using middleware for GIS integration and factors for evaluating technologies
Overview of some middleware technologies
Some characteristics common to all middleware technologies provide a context for
understanding them. Since communication is the heart of middleware, we first discuss
communication methods. Then we review how that communication is implemented -
whether by a programming language or by other means. Finally, we introduce one
possible format for the contents of each communication: XML.
- Communication Methods
Components can communicate through (at least) one of four means: sockets, remote
procedure calls, remote method invocation, and messages. Sockets have been used for
decades and are the lowest-level means of connecting one computer to another (Comer &
Stevens, 1996). A client and server must encode and decode data through a socket
stream, so a great deal of programming effort goes into protocol specification. Using
sockets also requires dealing explicitly with such complexities as multithreading,
deadlock, synchronization, and network problems. The more of these details a
middleware technology hides, the easier it is to build a solution with it.
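To make the protocol-specification burden concrete, the following Python sketch hand-rolls a tiny request/reply exchange over a raw socket stream. The `ADD` command, the newline framing, and the function names are hypothetical choices made for illustration; `socket.socketpair()` stands in for a client and server that would normally run on different machines. The sketch deliberately ignores threading, timeouts, and network failures, which a real socket program would have to handle.

```python
import socket

# Hypothetical line-based protocol: a request is "OP A B\n" and a reply is
# "RESULT\n". With raw sockets, both sides must agree on this by hand.

def encode_request(op: str, a: int, b: int) -> bytes:
    return f"{op} {a} {b}\n".encode("ascii")

def handle_request(line: bytes) -> bytes:
    """Server side: decode one request line and encode the reply."""
    op, a, b = line.decode("ascii").split()
    if op != "ADD":
        return b"ERR unknown-op\n"
    return f"{int(a) + int(b)}\n".encode("ascii")

def round_trip(a: int, b: int) -> int:
    # socketpair() returns two connected sockets in one process; they stand
    # in for a client and a server that would normally be on separate hosts.
    client, server = socket.socketpair()
    with client, server:
        with client.makefile("rb") as client_in, server.makefile("rb") as server_in:
            client.sendall(encode_request("ADD", a, b))
            # readline() reassembles a request that may arrive in pieces.
            server.sendall(handle_request(server_in.readline()))
            reply = client_in.readline().decode("ascii").strip()
    return int(reply)

print(round_trip(2, 3))  # prints 5
```

Every detail here (delimiter, field order, character encoding, error replies) is something the two sides must document and keep in sync, which is exactly the work middleware aims to hide.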
Remote procedure calls (RPC) refer to the ability of function A to call function B
as if B were local, even when it is not. An ideal RPC product would completely hide all
socket-level details - effectively "wrapping" the RPC calls in functions or methods.
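As a rough illustration of RPC "wrapping," the sketch below uses Python's standard `xmlrpc` modules, an XML-over-HTTP RPC mechanism used here only as a convenient stand-in for RPC products in general. The function `add` plays the role of function B; the caller invokes `proxy.add(2, 3)` with no visible socket code. The server runs on a background thread on the loopback interface, and port 0 lets the operating system choose a free port.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

def add(a: int, b: int) -> int:
    """Function B: an ordinary function that will be called remotely."""
    return a + b

def start_server() -> SimpleXMLRPCServer:
    # Port 0 asks the OS for any free port; logRequests=False keeps output quiet.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(add, "add")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

server = start_server()
host, port = server.server_address
# Function A's side: the proxy hides the socket, HTTP, and XML encoding,
# so the remote call reads like a local one.
with ServerProxy(f"http://{host}:{port}/") as proxy:
    result = proxy.add(2, 3)
print(result)  # prints 5
server.shutdown()
server.server_close()
```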
Remote method invocations (RMI) are analogous to RPCs, but apply to objects
(java.sun.com, 1999): a program invokes a method on an object that may live in another
process or on another machine. An object, which is an instance of a class, has both data
and methods (which are functions that access the data).
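The object-oriented variant can be approximated with the same Python standard library: `register_instance` exposes the public methods of one server-side object, so the client changes that object's data only through method calls. This is an analogy, not Java RMI itself; Java RMI also passes remote object references and uses a registry, which this sketch does not model. The `Counter` class is hypothetical.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

class Counter:
    """Hypothetical remote object: its data (count) stays on the server;
    only method calls and their results cross the connection."""

    def __init__(self) -> None:
        self.count = 0

    def increment(self, step: int) -> int:
        self.count += step
        return self.count

    def current(self) -> int:
        return self.count

counter = Counter()
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_instance(counter)  # exposes public (non-underscore) methods
threading.Thread(target=server.serve_forever, daemon=True).start()

host, port = server.server_address
with ServerProxy(f"http://{host}:{port}/") as remote_counter:
    remote_counter.increment(2)
    remote_counter.increment(3)
    total = remote_counter.current()
print(total)  # prints 5
server.shutdown()
server.server_close()
```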
Message exchange is the last major communication method used by middleware
technologies. While RPCs and RMIs focus on calling remote functions or objects,
message exchange only concerns data transfer between components. Message-oriented
middleware (MOM) works much like electronic mail, using store-and-forward queuing
provided by a shrink-wrapped product that runs separately from the components
themselves. The receiving component "knows" what to do with a message once it's
received, and the sending component "knows" how to package that data so the receiver
understands it. Two such MOM products are IBM's MQS (Message Queuing Series) and
MSMQ (Microsoft Message Queuing). The theory behind both products is nearly
identical, but MQS works on many more platforms than MSMQ (which is for Windows
computers only [Lewis, 2000]), so we only discuss MQS.
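The store-and-forward idea can be sketched in a few lines of Python. The `QueueManager` class below is a hypothetical in-memory toy, not the MQS or MSMQ API; in a real MOM product the queue manager is a separate, persistent process. The queue name, message fields, and parcel data are invented for illustration. The point is that the sender can enqueue messages before any receiver is running, and both sides rely only on an agreed message format (JSON here).

```python
import json
import queue

class QueueManager:
    """Hypothetical in-memory stand-in for a MOM queue manager (not the MQS API).
    It holds named queues independently of the components that use them."""

    def __init__(self) -> None:
        self._queues = {}  # queue name -> queue.Queue of message bodies

    def _queue(self, name: str) -> queue.Queue:
        return self._queues.setdefault(name, queue.Queue())

    def put(self, name: str, message: bytes) -> None:
        self._queue(name).put(message)

    def get(self, name: str, timeout: float = 1.0) -> bytes:
        # Raises queue.Empty if nothing arrives within the timeout.
        return self._queue(name).get(timeout=timeout)

# The sender "knows" how to package data: JSON with agreed field names.
def send_parcel_update(mgr: QueueManager, parcel_id: str, owner: str) -> None:
    body = {"type": "parcel-update", "parcel_id": parcel_id, "owner": owner}
    mgr.put("GIS.UPDATES", json.dumps(body).encode("utf-8"))

# The receiver "knows" what to do with a message once it arrives.
def receive_one(mgr: QueueManager) -> str:
    body = json.loads(mgr.get("GIS.UPDATES").decode("utf-8"))
    if body["type"] == "parcel-update":
        return f"parcel {body['parcel_id']} now owned by {body['owner']}"
    return "ignored"

mgr = QueueManager()
# Store: both messages are queued before any receiver runs.
send_parcel_update(mgr, "P-17", "Smith")
send_parcel_update(mgr, "P-42", "Jones")
# Forward: the receiver drains the queue later, at its own pace.
print(receive_one(mgr))  # prints "parcel P-17 now owned by Smith"
print(receive_one(mgr))  # prints "parcel P-42 now owned by Jones"
```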
- Programming Languages and Interface Standards
Various programming languages, and middleware technologies that wrap languages,
are good candidates for a middleware solution. A few dozen languages advertise
themselves as middleware enablers, and Java is the most prominent among them.
Distributed object systems such as Microsoft's DCOM (Distributed Component Object
Model [Sessions, 1998]) and OMG's CORBA (Object Management Group's Common
Object Request Broker Architecture [Siegel, 2000; corba.org, 2000]) provide standard
interfaces for applications to register, discover, and use components*. CORBA describes
each component's interface in a declarative language called IDL (Interface Definition
Language); stubs generated from the IDL "wrap" programs written in other languages.
DCOM instead provides components with a Microsoft-specific interface definition that
subsequently allows other consumer programs to use COM objects. Both DCOM and
CORBA protect developers from socket details but don't inherently offer error recovery.
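The division of labor between an interface definition, a hidden implementation, and a generated stub can be sketched by hand. In CORBA an IDL compiler generates stubs in the target language; in the Python sketch below, an abstract base class plays the role of the interface definition, and the stub and server-side skeleton are written manually. All class names are hypothetical, and the "transport" is an ordinary function standing in for the network.

```python
import json
import math
from abc import ABC, abstractmethod
from typing import Callable

class DistanceService(ABC):
    """Public, contractual interface (the role an IDL definition plays)."""
    @abstractmethod
    def distance(self, x1: float, y1: float, x2: float, y2: float) -> float: ...

class DistanceServiceImpl(DistanceService):
    """Hidden implementation; callers never depend on this class directly."""
    def distance(self, x1, y1, x2, y2):
        return math.hypot(x2 - x1, y2 - y1)

def make_skeleton(impl: DistanceService) -> Callable[[str], str]:
    """Server side: unmarshal a request, call the implementation, marshal the reply."""
    def handle(request: str) -> str:
        msg = json.loads(request)
        # Only operations declared in the interface may be invoked remotely.
        if msg["method"] not in DistanceService.__abstractmethods__:
            raise ValueError(f"unknown operation {msg['method']!r}")
        result = getattr(impl, msg["method"])(*msg["args"])
        return json.dumps({"result": result})
    return handle

class DistanceServiceStub(DistanceService):
    """Client side: same interface, but each call is marshalled and sent
    through a transport (a plain function standing in for the network)."""
    def __init__(self, transport: Callable[[str], str]) -> None:
        self._transport = transport

    def distance(self, x1, y1, x2, y2):
        request = json.dumps({"method": "distance", "args": [x1, y1, x2, y2]})
        return json.loads(self._transport(request))["result"]

service: DistanceService = DistanceServiceStub(make_skeleton(DistanceServiceImpl()))
print(service.distance(0, 0, 3, 4))  # prints 5.0
```

Because the caller holds only a `DistanceService`, the stub could be swapped for the local implementation (or vice versa) without changing client code, which is the property the footnoted definition of a component relies on.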
- Extensible Markup Language (XML)
The Standard Generalized Markup Language (SGML) and its derivatives - such as the
eXtensible Markup Language (XML) and Hypertext Markup Language (HTML) -
provide a means for different computers to understand, parse, or format text streams
based on standardized "tags" that are embedded in the text. XML is particularly notable
for its flexibility: developers can create their own tags and their own rules for validity
(for example, in a document type definition), while the rules for well-formedness are the
same for every XML document (Walsh, 1998). XML is discussed below because of its extensibility and
usefulness for providing self-describing data to components regardless of the other
middleware technologies being used.
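A small example of self-describing data: the message below uses tag and attribute names invented for this sketch (it is not a standard GIS schema), yet a receiver can still extract each field by name with Python's standard `xml.etree.ElementTree`. Note that ElementTree checks only well-formedness; validating against a DTD or schema would require a separate validating tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical message; every tag and attribute name was invented for this sketch.
message = """<feature type="well">
  <id>W-104</id>
  <location>
    <lat>35.08</lat>
    <lon>-106.65</lon>
  </location>
  <depth units="m">120</depth>
</feature>"""

root = ET.fromstring(message)  # raises ET.ParseError if not well-formed
feature_id = root.findtext("id")
lat = float(root.findtext("location/lat"))
lon = float(root.findtext("location/lon"))
depth = float(root.findtext("depth"))
depth_units = root.find("depth").get("units")
print(root.get("type"), feature_id, lat, lon, depth, depth_units)
# prints: well W-104 35.08 -106.65 120.0 m

# An unclosed tag breaks well-formedness, so the parser rejects the document
# no matter which tag names the developer chose.
try:
    ET.fromstring("<feature><id>W-105</feature>")
    well_formed = True
except ET.ParseError:
    well_formed = False
print(well_formed)  # prints False
```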
Middleware factors to consider
The communication methods, implementation approaches, and message contents
summarized above are realized in many different middleware technologies. These
technologies vary with respect to several important factors that managers and developers
must consider when choosing a technology. The definition and scope of each factor is
provided first, followed by a brief discussion of how representative middleware
technologies compare with each other on that factor.
* One definition of a component: a unit of software with a public, contractual interface
and a hidden implementation. Components are often incapable of doing useful work by
themselves, in which case they are relied upon by "master" applications that use
components' services to do their job. However, components can also be larger grained,
complete applications or systems.
- Performance
The fastest programs have no communication with remote systems and are already
compiled into machine code native to the computer's platform. In contrast, a
completely abstract middleware solution may run on many different computers in
different nations on different platforms. Nothing may be precompiled, everything has
long network latencies, and locating applications requires lengthy run-time delays (to
look up host IP addresses, interface details, etc.). Achieving a balance between these
extremes - local and tightly bound components vs. remote and loosely bound - is the key
to performance and several other factors.
RPC-based solutions have the best performance because they are closest to the fast model
described above. CORBA follows closely after that, since components can be natively
compiled; only communications between components require abstractions. (Each
component with public interfaces has a corresponding interface definition, written in IDL,
from which stubs are generated that tell other components how to interact with it.
Locating the component itself also requires a lookup function, i.e., a directory service.
Both of these capabilities require run-time work and communications between systems.)
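The run-time lookup cost can be illustrated with a toy directory service. `DirectoryService` and `CachingResolver` are hypothetical classes, not the CORBA Naming Service interface, and the address is a made-up private IP. The sketch shows components registering under logical names and a client caching the result so that it pays the lookup cost once per name rather than once per call, at the risk of using a stale address if a component moves.

```python
class DirectoryService:
    """Hypothetical naming service: components register under a logical name,
    and clients look that name up at run time instead of hard-coding an address."""

    def __init__(self) -> None:
        self._entries = {}  # logical name -> (host, port)
        self.lookups = 0    # each lookup would cost a network round trip in practice

    def register(self, name: str, host: str, port: int) -> None:
        self._entries[name] = (host, port)

    def lookup(self, name: str):
        self.lookups += 1
        try:
            return self._entries[name]
        except KeyError:
            raise LookupError(f"no component registered as {name!r}") from None

class CachingResolver:
    """Client-side cache: one directory lookup per name, not one per call."""

    def __init__(self, directory: DirectoryService) -> None:
        self._directory = directory
        self._cache = {}

    def resolve(self, name: str):
        if name not in self._cache:
            self._cache[name] = self._directory.lookup(name)
        return self._cache[name]

directory = DirectoryService()
directory.register("gis/geocoder", "10.0.0.5", 9000)
resolver = CachingResolver(directory)
for _ in range(3):
    address = resolver.resolve("gis/geocoder")
print(address, directory.lookups)  # prints ('10.0.0.5', 9000) 1
```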
DCOM and Java are slower, all things being equal, than CORBA or RPCs. DCOM
wraps functionality in separate binaries (DLLs), and Java has an entire virtual machine
operating between the bytecode and the hardware, which slows the execution of all
code. However, implementation decisions have a dramatic effect on speed, so this
ranking of performance will vary. Message passing systems such as MQS are selected
for asynchrony rather than speed, though they can be very fast given favorable
network and CPU configurations and load.
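A rough way to see the gap between tightly and loosely bound calls is to time the same function called in-process and through a loopback RPC. The sketch below uses Python's `xmlrpc` as a stand-in for a remote call. It measures only one machine's loopback interface, so real networks would add latency, and the numbers will vary widely between machines; no particular output is implied.

```python
import threading
import time
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

def add(a: int, b: int) -> int:
    return a + b

def average_seconds(call, n: int) -> float:
    """Average wall-clock time of n calls to call(2, 3)."""
    start = time.perf_counter()
    for _ in range(n):
        call(2, 3)
    return (time.perf_counter() - start) / n

server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(add, "add")
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address

N = 200
local_avg = average_seconds(add, N)  # tightly bound: in-process call
with ServerProxy(f"http://{host}:{port}/") as proxy:
    remote_avg = average_seconds(proxy.add, N)  # loosely bound: HTTP + XML over loopback
server.shutdown()
server.server_close()

print(f"local:  {local_avg * 1e9:.0f} ns per call")
print(f"remote: {remote_avg * 1e6:.0f} us per call")
print(f"remote/local ratio: {remote_avg / local_avg:.0f}x")
```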
- Platforms
The types of computers used by a company are typically predetermined by existing
equipment or dictated by budgetary constraints. Most middleware technologies
distinguish between only three basic platforms: Windows, Unix, and IBM mainframe.
Java provides a "write once, test everywhere, run most places" paradigm. Since Java
compiles into bytecode that is then interpreted by each machine's native Java Virtual
Machine, the same bytecode is (theoretically) executable without change on any
supported computer - which includes Unix and Windows computers. CORBA enables
component interactions between nearly any system, since it provides an IDL that
abstracts the interface from its underlying implementation regardless of language,
operating system, or hardware. MQS supports over 35 different platforms, including
mainframe systems (unlike Java), and so provides a key means of marshalling data to and
from many legacy systems. DCOM runs only on Microsoft operating systems.
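One ingredient of platform-independent data exchange is agreeing on an exact byte layout, since machines differ in byte order. The Python sketch below uses the standard `struct` module with the `!` prefix (network byte order, standard sizes, no padding) so that any platform produces and reads the same bytes. The record layout is a hypothetical example; it is not how MQS, CORBA, or Java actually encode data, and it ignores text-encoding differences between platforms.

```python
import struct
import sys

# Hypothetical wire format for a map point: a 32-bit record id followed by
# two IEEE 754 doubles. "!" fixes big-endian byte order and standard sizes.
RECORD = struct.Struct("!idd")

def marshal_point(record_id: int, x: float, y: float) -> bytes:
    return RECORD.pack(record_id, x, y)

def unmarshal_point(data: bytes):
    return RECORD.unpack(data)

wire = marshal_point(7, 512.5, -33.25)
print(len(wire), wire[:4].hex())  # prints 20 00000007
print(unmarshal_point(wire))      # prints (7, 512.5, -33.25)

# By contrast, "=" uses the local CPU's byte order, so the same integer can
# produce different bytes on different machines.
print(sys.byteorder, struct.pack("=i", 7).hex())
```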
- Development Ease
Most middleware products provide their own APIs. The granularity and availability of
API functions and the number of additional, non-API layers required to implement the
technology determine development ease.
Pure RPC is the most difficult, as it involves direct coding of all details (often in C), with
no APIs provided. Distributed Computing Environment (DCE) standards provide
specifications for forming a common infrastructure for the development of distributed
systems. DCE is a layer above RPC that simplifies RPC work. Neither RPC nor DCE is a
product that can be purchased - each is merely a description of the type of work done by
developers (low-level for DCE, and extremely low-level for RPC).
CORBA is somewhat less difficult than either: it allows developers to tie together
programs that are already written in many different languages. But implementing
CORBA requires additional work to support added abstractions. These abstractions
include the IDL and directory services (so components can find each other), among
others. DCOM may be easier than CORBA, since its components and interfaces are
more tightly integrated (owing to its limited cross-platform capabilities). Java RMI
may be the easiest, as it provides extensive API support for virtually all common
middleware functionality, and the virtual machine concept enables true platform
independence.