High availability GIS: Beyond the application to the operating environment
Don Brady
Compaq Computer Corporation
200 Forest Street MRO1-3/K14, Marlboro, MA 01752, USA
don.brady@compaq.com
Today every business relies on “place” information. Immediate or real-time access to
place information is essential, either because the spatial data itself is critical, or because
the spatial information is a tightly integrated component of a broader mission-critical
application. Utility companies need to locate the source of a power failure when
customers report an outage; a package delivery franchise needs to optimize delivery
routes. What will be the effect on those businesses if their data server hangs as a result of
too much load? Or if a system disk crashes?
As organizations spatially enable core applications, and integrate large – sometimes tens
of terabytes! – enterprise databases housing both spatial and tabular data, the domain of
mission-critical applications expands to spatial applications. With these changes come
several major challenges: GIS data and applications are being treated as a corporate
resource, just like more traditional IT implementations; and spatial applications are now
commonly subjected to many of the same design principles as traditional enterprise-wide,
mission-critical applications. But computers can fail: a High Availability GIS solution
ensures that spatial data remains available and spatial applications continue running, even
during prolonged hardware failure.
Standalone computer systems typically achieve 98 to 99 percent “uptime” (roughly three
and a half to seven days of downtime per year), which is generally acceptable for
non-critical computing environments. But mission-critical,
can’t-do-business-without-my-computer-system environments can tolerate no more than a few
hours of downtime per year, in intervals of no more than a minute or so per instance. This
is the essence of High Availability, and a primary concern of core business operations:
99.9 percent uptime (under nine hours of downtime per year), with each outage lasting no
more than a few seconds to a minute.
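The uptime figures above follow from simple arithmetic, shown here as a minimal sketch. The function name and the 365-day year are assumptions made for illustration; they are not part of any product or standard.

```python
# Assumption: a non-leap year of 365 days.
HOURS_PER_YEAR = 365 * 24  # 8760 hours


def annual_downtime_hours(uptime_percent: float) -> float:
    """Hours per year a system is unavailable at the given uptime percentage."""
    return (1 - uptime_percent / 100) * HOURS_PER_YEAR


# The three uptime levels discussed in the text.
for pct in (98.0, 99.0, 99.9):
    hours = annual_downtime_hours(pct)
    print(f"{pct}% uptime: {hours:.2f} hours/year ({hours / 24:.2f} days)")
```

At 98 and 99 percent this gives about 7.3 and 3.65 days of downtime per year; at 99.9 percent it gives about 8.76 hours.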
It is important to note that “High Availability” does not mean “Fault Tolerant”. A High
Availability solution accepts that hardware and software will fail; but through proper
planning and contingency measures it provides quick and seamless recovery from such
failure without incurring the added cost of a fault tolerant implementation. So even
though a particular component (for example, a server) of a computing environment may not
be available for an extended period, the entire GIS remains fully available to all users.
And such failure must be transparent to the clients: when a server fails, users continue
their work as if no problem occurred, even if their application was running on that server,
or their data was stored on that server. An implication here is that some other server must
substitute for the one that failed. Further, users should not even be aware that a problem
occurred, requiring that the substitution be swift. This requirement is what produces the
metric that inaccessibility of applications and data must not exceed a few seconds to a
minute at a time: anything longer would exceed the user’s threshold for system
performance, and consequently would not be transparent.
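The substitution requirement described above can be illustrated with a minimal sketch: given an ordered list of servers and a health check, work is directed to the first server that responds. The host names, the `pick_server` function, and the simulated health check are all hypothetical; a real High Availability product performs this switch inside the operating environment, not in application code.

```python
from typing import Callable, Optional, Sequence


def pick_server(servers: Sequence[str],
                is_up: Callable[[str], bool]) -> Optional[str]:
    """Return the first server that passes the health check, or None."""
    for name in servers:
        if is_up(name):
            return name
    return None


# Hypothetical host names; the primary is simulated as failed.
failed = {"gis-primary"}
active = pick_server(["gis-primary", "gis-standby"],
                     lambda name: name not in failed)
print(active)  # the standby substitutes for the failed primary
```

For the failure to be transparent, the health check and the switch together must complete within the few-seconds-to-a-minute window described above.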
High Availability and the GIS environment
Large GIS implementations have evolved into prototypical client/server environments.
On the front end are the clients, or users, accessing applications and data. On the back
end are data servers, providing access to what are often very large sets of both spatial and
tabular data through a standard programming or user interface.
Data server hardware platforms are quite often large servers, usually running UNIX, or
perhaps Windows NT. These same hardware systems may also run the enterprise’s
applications, including the GIS, but increasingly the trend is toward the addition of a
middle tier, creating a three-tier client/server environment. In such a configuration, the
GIS as well as any of the other applications would be physically relocated to the middle
tier. A three-tier configuration has two important benefits: each system can be scaled and
tuned to provide most efficiently the type of service required of it (for example, one
system tuned for database access, another for web serving or GIS applications); and
functional applications can run in a different operating environment (UNIX or Windows NT)
than the database. In either model, data is stored centrally on a server
system. Clients access the data by making requests over the network to a program
(database manager) on the server.
Inherent to any application environment are various potential failures (the result of
hardware crashes, software faults, or environmental problems) that can make the
applications and the data inaccessible to their users. These failures can occur on the
client systems, on the data or application servers, on the storage systems, or on the
network. It
is incumbent on those implementing a GIS – or any mission-critical application – to
consider all potential causes of failure, assess their impact, and plan accordingly.