Scalability - The forgotten dimension
Dan Lemkow
Director, Project Management
VISION* Solutions
50 O'Connor Street, Suite 501
Ottawa, ON K1P 6L2, Canada
PH: (613) 236-9734, ext. 1694
FAX: (613) 567-5433
E-Mail: dan@gis.shl.com
Introduction
Ignored in system design, then overlooked in operation, scalability is often not appreciated in an online
transaction processing (OLTP) environment until the system becomes overloaded, unstable, and inefficient. In
hindsight, system managers wish they had designed for scalability, which is the ability to add users without
adversely affecting system performance for any one user. Performance is a measure of the user's ability to
complete tasks in a timely manner.
Why is scalability so often forgotten? Sometimes it's because systems start off with a few users and their designers
fail to anticipate the effects of growth. Also, scalability is difficult to measure and predict: many administrators simply
buy a big server with lots of capacity, and assume that they can expand it if necessary. Unfortunately, the
combination of a huge database with many distributed users and multiple concurrent transactions creates a
complex, dynamic structure that may not respond predictably to changes. When the system gets overloaded, it is
not always feasible to simply acquire faster networks, more servers, and more system administrators to cope
with recoveries.
The first step along the path of successfully managing this challenge is to understand the factors that affect
scalability. This paper identifies these factors and proposes some methods of building scalability into a system
as a permanent feature.
System Administration
In a perfect world, systems would never go down, software would be defect-free, and databases would never be
corrupted. Reality is quite different. These risks must be managed. Increasing the number of users increases the
significance and complexity of coping with such problems. For a system to be scalable, the system
administration must be responsive. A system will not be scalable if it requires an army of specialized database
administrators, if it is not operational when backups are being done, or if it cannot guarantee transaction
integrity. Consider the typical GIS technology that uses a hybrid data store: spatial data in a proprietary
database and non-spatial data in a commercial database. If a system failure occurs during a transaction that
involves both spatial and non-spatial data, will the database be left in an incomplete state?
The system administration solution is to fully leverage commercial databases. Database vendors specialize in
tackling this problem by
- Supporting a variety of network standards, CPU platforms, and monitoring tools
- Providing robust backup and restore facilities, including both hot and cold backups
- Preserving transaction integrity so that incomplete transactions can be rolled back
- Using a variety of access tools at both the client and server ends
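The rollback guarantee listed above can be illustrated with a minimal sketch. It assumes that geometry and attributes live in the same transactional database, and it uses Python's built-in sqlite3 module only as a stand-in for a commercial DBMS. The table names, column names, and the save_parcel helper are hypothetical and chosen for illustration only.

```python
import sqlite3


def save_parcel(conn, parcel_id, wkt, owner, fail_midway=False):
    """Write one parcel's geometry and attributes as a single transaction.

    Returns True if both rows were committed, False if the transaction
    was rolled back. fail_midway simulates a failure between the writes.
    """
    try:
        # The connection context manager commits on success and
        # rolls back if an exception escapes the block.
        with conn:
            conn.execute(
                "INSERT INTO parcel_geometry (parcel_id, wkt) VALUES (?, ?)",
                (parcel_id, wkt),
            )
            if fail_midway:
                raise RuntimeError("simulated failure between the two writes")
            conn.execute(
                "INSERT INTO parcel_attributes (parcel_id, owner) VALUES (?, ?)",
                (parcel_id, owner),
            )
        return True
    except RuntimeError:
        return False


def count_rows(conn, table):
    """Return the number of rows in one of the two (hypothetical) tables."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


# Hypothetical schema: spatial and non-spatial data in the same database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parcel_geometry (parcel_id INTEGER PRIMARY KEY, wkt TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE parcel_attributes (parcel_id INTEGER PRIMARY KEY, owner TEXT NOT NULL)"
)

first_ok = save_parcel(conn, 1, "POLYGON((0 0, 1 0, 1 1, 0 0))", "A. Owner")
second_ok = save_parcel(
    conn, 2, "POLYGON((2 2, 3 2, 3 3, 2 2))", "B. Owner", fail_midway=True
)
```

After the simulated failure, the geometry row for parcel 2 is rolled back along with the rest of the transaction, so the two tables stay consistent. A hybrid store that keeps geometry in a separate proprietary file cannot get this guarantee from the DBMS alone.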
It is essential that spatial information systems fully exploit these technologies. All the data (geography,
relationships, and attributes) must be stored in the commercial database in an accessible and transparent
manner. If data can be accessed both from the spatial information system and through the standard access tools
of the DBMS application, processing can occur on the server using techniques such as stored procedures and
triggers. This results in a richer set of options for application partitioning.
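As a rough illustration of server-side processing, the sketch below defines a trigger that records ownership changes inside the database itself, so the rule applies whether the update comes from the spatial information system or from a standard SQL tool. It uses Python's sqlite3 module as a stand-in for a commercial DBMS (SQLite supports triggers but not stored procedures), and the parcel and parcel_audit tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: geometry (as WKT text) and attributes in one table,
# plus an audit table that only the trigger writes to.
conn.executescript(
    """
    CREATE TABLE parcel (
        parcel_id INTEGER PRIMARY KEY,
        wkt       TEXT NOT NULL,
        owner     TEXT NOT NULL
    );
    CREATE TABLE parcel_audit (
        parcel_id INTEGER NOT NULL,
        old_owner TEXT NOT NULL,
        new_owner TEXT NOT NULL
    );
    -- Runs on the server for every ownership change, whichever client made it.
    CREATE TRIGGER parcel_owner_audit
    AFTER UPDATE OF owner ON parcel
    FOR EACH ROW
    WHEN OLD.owner <> NEW.owner
    BEGIN
        INSERT INTO parcel_audit (parcel_id, old_owner, new_owner)
        VALUES (OLD.parcel_id, OLD.owner, NEW.owner);
    END;
    """
)

# Any client can issue plain SQL; the audit logic is not duplicated in clients.
conn.execute("INSERT INTO parcel VALUES (1, 'POINT(0 0)', 'Smith')")
conn.execute("UPDATE parcel SET owner = 'Jones' WHERE parcel_id = 1")
conn.commit()

audit_rows = conn.execute(
    "SELECT parcel_id, old_owner, new_owner FROM parcel_audit"
).fetchall()
```

Keeping the rule in the database rather than in each client application is one form of the application partitioning mentioned above: the client issues a simple update, and the server enforces the side effects.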