Scalability - The forgotten dimension
Data Access
Business applications that display spatial information require access to large amounts of data. This is due to the
unique nature of spatial data: many coordinates are needed to describe a single object, and many such objects
must appear on screen to provide adequate context (Figure 1). In relational database terms, each object is a row in a
primary table; its coordinate data and related non-spatial data are stored as multiple rows in several indirect
tables, which are linked to the primary row by foreign keys (Figure 2).
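To make the row counts concrete, here is a minimal sketch using Python's built-in sqlite3 module. The CABLE table and CABLE_ID key come from the query later in this section. The CABLE_COORD and CABLE_TERMINAL tables, their columns and the sample rows are hypothetical. Even in this reduced schema, drawing a single cable means reading rows from three tables.

```python
import sqlite3

# Hypothetical schema: CABLE is the primary table (one row per object);
# CABLE_COORD and CABLE_TERMINAL are indirect tables linked back by foreign key.
SCHEMA = """
CREATE TABLE CABLE (
    CABLE_ID   INTEGER PRIMARY KEY,
    CABLE_TYPE TEXT NOT NULL
);
CREATE TABLE CABLE_COORD (
    CABLE_ID INTEGER NOT NULL REFERENCES CABLE(CABLE_ID),
    SEQ      INTEGER NOT NULL,          -- vertex order along the cable
    X        REAL NOT NULL,
    Y        REAL NOT NULL,
    PRIMARY KEY (CABLE_ID, SEQ)
);
CREATE TABLE CABLE_TERMINAL (
    CABLE_ID      INTEGER NOT NULL REFERENCES CABLE(CABLE_ID),
    END_NO        INTEGER NOT NULL,     -- 1 = start, 2 = end
    TERMINAL_TYPE TEXT NOT NULL,
    PRIMARY KEY (CABLE_ID, END_NO)
);
"""

def rows_for_one_cable(conn, cable_id):
    """Count the rows that must be read to draw a single cable."""
    counts = {}
    for table in ("CABLE", "CABLE_COORD", "CABLE_TERMINAL"):
        (n,) = conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE CABLE_ID = ?", (cable_id,)
        ).fetchone()
        counts[table] = n
    return counts

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO CABLE VALUES (1, 'fibre')")
conn.executemany("INSERT INTO CABLE_COORD VALUES (1, ?, ?, ?)",
                 [(i, float(i), float(i * i)) for i in range(12)])
conn.executemany("INSERT INTO CABLE_TERMINAL VALUES (1, ?, ?)",
                 [(1, "splice"), (2, "joint")])
print(rows_for_one_cable(conn, 1))
# {'CABLE': 1, 'CABLE_COORD': 12, 'CABLE_TERMINAL': 2}
```

An administrative screen, by contrast, would typically need only the single primary row plus a few related rows.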
The non-spatial data is typically used to dynamically drive the rendering. For example, the type of cable might
determine the color, the terminal types might control the symbols used at each end of the displayed cable, and
the assignment of strands could be labeled midway along the cable (Figure 2).
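The attribute-driven rendering described above can be sketched as a lookup from non-spatial attributes to display choices. The color and symbol tables, the CableRecord fields and the label format are all assumptions made for illustration. A real display engine would read its symbology from configuration.

```python
from dataclasses import dataclass

# Hypothetical symbology tables; real values would come from configuration.
CABLE_COLORS = {"fibre": "orange", "copper": "brown", "coax": "blue"}
TERMINAL_SYMBOLS = {"splice": "circle", "joint": "square", "pole": "triangle"}

@dataclass
class CableRecord:          # hypothetical attribute record for one cable
    cable_id: int
    cable_type: str
    start_terminal: str
    end_terminal: str
    strands_assigned: int
    strands_total: int

def style_for(cable: CableRecord) -> dict:
    """Derive rendering choices from the cable's non-spatial attributes."""
    return {
        "line_color": CABLE_COLORS.get(cable.cable_type, "black"),
        "start_symbol": TERMINAL_SYMBOLS.get(cable.start_terminal, "dot"),
        "end_symbol": TERMINAL_SYMBOLS.get(cable.end_terminal, "dot"),
        # label to be drawn midway along the cable
        "mid_label": f"{cable.strands_assigned}/{cable.strands_total} strands",
    }

print(style_for(CableRecord(1, "fibre", "splice", "pole", 18, 24)))
# {'line_color': 'orange', 'start_symbol': 'circle', 'end_symbol': 'triangle', 'mid_label': '18/24 strands'}
```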
Contrast this with the typical administrative application (Figure 3), in which each screen of information consists
of perhaps fifty pieces of data that collectively represent one primary row and perhaps several rows from a
couple of related tables.
The need for large amounts of data creates two key bottlenecks: access load on the database server and
transmission across the network. Because so much data is required for displaying spatial information, graphics
caching becomes a very effective solution. In this approach, a distinction is made between a database object and
its displayed appearance.
An object such as a cable, for example, would be rendered using eight graphic primitives: two symbols, one line
joining the symbols, and five pieces of displayed text (Figure 2). The display engine records these primitives in
a local display list structure.
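A display list of this kind might be modeled as follows. This is a minimal sketch: the Primitive class, the render_cable function and the label texts are hypothetical, and every text label is simply anchored at the cable's midpoint. What it illustrates is that all eight primitives carry the same database key.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]

@dataclass(frozen=True)
class Primitive:            # hypothetical display-list entry
    kind: str               # "symbol", "line" or "text"
    geometry: tuple         # anchor point(s) or vertex list
    key: int                # database key of the object this primitive belongs to
    text: str = ""

def render_cable(cable_id: int, start: Point, end: Point, labels: list) -> list:
    """Return display-list entries for one cable: two end symbols,
    one connecting line and one text primitive per label."""
    mid = ((start[0] + end[0]) / 2, (start[1] + end[1]) / 2)
    prims = [
        Primitive("symbol", (start,), cable_id),
        Primitive("symbol", (end,), cable_id),
        Primitive("line", (start, end), cable_id),
    ]
    prims += [Primitive("text", (mid,), cable_id, label) for label in labels]
    return prims

display_list = render_cable(
    42, (0.0, 0.0), (100.0, 0.0),
    ["F42", "fibre", "24 strands", "18 assigned", "duct D7"],
)
print(len(display_list))  # 8: 2 symbols + 1 line + 5 texts
```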
Each primitive is tagged with the same key that uniquely identifies the cable object in the database. In this way,
all the primitives can be treated as a unit. When the user clicks any one of them, the display engine 'tells' the application
the object identifier. The application can obtain the attributes for the cable object by going back to the database
with the following request:
SELECT * FROM CABLE WHERE CABLE_ID=&selected_object_id;
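The pick-then-query round trip can be sketched as below. The `&selected_object_id` form in the query above is a SQL*Plus substitution variable. In application code the identifier would normally be passed as a bind parameter instead, shown here with the `?` placeholder of Python's sqlite3 module. The in-memory table, the nearest-anchor hit test and the 5-unit pick tolerance are simplifying assumptions.

```python
import math
import sqlite3

# Minimal in-memory stand-ins for the database and the display list.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CABLE (CABLE_ID INTEGER PRIMARY KEY, CABLE_TYPE TEXT, STRANDS INTEGER)")
conn.execute("INSERT INTO CABLE VALUES (42, 'fibre', 24)")

# Each entry: (object key, primitive kind, anchor point)
display_list = [
    (42, "symbol", (0.0, 0.0)),
    (42, "symbol", (100.0, 0.0)),
    (42, "text", (50.0, 2.0)),
]

def pick(x, y, tolerance=5.0):
    """Return the database key of the primitive nearest (x, y), if within tolerance."""
    best = min(display_list, key=lambda p: math.dist((x, y), p[2]))
    return best[0] if math.dist((x, y), best[2]) <= tolerance else None

def fetch_attributes(object_id):
    """Go back to the database for the picked object, using a bind parameter."""
    cur = conn.execute("SELECT * FROM CABLE WHERE CABLE_ID = ?", (object_id,))
    return cur.fetchone()

selected = pick(49.0, 1.0)
print(selected, fetch_attributes(selected))  # 42 (42, 'fibre', 24)
```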
The basis for the caching is the storage of this display list of graphics primitives in a file that can be retrieved
and displayed later without going back to the database.
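Storing and reloading a display list might look like the sketch below. The source does not specify a file format; JSON and the field names used here are assumptions. The reload step reads only the cache file, so the picture can be redrawn without touching the database, while each primitive still carries the key needed to go back to it.

```python
import json
import os
import tempfile

def save_display_list(path, primitives):
    """Write the display list (primitives plus their database keys) to a cache file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(primitives, f)

def load_display_list(path):
    """Read a cached display list back; no database access is needed to redraw it."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

# Hypothetical primitives for one cable, all tagged with database key 42.
primitives = [
    {"kind": "symbol", "at": [0.0, 0.0], "key": 42},
    {"kind": "symbol", "at": [100.0, 0.0], "key": 42},
    {"kind": "line", "points": [[0.0, 0.0], [100.0, 0.0]], "key": 42},
    {"kind": "text", "at": [50.0, 2.0], "text": "F42", "key": 42},
]

cache_path = os.path.join(tempfile.gettempdir(), "cable_tile_demo.json")
save_display_list(cache_path, primitives)
print(load_display_list(cache_path) == primitives)  # True
```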
This approach is viable as long as
- The display engine is capable of using a local display cache
- The database keys are persistent (so that a cached key is never resolved to a different object after the key has
been reassigned)
- The display caches record the database keys
- The cache handles display only and is not a data cache; otherwise, corruption of the database can occur
(When a user operates on data, as opposed to merely displaying it, the application always goes to the database.)
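The last two conditions can be illustrated with a small sketch in which the cache holds only geometry tagged with keys, and any operation on the data reads the current values from the database. A dictionary stands in for the database here, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class DisplayCache:
    """Holds drawable primitives tagged with database keys, but no attribute values."""
    primitives: list = field(default_factory=list)   # (key, kind, geometry) tuples

    def keys(self):
        return {key for key, _kind, _geom in self.primitives}

class Application:          # hypothetical application layer
    def __init__(self, database, cache):
        self.database = database   # stand-in for the real database: key -> attribute dict
        self.cache = cache

    def open_for_edit(self, key):
        # Operating on data always goes to the database; the cache only says
        # *which* object was picked, never its current attribute values.
        if key not in self.database:
            raise KeyError(f"object {key} no longer exists")
        return dict(self.database[key])

database = {42: {"CABLE_TYPE": "fibre", "STRANDS": 24}}
cache = DisplayCache([(42, "line", ((0, 0), (100, 0)))])
app = Application(database, cache)
database[42]["STRANDS"] = 48     # another user updates the object; the cache is unchanged
print(app.open_for_edit(42))     # {'CABLE_TYPE': 'fibre', 'STRANDS': 48}
```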
The approach becomes effective by
- Being able to update 'stale' caches (This requires a process that monitors changes posted to the database
and duly updates the affected caches.)
- Tracking which application/user has which caches and therefore identifying a possible 'stale' cache that
needs refreshing
- Generating caches systematically for both vertical and horizontal partitioning (Vertical partitioning divides the
data into themes; for example, street outlines in one layer and property boundaries in another. Horizontal
partitioning tiles the geographic area into smaller units [Figure 4].)
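The three measures above can be combined in one sketch. Caches are identified by a (theme, tile) pair, which covers vertical and horizontal partitioning. A registry records which user holds which cache, and a change monitor marks the affected caches as stale. The tile size, class names and user names are hypothetical, and a production system would persist this registry rather than keep it in memory.

```python
import math
from collections import defaultdict

TILE_SIZE = 1000.0   # hypothetical tile edge length, in map units

def tiles_for_bbox(xmin, ymin, xmax, ymax, size=TILE_SIZE):
    """Horizontal partitioning: the (column, row) tiles a bounding box touches."""
    cols = range(math.floor(xmin / size), math.floor(xmax / size) + 1)
    rows = range(math.floor(ymin / size), math.floor(ymax / size) + 1)
    return {(c, r) for c in cols for r in rows}

class CacheRegistry:
    """Tracks which user holds which (theme, tile) caches and which are stale."""

    def __init__(self):
        self.holders = defaultdict(set)   # (theme, tile) -> users holding that cache
        self.stale = defaultdict(set)     # user -> (theme, tile) caches needing refresh

    def register(self, user, theme, tile):
        self.holders[(theme, tile)].add(user)

    def on_change(self, theme, bbox):
        """Called by the change monitor when an edit is posted to the database."""
        for tile in tiles_for_bbox(*bbox):
            for user in self.holders.get((theme, tile), ()):
                self.stale[user].add((theme, tile))

registry = CacheRegistry()
registry.register("alice", "streets", (0, 0))
registry.register("bob", "property", (0, 0))
registry.register("carol", "streets", (5, 5))
registry.on_change("streets", (100.0, 100.0, 250.0, 300.0))
print(dict(registry.stale))   # {'alice': {('streets', (0, 0))}}
```

Only the holder of the affected theme and tile is flagged; the other users keep their caches untouched.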
Application architecture
An application's architecture must recognize the finite capacity of both the network and the server. The
underlying software products must support task decomposition, which includes both processing and data access.
This partitioning, often called multi-tiered architecture, allows the processing load to be distributed among multiple
platforms. The most common approach involves three tiers: database access, application logic, and user
interface. The evolution to distributed objects is making it possible to use as many tiers as an application needs. The sharing of processing load
between client and server is no longer dictated by the hardware configuration, but by current application needs
and architecture. In techno-jargon, we could say that clients and servers are not permanently 'fat' or 'thin,' but
can change their profile according to job requirements. For example, if geographic survey data is used only for
background reference, it could be most efficient to store it on a networked drive on each regional LAN rather
than continually sending it out across the WAN from a central database.
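The survey-data example amounts to a placement decision made per data layer rather than fixed by the hardware. Below is a minimal sketch, assuming each layer is tagged as editable or read-only and has a known update interval. The `lan://` and `wan://` locations and the 30-day threshold are illustrative placeholders, not part of any system described here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Layer:                      # hypothetical layer description
    name: str
    editable: bool                # does the application ever modify this layer?
    update_interval_days: int     # how often the central copy changes (assumed known)

def data_source(layer: Layer, region: str) -> str:
    """Decide where a client should read a layer from.

    Read-only background layers that change rarely are served from a copy on the
    regional LAN; anything users edit is read from the central database over the WAN.
    """
    if not layer.editable and layer.update_interval_days >= 30:
        return f"lan://{region}/reference/{layer.name}"
    return f"wan://central-db/{layer.name}"

survey = Layer("geographic_survey", editable=False, update_interval_days=365)
cables = Layer("cables", editable=True, update_interval_days=1)
print(data_source(survey, "north-west"))   # lan://north-west/reference/geographic_survey
print(data_source(cables, "north-west"))   # wan://central-db/cables
```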
An excellent example is British Telecom's Plant and Records Modernization (BT-PRM) project. Most utility
solutions must display large amounts of data as graphics, but always having to 'draw from the database'
precludes scalability. In the case of BT-PRM, the data for the entire country is held in one central database.
Engineers located in the various regional offices need timely online access to the data. The infrastructure is a
national WAN connected to regional office LANs. Scalability is achieved through a combination of application
partitioning and display cache management (Figure 5).
At any time, hundreds of users can be working with the data, and several users can be working on data for the
same geographic area. The central server does not attempt to notify every user immediately about every change
in the national data. Instead, it keeps track of who is working where, and notifies users only of changes to data
in their current area of operations. This common-sense 'need to know' strategy saves CPU time and makes such
a large-scale solution feasible. The solution also avoids the messy and expensive administration task of trying to
synchronize distributed databases that anyone can change.
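The 'need to know' strategy can be sketched as a bounding-box overlap test between each user's registered area of operations and the extent of a change. This is an illustrative assumption about how such tracking could work, not a description of the BT-PRM implementation. The class, user names and coordinates are hypothetical.

```python
from dataclasses import dataclass, field

def intersects(a, b):
    """True if two axis-aligned boxes (xmin, ymin, xmax, ymax) overlap; touching counts."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

@dataclass
class ChangeNotifier:             # hypothetical central-server component
    # user -> current area of operations, as reported by each client session
    areas: dict = field(default_factory=dict)

    def set_area(self, user, bbox):
        self.areas[user] = bbox

    def recipients(self, change_bbox):
        """Users whose current area overlaps the changed data; everyone else is skipped."""
        return sorted(u for u, area in self.areas.items() if intersects(area, change_bbox))

notifier = ChangeNotifier()
notifier.set_area("user_a", (0, 0, 10_000, 10_000))
notifier.set_area("user_b", (8_000, 8_000, 20_000, 20_000))
notifier.set_area("user_c", (50_000, 0, 60_000, 10_000))
print(notifier.recipients((9_000, 9_000, 9_500, 9_500)))   # ['user_a', 'user_b']
```

A linear scan keeps the sketch readable. With hundreds of users, indexing the registered areas spatially would avoid testing every user for every change.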