Version Management Revisited
Peter M. Batty
GE Smallworld
5600 Greenwood Plaza Blvd.
Englewood, CO 80111
Abstract
This paper discusses version management, which is now widely accepted in the industry as
an essential technique for managing the design lifecycle. All major vendors now claim to
have version management as part of their solution. Despite this, version management and the
long transaction problem that it solves are still not widely understood.
This paper summarizes the fundamentals of version management, and looks at different
approaches to implementing it: deep and shallow version management. It then goes on to
explain how basic version management, while an essential prerequisite, is just the beginning
of a full solution for managing the design process. Other important issues that need to be
addressed include design versus as-built views, future views of the network, partial job
completion, jobs built on jobs, handling historical information, and support for detached
design work in the field. Each of these is explained and approaches to implementing them
are discussed.
The Long Transaction Problem
For a detailed discussion of the long transaction problem, see Newell and Easterfield, 1990,
and Newell and Batty, 1994.
The basic technical requirement for a long transaction is the ability to lay out a design - do
inserts, updates and deletes - in such a way that the changes being made are not visible to
other users of the system (until the design reaches a stage where it is appropriate to share it
with others). Since the user has in some sense a private copy of the data, concurrency control
needs to be addressed: what happens if two people want to update something in the same
area? In general, since these transactions can take a long elapsed time (weeks or months), it
is unacceptable to insist that all data in an area be locked. Therefore, most approaches use an
optimistic form of concurrency control in which data is not locked, but any conflicting
updates are identified and resolved at some point before the transaction is completed.
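As an illustration of the optimistic scheme just described, the following minimal Python sketch lets each long transaction remember the revision of every record it changes and compares those revisions against the shared database at completion time. The names Master, LongTransaction, base_revs and pending are hypothetical and chosen for this example only; they do not describe any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    """Shared database (hypothetical): record key -> (revision, value)."""
    records: dict = field(default_factory=dict)

    def read(self, key):
        # A record that does not exist yet is treated as revision 0.
        return self.records.get(key, (0, None))


@dataclass
class LongTransaction:
    """Private set of changes; nothing is locked in the master."""
    master: Master
    base_revs: dict = field(default_factory=dict)  # key -> revision seen at first edit
    pending: dict = field(default_factory=dict)    # key -> new value (None = delete)

    def update(self, key, value):
        if key not in self.base_revs:
            self.base_revs[key] = self.master.read(key)[0]
        self.pending[key] = value  # invisible to other users until commit

    def conflicts(self):
        # A conflict is a record whose master revision moved on since we first edited it.
        return sorted(k for k, rev in self.base_revs.items()
                      if self.master.read(k)[0] != rev)

    def commit(self):
        """Apply the changes, or return the conflicting keys and apply nothing."""
        found = self.conflicts()
        if found:
            return found
        for key, value in self.pending.items():
            rev = self.master.read(key)[0]
            self.master.records[key] = (rev + 1, value)
        self.pending.clear()
        self.base_revs.clear()
        return []
```

If two transactions both change the same record, the first to commit succeeds and the second is handed the conflicting key to resolve. How a conflict is resolved (keep one side, or combine the two) is left to the user in this sketch.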
Checkout
Checkout has been the most commonly used approach to the long transaction problem, in
which a small geographic area is copied to a separate database or file where the work is done,
and changes are passed back to the master database later. This has a number of drawbacks.
The time taken to create the checked out dataset is often significant (minutes rather than
seconds in many cases). Since a restricted area is checked out, it is hard to run an analysis of
how the design affects a broader area of the network. With any reasonably sophisticated data
model, it can be very hard to determine exactly what data should be checked out: there are
many difficult issues regarding the handling of data that is related to objects inside the
selected area but does not itself lie geographically within it.
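The boundary problem can be made concrete with a small Python sketch. It assumes a deliberately simple model in which every record has a point location and a list of related record keys. Record, checkout and dangling_relations are hypothetical names introduced for this illustration, not part of any checkout product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    key: str
    x: float
    y: float
    related: tuple = ()  # keys of related records, e.g. network connectivity


def checkout(master, xmin, ymin, xmax, ymax):
    """Copy every record located inside the rectangle into a separate working set."""
    return {k: r for k, r in master.items()
            if xmin <= r.x <= xmax and ymin <= r.y <= ymax}


def dangling_relations(working):
    """Keys referenced by checked-out records that were left behind in the master."""
    return sorted({rel for r in working.values()
                   for rel in r.related if rel not in working})


master = {
    "pole-1": Record("pole-1", 1, 1, ("cable-1",)),
    "cable-1": Record("cable-1", 5, 5, ("pole-1", "pole-2")),
    "pole-2": Record("pole-2", 9, 9, ("cable-1",)),
}
working = checkout(master, 0, 0, 6, 6)
```

Here the cable is checked out but one of the poles it connects to is not, so dangling_relations(working) reports "pole-2". A real checkout has to decide whether to copy such records as well, which is where the difficulty described above arises.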
Version Management
Another approach to handling long transactions is to use version management. With this
approach it is possible to create different “versions” or “alternatives” of the database. Each
alternative is logically equivalent to a replica of the whole database; a user can make changes
within an alternative that are not seen by any other users, and the user in the alternative does
not see changes made by other users (until she asks to see them). Creating an alternative does
not physically replicate the data: only the changes relative to the parent version
are stored in the alternative. There are significantly different approaches to implementing
version management, which will be discussed in the next section. Changes are propagated
between versions in a controlled way. This paper uses the terminology that a “merge”
operation propagates changes down from a parent to a child alternative, and a “post”
operation propagates changes up from a child to a parent. Note, however, that Oracle’s new
version management technology, known as workspace management, uses different
terminology: “refresh” for propagating changes down, and “merge” for
propagating changes up. In general, version management uses an optimistic approach to
concurrency control, and any conflicts are detected and corrected when a merge is done. It is
also possible to have a tree structure of alternatives, so in addition to handling simple long
transactions, this approach also provides a mechanism for handling alternative designs in an
elegant way. Version management overcomes all the problems with checkout mentioned
above: there is no initial retrieval time, no copying of data is required, and the user has
access to the whole database at all times.
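The merge and post operations, in the terminology of this paper, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of how any vendor implements version management: the top-level version is modelled as an append-only change log, an alternative stores only its own changes plus a marker for how much of the parent it has seen, and only a single level of alternatives is handled (a tree would need the same bookkeeping at every level). Top, Alternative, base and changes are hypothetical names.

```python
DELETED = object()  # tombstone value for a deleted record


class Top:
    """Top-level version (hypothetical): an append-only log of (key, value) changes."""

    def __init__(self):
        self.log = []

    def get(self, key, upto=None):
        # Look only at the first `upto` changes, so a child sees a stable view.
        entries = self.log if upto is None else self.log[:upto]
        for k, v in reversed(entries):
            if k == key:
                return None if v is DELETED else v
        return None


class Alternative:
    """Child version that stores only its changes relative to the parent."""

    def __init__(self, parent):
        self.parent = parent
        self.base = len(parent.log)  # how many parent changes are visible here
        self.changes = {}            # key -> value or DELETED

    def put(self, key, value):
        self.changes[key] = value

    def delete(self, key):
        self.changes[key] = DELETED

    def get(self, key):
        if key in self.changes:
            v = self.changes[key]
            return None if v is DELETED else v
        return self.parent.get(key, self.base)

    def merge(self):
        """Bring parent changes down; return keys changed on both sides."""
        theirs = {k for k, _ in self.parent.log[self.base:]}
        self.base = len(self.parent.log)
        # Assumption: the child's value stays in place until the user resolves it.
        return sorted(theirs & self.changes.keys())

    def post(self):
        """Push local changes up; the alternative must be merged first."""
        if self.base != len(self.parent.log):
            raise RuntimeError("parent has newer changes: merge first")
        self.parent.log.extend(self.changes.items())
        self.changes.clear()
        self.base = len(self.parent.log)
```

Creating an alternative copies no data, and a read that finds no local change falls through to the parent, so the whole database remains visible. Requiring a merge before a post is what makes conflict detection optimistic: nothing is locked, and overlapping changes surface as the list returned by merge. The linear scan in Top.get is for clarity only and would not scale to a real database.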
While this technology has been available for 10 years now, only recently has the superiority
of this approach been widely acknowledged. It is now accepted as the industry standard
approach, with all the major GIS vendors and Oracle announcing support for version
management. Recent implementations of version management use a significantly different
underlying architecture from that of the longer-established one, and the differences are
discussed in the following sections.