Practical Object Versioning for Distributed Mobile Databases
Robert Wills
Severn Trent Systems, Alexander House
19 Fleming Way, Swindon SN1 2NG, United Kingdom
Extending the Enterprise Database to Mobile Devices
Distributing enterprise data to mobile devices for use by field personnel makes good business sense
for a wide variety of applications. This is most easily achieved with a “thin client” architecture, where
the data and application logic reside in the controlled environment of a central data center, and only
the user interface executes on the mobile device. However, there are scenarios where the field worker must be
able to run applications when no connection to the data center is available. To achieve this, a subset
of the enterprise database and application code must be deployed to a “thick client” device and
executed locally.
When a field worker updates the database on their disconnected mobile device, the changes cannot
be immediately validated against data elsewhere in the enterprise. There needs to be some way of
reconciling the overlapping edits and other data inconsistencies that can occur. If the data volumes
are large, this reconciliation process needs to be automated wherever possible.
The discipline of Software Configuration Management (SCM) provides a solid theoretical framework
based around the concept of “versioned objects”, which can be usefully adapted to solve some of the
problems inherent in building a distributed enterprise-wide database.
Software configuration management (SCM) concepts
“Object”
Objects are collections of data which are treated as an atomic unit for the purposes of versioning. For
an SCM system, objects are typically text files containing program source code. Any non-trivial software
project will involve a large number of source files, which are usually organized into folders. Figure 1a
shows a UML class model which describes a hierarchy of folders containing files - the essential
concept behind most computer storage at the operating system level.
More complex associations and dependencies between these files are implied by the source code
they contain, although it requires an appropriate compiler to figure them out. Some software
development environments with SCM features make these associations more explicit by storing and
versioning fragments of source code at a more granular level. Figure 1b is a simplistic model of part of
the Java language, illustrating some typical fragments.
Formal programming language definitions and the full UML metamodel allow this process of
decomposition to be taken much further, modeling the building blocks of individual executable
statements and expressions. This level of analysis adds little value to an SCM process, and current
SCM tools rarely have an object model more complex than figure 1a above.
The principles of object versioning can be usefully applied outside the SCM context to any model, no
matter how simple or complex. The objects being versioned can be traditional RDBMS table rows just
as easily as program source files.
Object identity is an important issue for any system that supports versioning. The RDBMS approach is
to identify an object by some of its attributes, known as its “primary key”. An OODBMS identifies
objects with some sort of extra reference or “handle”, which might be a memory address or a number
which acts as a unique object identifier (“OID”). For a distributed, version-aware application there are
advantages to the object-oriented approach, since it allows two objects created on different remote
systems but with the same primary key to exist in the same distributed database. This situation implies
some form of conflict which will eventually need resolving, but the system must have somewhere to
store the two objects in the meantime.
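
The difference between the two identity schemes can be illustrated with a minimal sketch. The class names, the use of a random UUID as the OID, and the sample key "VALVE-1001" are assumptions made for illustration only; they are not taken from any particular product.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """A row-like object identified by an OID rather than by its primary key."""
    primary_key: str   # business key, e.g. an asset number (hypothetical)
    data: dict
    # Assumption: a random UUID serves as the OID, so that disconnected
    # devices can allocate identifiers without contacting the data center.
    oid: str = field(default_factory=lambda: uuid.uuid4().hex)


class ObjectStore:
    def __init__(self):
        self._by_oid = {}

    def add(self, obj):
        self._by_oid[obj.oid] = obj

    def find_by_primary_key(self, primary_key):
        return [o for o in self._by_oid.values() if o.primary_key == primary_key]

    def key_conflicts(self):
        """Return primary keys that are claimed by more than one object."""
        seen = {}
        for obj in self._by_oid.values():
            seen.setdefault(obj.primary_key, []).append(obj.oid)
        return {pk: oids for pk, oids in seen.items() if len(oids) > 1}


store = ObjectStore()
store.add(StoredObject("VALVE-1001", {"created_on": "device 21"}))
store.add(StoredObject("VALVE-1001", {"created_on": "device 34"}))
```

Both objects are held side by side, and `key_conflicts()` reports the duplicated primary key so that it can be resolved later. A store keyed on the primary key alone would have had to reject or overwrite one of them.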
“Version”
When an object is changed, an SCM system does not throw away the old state of the data. Each state
is called a “version”, and is given some form of unique version identifier (“VID”). More importantly, the
SCM keeps a record of what previous version the new data is based upon, creating a directed graph
of the versions of a particular object. Other housekeeping information is typically stored with each
version, such as the name of the user making the change, a timestamp, and comments about why the
change was made.

Figure 2
An important property of object versions is their stability. Once created, the contents of an object
version should not change for the entire lifetime of the SCM project. Change always implies creating a
new version.
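
A version record of this kind might be sketched as follows. The field names, and the simplification that each version has a single parent, are assumptions; a system that records merges would need to allow several parents per version. Version "2.7" is a hypothetical predecessor of the "2.8" used in the text.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)   # frozen: a version cannot be modified once created
class Version:
    vid: str
    parent_vid: Optional[str]   # None for the first version of an object
    content: str
    user: str
    comment: str
    timestamp: datetime


class VersionGraph:
    """Versions of one object, linked to the version each was based upon."""

    def __init__(self):
        self._versions = {}

    def commit(self, vid, parent_vid, content, user, comment):
        if vid in self._versions:
            raise ValueError(f"version {vid} already exists")
        if parent_vid is not None and parent_vid not in self._versions:
            raise KeyError(f"unknown parent {parent_vid}")
        version = Version(vid, parent_vid, content, user, comment,
                          datetime.now(timezone.utc))
        self._versions[vid] = version
        return version

    def history(self, vid):
        """Walk parent links from vid back to the first version."""
        chain = []
        while vid is not None:
            version = self._versions[vid]
            chain.append(version.vid)
            vid = version.parent_vid
        return chain


graph = VersionGraph()
graph.commit("2.7", None, "old source", "alice", "initial import")
latest = graph.commit("2.8", "2.7", "new source", "bob", "fix iterator bug")
```

Changing the data always goes through `commit`, which adds a new node to the graph; attempting to assign to a field of an existing `Version` raises an error.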
Simplistic SCM systems can only store multiple versions of file objects. More practical SCM systems
allow both folders and files to be versioned, and this principle can be extended to a full application
data model, where some or all of the different object types are versioned.
The existence of versions gives rise to two different types of object reference. If the example above is
part of the version history of a file called “collections.cpp”, then a version-specific reference which
specifies “collections.cpp, version 2.8”, will always resolve to the same piece of data. A “bill of
materials” describing the contents of a software release is one context where the stability of a version-specific
reference is very useful. If a second source file contains the statement
#include "collections.cpp", then it is making a generic reference to the collections.cpp file,
relying on some other mechanism, such as a “view”, to select a specific version.
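
The two kinds of reference can be sketched as below. The `Reference` and `View` classes are illustrative assumptions, as is version "2.9"; a real view would usually be defined by a rule rather than by an explicit table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Reference:
    name: str
    vid: Optional[str] = None   # None marks a generic reference


class View:
    """Maps each object name to the version that this view selects."""

    def __init__(self, selected):
        self._selected = dict(selected)

    def resolve(self, ref):
        if ref.vid is not None:
            # Version-specific: the view plays no part in the answer.
            return (ref.name, ref.vid)
        # Generic: the view chooses the version.
        return (ref.name, self._selected[ref.name])


release_view = View({"collections.cpp": "2.8"})
dev_view = View({"collections.cpp": "2.9"})   # hypothetical newer version

pinned = Reference("collections.cpp", "2.8")
generic = Reference("collections.cpp")
```

Here `pinned` resolves to version 2.8 under either view, while `generic` resolves to 2.8 under `release_view` and to 2.9 under `dev_view`.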
“Baseline” or “Stripe”
In SCM terms, a “baseline” or “stripe” is something that describes a snapshot of the system, and can
be used to reconstruct that snapshot later. The snapshot might represent the source files used to build
a particular beta release, for example. Physically, the baseline consists of a list of version-specific
object references or a rule which can be used to derive the list.
Baselines are typically used to record stable, consistent states that the objects within the system
achieve as a group. In the SCM context this might mean that the source files build successfully, or that
the resulting executable code does something useful. In the version-aware application context it could
mean that all referential integrity and other validation constraints are fulfilled. Baselines can be used
as part of the implementation of a “long transaction”, where they act in a similar way to a SQL
Savepoint.
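
A baseline stored as an explicit list of version-specific references might look like the following sketch. The `Repository` class and the sample file contents are assumptions for illustration; a rule-based baseline would derive the same list rather than store it.

```python
class Repository:
    """Stores immutable content keyed by (object name, version id)."""

    def __init__(self):
        self._content = {}
        self._latest = {}

    def commit(self, name, vid, content):
        if (name, vid) in self._content:
            raise ValueError("versions are immutable; use a new version id")
        self._content[(name, vid)] = content
        self._latest[name] = vid

    def create_baseline(self):
        """Record the current version of every object as an immutable list."""
        return tuple(sorted(self._latest.items()))

    def reconstruct(self, baseline):
        """Rebuild the snapshot that the baseline describes."""
        return {name: self._content[(name, vid)] for name, vid in baseline}


repo = Repository()
repo.commit("widget.java", "1.2", "class Widget {}")
repo.commit("collections.cpp", "2.8", "// collections")
beta1 = repo.create_baseline()

# Later work does not disturb the baseline.
repo.commit("widget.java", "1.3", "class Widget { int size; }")
snapshot = repo.reconstruct(beta1)
```

Because each entry in `beta1` is version-specific and versions never change, `reconstruct(beta1)` returns the same snapshot no matter how much work has been committed since.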
“Check-in”, “Check-out”
A developer using an SCM system is often required to “check-out” a file before making any changes to
it. This applies a lock to that particular version of the file which prevents any other developers from
starting work on it, until it has been “checked-in” again. Most SCM systems can also be configured to
only lock object versions during the brief period when a developer’s changes are being merged back
to the code repository. These two approaches correspond closely to the “pessimistic” and “optimistic”
locking strategies available in most RDBMS products.
In a distributed database with intermittent connections, a pessimistic lock or “check-out” may take a
long time to complete. It requires an exchange of messages with the central datastore or other lock
manager, which is not possible while the device is disconnected. A strategy of optimistic locking,
coupled with automated tools for merging changes when conflicts are detected later, is effective for
implementing most (but not all) application functionality requirements.
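
The optimistic strategy can be sketched as a version check made at synchronization time. The `CentralStore` class, the integer version counter and the "valve-7" record are assumptions for illustration; the sketch only detects the conflict and leaves the merge itself to a separate tool.

```python
class ConflictError(Exception):
    pass


class CentralStore:
    def __init__(self):
        self._rows = {}   # oid -> (version number, data)

    def put_new(self, oid, data):
        self._rows[oid] = (1, dict(data))

    def read(self, oid):
        version, data = self._rows[oid]
        return version, dict(data)   # copy, so offline edits stay local

    def write(self, oid, base_version, data):
        """Accept the write only if the row is unchanged since base_version."""
        current_version, _ = self._rows[oid]
        if current_version != base_version:
            raise ConflictError(
                f"{oid}: based on v{base_version}, store has v{current_version}")
        self._rows[oid] = (current_version + 1, dict(data))
        return current_version + 1


store = CentralStore()
store.put_new("valve-7", {"status": "open"})

# Two devices read the same version while connected, then edit offline.
base_a, copy_a = store.read("valve-7")
base_b, copy_b = store.read("valve-7")
copy_a["status"] = "closed"
copy_b["status"] = "faulty"

store.write("valve-7", base_a, copy_a)       # first synchronization succeeds
try:
    store.write("valve-7", base_b, copy_b)   # second is reported as a conflict
    conflict = None
except ConflictError as exc:
    conflict = str(exc)
```

No lock is held while the devices are disconnected. The cost is that the second device learns of the overlap only when it synchronizes, at which point a merge or a manual resolution is needed.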
“Branch” or “Activity”
When looking at the directed graph of versions of one particular object, there may be occasions when
two or more versions exist at the same point in time, and are derived from the same predecessor, or
“parent”. Some SCM systems use a form of Dewey-decimal numbering to store an object’s version
graph within the names assigned to versions.
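
As a sketch of how such a numbering can encode the graph, the functions below assume a scheme in which the first version on a new branch appends a branch number and a zero to its parent's label, which matches a label such as "1.3.1.0". Numbering rules differ between tools, so the details here are assumptions rather than the behavior of any specific SCM product.

```python
def parse(label):
    return tuple(int(part) for part in label.split("."))


def fmt(parts):
    return ".".join(str(p) for p in parts)


def next_on_same_branch(label):
    """Successor of a version on its own line of development."""
    parts = parse(label)
    return fmt(parts[:-1] + (parts[-1] + 1,))


def first_on_new_branch(parent_label, branch_number):
    """First version of the given branch forked from parent_label."""
    return fmt(parse(parent_label) + (branch_number, 0))


def branch_point(label):
    """Version that a branch was forked from, or None for the main line."""
    parts = parse(label)
    if len(parts) <= 2:
        return None
    return fmt(parts[:-2])
```

Under these assumptions, versions "1.4" and "1.3.1.0" are both derived from "1.3" and can exist at the same time, and the parent of a branch version can be recovered from its label alone.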
In the SCM context, branches occur in the version graphs of objects when different development
activities are running in parallel. A typical example is release bug-fixing, where one group of
developers makes small, carefully controlled fixes to a code baseline which has been released to
customers, while others are working on major new functionality for the next release. Most SCM tools
allow branches in individual object version graphs to be associated with a label representing the global
context, or “activity”, for which they were created.

Figure 3
While the version label “1.3.1.0” is specific to the object “widget.java”, the branch name
“V2.2_BETA1_BUGFIX” is global and may involve changes to many objects.
In the context of a distributed mobile database application, global branches can be used to keep
changes applied on one mobile device separate from those applied on another. A branch might have
the title “changes made on device 21”, for example.
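
A minimal sketch of per-device branches follows. The change log format, the branch labels and the object identifiers are hypothetical; the point is only that tagging each new version with a global branch label makes it straightforward to list one device's changes and to find objects edited on more than one device.

```python
from collections import defaultdict

# Hypothetical change log: (branch label, object id, version label).
changes = [
    ("device-21", "valve-7", "1.3.1.0"),
    ("device-21", "pump-2", "1.1.1.0"),
    ("device-34", "valve-7", "1.3.2.0"),
    ("device-34", "meter-9", "1.0.1.0"),
]


def objects_on_branch(changes, branch):
    """Objects that have at least one version on the given global branch."""
    return sorted({oid for b, oid, _vid in changes if b == branch})


def overlapping_edits(changes):
    """Objects changed on more than one branch, with the branches involved."""
    branches_by_object = defaultdict(set)
    for branch, oid, _vid in changes:
        branches_by_object[oid].add(branch)
    return {oid: sorted(bs)
            for oid, bs in branches_by_object.items() if len(bs) > 1}
```

With this data, `overlapping_edits(changes)` singles out "valve-7" as the only object that needs reconciliation between device 21 and device 34.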