System Architecture
Practical Object Versioning for Distributed Mobile Databases
“View”
A view is similar to a baseline in that it is a collection of rules used to derive a list of object versions.
Unlike a baseline, a view does not represent a special state of the system, and the contents of a view
can change from moment to moment as new object versions are created.
The most important view is often referred to as “main-latest”, meaning “the most recent version on the
main branch of every object in the system”. In the SCM context this is the cutting edge of code that the
developers are working on. In the distributed application context, this is the best consensus state of
the data. A distributed database application will probably enforce rules to ensure that the data visible
through the “main-latest” view goes from one valid state to another in a transactionally secure way.
Good SCM practice endeavors to do this, but proving that the source code is “valid” may involve
building, deploying and running the code, so automated SCM tools have to be less rigorous in this
area.
For a distributed database application, other views can be defined which show the state of the data as
it appears on a specific remote device. This is particularly useful when designing an architecture that
must provide a consistent user experience across both thick and thin client devices, since a
web/application server can show each individual user slightly different data which matches what they
see when working disconnected on a device with local data storage.
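As a minimal sketch of the idea that a view is a rule evaluated against the current set of versions, the Python below derives “main-latest” and a per-device view from an in-memory list. The names `ObjectVersion`, `latest_on_branch` and `device_view` are hypothetical, and modelling each device's local state as its own branch is an assumption made for illustration, not something the text above prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectVersion:
    object_id: str
    branch: str    # "main", or a per-device branch name (assumption)
    sequence: int  # position on the branch; higher means newer
    data: str


def latest_on_branch(versions, branch):
    """Rule: for every object, select the newest version on `branch`."""
    view = {}
    for version in versions:
        if version.branch != branch:
            continue
        current = view.get(version.object_id)
        if current is None or version.sequence > current.sequence:
            view[version.object_id] = version
    return view


def device_view(versions, device_branch):
    """Rule: a device sees its own latest versions, else main-latest."""
    view = latest_on_branch(versions, "main")
    view.update(latest_on_branch(versions, device_branch))
    return view
```

Because the rule is re-evaluated on every call, the contents of the view change as soon as a new version is appended to the list, which is the distinction drawn above between a view and a baseline.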
“Merge”
Where branching provides a way of splitting the “version-lifeline” of an object, merging enables the
separate branches to be brought back together again. This is best illustrated with an example from a
fictitious SCM scenario:
When AcmeJumble 2.2 Beta 1 was released, “widget.java” was at version 1.3. Two minor bug fixes were
made and issued to selected customers as patches – these produced “widget.java” versions 1.3.1.0 and
1.3.1.1. Meanwhile the main development team were adding a new feature which the product managers
insisted on including in the final public release of AcmeJumble 2.2 - this produced version 1.4 of
“widget.java”. The beta period has now finished, and the bug fixes that were applied need to be merged
back into the main branch so that the developers can finish work on AcmeJumble 2.2 Beta 2.

Figure 4 shows the situation graphically.
It turns out that an automated SCM tool can do a surprisingly good job of merging these different
versions of “widget.java” back together. The details of this operation are described below, and are key
to understanding how the SCM versioning model improves upon “replication” or “synchronization” for
reconciling changes in a distributed database.
- Using the directed graph that links the versions, find the most recent common ancestor of the two versions to be merged – this turns out to be version 1.3.
- Compute the differences between version 1.3.1.1 and the common ancestor, 1.3.
- Compute the differences between version 1.4 and the common ancestor, 1.3.
- If these two sets of differences overlap, then there are conflicting changes – ask the user to choose, interactively.
- In this case there are no overlaps, so create version 1.5 automatically by taking version 1.4 and applying the (V1.3.1.1 – V1.3) differences already computed.
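The first step above, finding the most recent common ancestor, can be sketched in Python as follows. The `PARENTS` table encodes the “widget.java” history from the example; version “1.2” is a hypothetical earlier version added only to show that the nearest shared ancestor is chosen, and the breadth-first search is one simple way to do this rather than the method of any particular SCM tool.

```python
from collections import deque

# Each version maps to the version(s) it was derived from.
PARENTS = {
    "1.2": [],            # hypothetical earlier version
    "1.3": ["1.2"],
    "1.3.1.0": ["1.3"],
    "1.3.1.1": ["1.3.1.0"],
    "1.4": ["1.3"],
}


def ancestors(version, parents):
    """Return the version itself plus everything reachable through parent links."""
    seen = set()
    stack = [version]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def common_ancestor(a, b, parents):
    """Walk back from b, nearest versions first, until a version that a also descends from."""
    reachable_from_a = ancestors(a, parents)
    queue = deque([b])
    visited = set()
    while queue:
        current = queue.popleft()
        if current in reachable_from_a:
            return current
        if current in visited:
            continue
        visited.add(current)
        queue.extend(parents[current])
    return None
```

With this table, `common_ancestor("1.3.1.1", "1.4", PARENTS)` returns `"1.3"`, matching the example.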
For program source files, a “difference” is usually defined as one or more adjacent lines of text which
differ from the corresponding region of a second file by something more significant than whitespace
characters (space, tab, carriage-return, etc.). For structured data, differences will be found at the level
of individual object attributes, to which the above algorithm can be more easily applied.
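For structured data the remaining steps become short. The sketch below assumes each object version is a plain Python dictionary of attributes; `changes` and `three_way_merge` are hypothetical names, and following the rule stated above, any attribute changed on both sides is reported as a conflict, even if both sides happen to agree.

```python
_MISSING = object()  # marks an attribute that is absent from a record


def changes(base, derived):
    """Return {attribute: new value} for every attribute that differs from base."""
    result = {}
    for key in base.keys() | derived.keys():
        new_value = derived.get(key, _MISSING)
        if new_value != base.get(key, _MISSING):
            result[key] = new_value
    return result


def three_way_merge(base, ours, theirs):
    """Merge two descendants of `base`; returns (merged, conflicts)."""
    our_changes = changes(base, ours)
    their_changes = changes(base, theirs)
    # Overlapping sets of differences mean conflicting edits.
    conflicts = sorted(our_changes.keys() & their_changes.keys())
    if conflicts:
        return None, conflicts  # leave the choice to the user
    # No overlap: take our version and apply their differences to it.
    merged = dict(ours)
    for key, value in their_changes.items():
        if value is _MISSING:
            merged.pop(key, None)
        else:
            merged[key] = value
    return merged, []
```

In terms of the example, `base` plays the role of version 1.3, `ours` of version 1.4, and `theirs` of version 1.3.1.1; the merged dictionary corresponds to version 1.5.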
Replication/Synchronization versus SCM Merge
To push this point a little further, imagine you only have “widget.java” versions 1.3.1.1 and 1.4, and
wish to “synchronize” them – perhaps 1.3.1.1 was created while working in the field on a laptop, and
1.4 on a desktop PC. If we compare them, there will be five differences, {a,b,c,d,e}. From timestamp
information we also know that 1.3.1.1 is the more recent version. Without access to their common
ancestor, there is simply no way that we can figure out whether to choose text from 1.3.1.1 or 1.4 at
each of the five ambiguous regions of the file. At best we can adopt a “last one in, wins” approach and
discard the work that went into version 1.4, or interactively offer the user both versions and invite them
to sort the problem out manually. The interactive approach works well for two-way synchronization of
personal data on a PDA with a desktop PC, but is not appropriate for a high-volume enterprise-wide
distributed database application.
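The loss of information can be made concrete with a small Python sketch. The file is modelled as a dictionary keyed by the five regions {a,b,c,d,e}; which side edited which region, the region contents, and the integer timestamps are all hypothetical, chosen purely for illustration.

```python
def two_way_diff(a, b):
    """List the regions that differ; nothing records which side changed them."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


def last_one_in_wins(a, a_timestamp, b, b_timestamp):
    """Keep the more recently modified copy wholesale and discard the other."""
    return dict(a) if a_timestamp >= b_timestamp else dict(b)


# Hypothetical contents: the laptop fixed regions a and b,
# the desktop added new work in regions c, d and e.
laptop_1_3_1_1 = {"a": "fixed", "b": "fixed", "c": "old", "d": "old", "e": "old"}
desktop_1_4 = {"a": "old", "b": "old", "c": "new", "d": "new", "e": "new"}

ambiguous = two_way_diff(laptop_1_3_1_1, desktop_1_4)
synced = last_one_in_wins(laptop_1_3_1_1, 2, desktop_1_4, 1)
```

Here `ambiguous` lists all five regions with no indication of which side to prefer, and `synced` is simply the laptop copy, so the desktop's work in regions c, d and e is gone.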
Most industrial-strength database replication mechanisms suffer from exactly the same problem. The
word “replication” itself provides a clue about the sort of tasks this functionality was designed for –
making replicas of a database to act as a backup, hot standby, or perhaps a local cache for improved
performance. It may be possible to configure how frequently a replication process executes and what
subset of data gets replicated, but when it comes to conflict resolution the choice is usually to treat
one end of the link as the “master”, or to write all conflicts to an error table. Retaining some portion of
the version history of objects/rows is the key to improving on this situation, and at the time of writing
there do not appear to be any mainstream RDBMS products which do anything “out of the box” to
support this.
Instead of program source code, imagine a database of more structured information appropriate to
field working – asset catalogs, construction crews, repair jobs, GIS data, and so on. When an office
worker changes one attribute of a particular asset record, for example, and a field worker changes
another on a disconnected mobile device, there is no reason why tried and trusted SCM principles
should not be used to automatically merge the changes. This keeps the enterprise data accurate and
up to date, and allows user interaction to be focused where it is needed – on resolving overlapping
edits which are truly irreconcilable.
Practical architecture using SCM concepts
Figure 5 is a simple outline architecture for a distributed enterprise database application where the
mobile devices can operate as both thick and thin clients, and all persistent data is stored as
versioned objects.

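Figure 5: Outline architecture for a distributed enterprise database application with thick and thin mobile clients.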
Message-Oriented Middleware
Message-oriented middleware (MOM) in the form of store and forward message queues is an ideal
technology for propagating changes between distributed databases. It provides storage for queuing up
messages which cannot be delivered while the mobile device is disconnected, and all non-trivial
implementations will deliver messages in a well-defined order. Some MOM products can also
participate in distributed transactions using a two-phase commit protocol, which ensures that
messages are never lost as they cross from one system to another.
A major advantage of MOM is its simplicity – as a consequence, implementations tend to be robust.
Arguably the biggest disadvantage is the asynchronous nature of a message – by the time you receive
it, the world may have moved on and the data in the message is out of date. The ability of a distributed
database that uses versioned objects to refer explicitly and precisely to past system states therefore
dovetails very neatly with MOM. Transactions can be effectively implemented by collecting a set of
object versions into a single message which is then delivered by the MOM and applied to a remote
database as an atomic unit.
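The following Python sketch illustrates how a set of object versions might travel as one message and be applied as a unit. It uses an in-memory queue in place of a real MOM product, and `VersionedChange`, `Outbox` and `apply_atomically` are hypothetical names; acknowledgements and two-phase commit are not modelled.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionedChange:
    object_id: str
    base_version: int  # the version this change was derived from
    new_version: int
    data: dict


class Outbox:
    """Store-and-forward queue held on the device while it is disconnected."""

    def __init__(self):
        self._pending = deque()

    def send(self, changes):
        # One message carries every object version in the transaction.
        self._pending.append(tuple(changes))

    def drain(self):
        """Called once the link is back: yield messages oldest first."""
        while self._pending:
            yield self._pending.popleft()


def apply_atomically(database, message):
    """Apply every change in the message, or none of them.

    `database` maps object_id to a (version, data) pair.
    """
    for change in message:
        current_version, _ = database.get(change.object_id, (0, None))
        if current_version != change.base_version:
            return False  # the world has moved on; nothing is applied
    for change in message:
        database[change.object_id] = (change.new_version, change.data)
    return True
```

Because each change names the exact version it was derived from, the receiver can tell when a message is out of date. A rejected message would then be a candidate for the merge procedure described earlier rather than being silently dropped.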
Conclusion
Taking the step from relational or even object-oriented data to a distributed database of fully versioned
objects is not a small one, and the costs in terms of development time, data volumes and application
performance should not be underestimated. There are commercial OODBMS products with some level
of object versioning support which may provide a useful foundation, but at the time of writing there are
no fully satisfactory commercial off-the-shelf products available.
There are two particular functionality requirements for which a versioned database is a good technical
solution:
- Different users in different places often edit the same data, and the machines they work on are not
always connected to a network.
- All users need to see up-to-date data, incorporating changes made by physically remote users,
no more than a few seconds after the new data is received.
If both of these are high up on your list of product requirements (and probably not otherwise), then you
should think seriously about designing a system around versioned objects.
References
The following books provide a good starting point and both contain excellent bibliographies:
Leon, A., 2000, A Guide to Software Configuration Management
Meyer, B., 1997, Object-Oriented Software Construction (Second Edition)
There is an interesting comparative review of OODBMS products at:
http://www.dacs.dtic.mil/techs/oodbms2/oodbms2.pdf
In a world increasingly awash with data, versioning features are beginning to appear independently in
many new contexts outside of software development, such as the “WebDAV” project (“Web-based
Distributed Authoring and Versioning”):
http://www.webdav.org/