GITA 2001


System Architecture


Practical Object Versioning for Distributed Mobile Databases


“View”
A view is similar to a baseline in that it is a collection of rules used to derive a list of object versions. Unlike a baseline, a view does not represent a special state of the system, and the contents of a view can change from moment to moment as new object versions are created.

The most important view is often referred to as “main-latest”, meaning “the most recent version on the main branch of every object in the system”. In the SCM context this is the cutting edge of the code that the developers are working on. In the distributed application context, it is the best consensus state of the data. A distributed database application will probably enforce rules to ensure that the data visible through the “main-latest” view goes from one valid state to another in a transactionally secure way. Good SCM practice endeavors to do the same, but proving that source code is “valid” may involve building, deploying and running it, so automated SCM tools have to be less rigorous in this area.

For a distributed database application, other views can be defined which show the state of the data as it appears on a specific remote device. This is particularly useful when designing an architecture that must provide a consistent user experience across both thick and thin client devices, since a web/application server can show each individual user slightly different data which matches what they see when working disconnected on a device with local data storage.
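The idea of a view as a set of rules, re-evaluated on demand, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ObjectVersion` record, the branch name "device-17", and the rule used for the device view (the device's own branch where one exists, otherwise main-latest) are hypothetical and are not taken from any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectVersion:
    object_id: str
    branch: str     # e.g. "main" or "device-17" (hypothetical branch names)
    sequence: int   # increases with each new version on a branch
    data: str


def latest_on_branch(versions, branch):
    """View rule: for every object, select the newest version on `branch`."""
    selected = {}
    for v in versions:
        if v.branch != branch:
            continue
        current = selected.get(v.object_id)
        if current is None or v.sequence > current.sequence:
            selected[v.object_id] = v
    return selected


def device_view(versions, device_branch):
    """Assumed rule: the device's own versions override main-latest."""
    view = latest_on_branch(versions, "main")
    view.update(latest_on_branch(versions, device_branch))
    return view


store = [
    ObjectVersion("pole-1", "main", 1, "height=10"),
    ObjectVersion("pole-1", "main", 2, "height=12"),
    ObjectVersion("valve-7", "main", 1, "state=open"),
    ObjectVersion("valve-7", "device-17", 1, "state=closed"),
]

main_latest = latest_on_branch(store, "main")
on_device = device_view(store, "device-17")

# A view is not a snapshot: evaluating the same rule again after a new
# version has been created gives a different answer.
store.append(ObjectVersion("valve-7", "main", 2, "state=locked"))
main_latest_later = latest_on_branch(store, "main")
```

Because the view is computed from rules rather than stored, a web/application server could in principle evaluate `device_view` on behalf of a thin client and show the same data that the user would see on the disconnected device.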

“Merge”
Where branching provides a way of splitting the “version-lifeline” of an object, merging enables the separate branches to be brought back together again. This is best illustrated with an example from a fictitious SCM scenario:

When AcmeJumble 2.2 Beta 1 was released, “widget.java” was at version 1.3. Two minor bug fixes were made and issued to selected customers as patches – these produced “widget.java” versions 1.3.1.0 and 1.3.1.1. Meanwhile the main development team were adding a new feature which the product managers insisted on including in the final public release of AcmeJumble 2.2 - this produced version 1.4 of “widget.java”. The beta period has now finished, and the bug fixes that were applied need to be merged back into the main branch so that the developers can finish work on AcmeJumble 2.2 Beta 2.
Figure 4 shows the situation graphically.

It turns out that an automated SCM tool can do a surprisingly good job of merging these different versions of “widget.java” back together. The details of this operation are described below, and are key to understanding how the SCM versioning model improves upon “replication” or “synchronization” for reconciling changes in a distributed database.
  • Using the directed graph that links the versions, find the most recent common ancestor of the two versions to be merged – this turns out to be version 1.3.
  • Compute the differences between version 1.3.1.1 and the common ancestor, 1.3.
  • Compute the differences between version 1.4 and the common ancestor, 1.3.
  • If these two sets of differences overlap, then there are conflicting changes – ask the user to choose, interactively.
  • In this case there are no overlaps, so create version 1.5 automatically by taking version 1.4 and applying the (V1.3.1.1 – V1.3) differences already computed.
For program source files, a “difference” is usually defined as one or more adjacent lines of text which differ from the corresponding region of a second file by something more significant than whitespace characters (space, tab, carriage-return, etc.). For structured data, differences will be found at the level of individual object attributes, to which the above algorithm can be more easily applied.
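The merge steps listed above can be sketched for structured data, where each version is a set of attributes. This is a minimal sketch, not the algorithm of any specific SCM tool: the attribute names and values are invented for illustration, the `parents` table is a hand-written stand-in for the version graph, and the ancestor search is a simple breadth-first walk that is adequate for a small graph like this one.

```python
MISSING = object()  # marks an attribute that is absent from a version


def ancestors(version, parents):
    """Return `version` and its ancestors, nearest first (breadth-first)."""
    seen, queue = [], [version]
    while queue:
        v = queue.pop(0)
        if v not in seen:
            seen.append(v)
            queue.extend(parents.get(v, []))
    return seen


def common_ancestor(a, b, parents):
    """Nearest version reachable from both a and b, or None."""
    reachable_from_b = set(ancestors(b, parents))
    for v in ancestors(a, parents):
        if v in reachable_from_b:
            return v
    return None


def diff(base, changed):
    """Attributes whose value in `changed` differs from `base`."""
    keys = set(base) | set(changed)
    return {k: changed.get(k, MISSING) for k in keys
            if base.get(k, MISSING) != changed.get(k, MISSING)}


def merge(base, ours, theirs):
    """Three-way merge: returns (merged attributes, conflicting keys)."""
    d_ours, d_theirs = diff(base, ours), diff(base, theirs)
    conflicts = sorted(d_ours.keys() & d_theirs.keys())  # overlapping edits
    merged = dict(ours)
    for key, value in d_theirs.items():
        if key in conflicts:
            continue  # left for the user to resolve interactively
        if value is MISSING:
            merged.pop(key, None)
        else:
            merged[key] = value
    return merged, conflicts


# Version graph from the example: child -> list of parents.
parents = {"1.3.1.1": ["1.3.1.0"], "1.3.1.0": ["1.3"], "1.4": ["1.3"]}

# Hypothetical attribute values for the three versions involved.
contents = {
    "1.3": {"colour": "red", "size": 4},
    "1.3.1.1": {"colour": "red", "size": 5},              # bug fixes
    "1.4": {"colour": "red", "size": 4, "gloss": True},   # new feature
}

base = common_ancestor("1.4", "1.3.1.1", parents)
version_1_5, conflicts = merge(contents[base], contents["1.4"],
                               contents["1.3.1.1"])
```

Here the two sets of differences touch different attributes, so the merged result contains both the bug fix and the new feature and the conflict list is empty.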

Replication/Synchronization versus SCM Merge
To push this point a little further, imagine you only have “widget.java” versions 1.3.1.1 and 1.4, and wish to “synchronize” them – perhaps 1.3.1.1 was created while working in the field on a laptop, and 1.4 on a desktop PC. If we compare them, there will be five differences, {a,b,c,d,e}. From timestamp information we also know that 1.3.1.1 is the more recent version. Without access to their common ancestor, there is simply no way that we can figure out whether to choose text from 1.3.1.1 or 1.4 at each of the five ambiguous regions of the file. At best we can adopt a “last one in, wins” approach and discard the work that went into version 1.4, or interactively offer the user both versions and invite them to sort the problem out manually. The interactive approach works well for two-way synchronization of personal data on a PDA with a desktop PC, but is not appropriate for a high-volume enterprise-wide distributed database application.
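The limitation can be made concrete with a small sketch. The attribute values and timestamps are invented for illustration; the two dictionaries merely stand in for versions 1.3.1.1 and 1.4. A two-way comparison can report where the versions differ, but not which side made each change, so a “last one in, wins” policy has nothing better to do than keep one version whole.

```python
def two_way_diff(a, b):
    """Attributes that differ between two versions, with no ancestor known."""
    keys = sorted(set(a) | set(b))
    return [k for k in keys if a.get(k) != b.get(k)]


def last_one_in_wins(a, a_time, b, b_time):
    """Keep the more recently written version and discard the other."""
    return dict(a) if a_time >= b_time else dict(b)


# Hypothetical stand-ins for the two versions being synchronized.
laptop = {"colour": "red", "size": 5}                   # like 1.3.1.1
desktop = {"colour": "red", "size": 4, "gloss": True}   # like 1.4

differing = two_way_diff(laptop, desktop)

# The laptop copy is newer (timestamps are arbitrary illustrative numbers),
# so the desktop work is silently dropped.
result = last_one_in_wins(laptop, 200, desktop, 100)
feature_lost = "gloss" not in result
```

Nothing in `differing` says whether "size" was changed on the laptop or reverted on the desktop; only the common ancestor can answer that.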

Most industrial-strength database replication mechanisms suffer from exactly the same problem. The word “replication” itself provides a clue about the sort of tasks this functionality was designed for – making replicas of a database to act as a backup, hot standby, or perhaps a local cache for improved performance. It may be possible to configure how frequently a replication process executes and what subset of data gets replicated, but when it comes to conflict resolution the choice is usually to treat one end of the link as the “master”, or to write all conflicts to an error table. Retaining some portion of the version history of objects/rows is the key to improving on this situation, and at the time of writing there do not appear to be any mainstream RDBMS products which do anything “out of the box” to support this.

Instead of program source code, imagine a database of more structured information appropriate to field working: asset catalogs, construction crews, repair jobs, GIS data, and so on. When an office worker changes one attribute of a particular asset record, for example, and a field worker changes another attribute of the same record on a disconnected mobile device, there is no reason why tried and trusted SCM principles should not be used to merge the changes automatically. This keeps the enterprise data accurate and up to date, and allows user interaction to be focused where it is needed: on resolving overlapping edits which are truly irreconcilable.

Practical architecture using SCM concepts
Figure 5 is a simple outline architecture for a distributed enterprise database application where the mobile devices can operate as both thick and thin clients, and all persistent data is stored as versioned objects.


Figure 5

Message-Oriented Middleware
Message-oriented middleware (MOM) in the form of store and forward message queues is an ideal technology for propagating changes between distributed databases. It provides storage for queuing up messages which cannot be delivered while the mobile device is disconnected, and all non-trivial implementations will deliver messages in a well-defined order. Some MOM products can also participate in distributed transactions using a two-phase commit protocol, which ensures that messages are never lost as they cross from one system to another.

A major advantage of MOM is its simplicity – as a consequence, implementations tend to be robust. Arguably the biggest disadvantage is the asynchronous nature of a message – by the time you receive it, the world may have moved on and the data in the message is out of date. The ability of a distributed database that uses versioned objects to refer explicitly and precisely to past system states therefore dovetails very neatly with MOM. Transactions can be effectively implemented by collecting a set of object versions into a single message which is then delivered by the MOM and applied to a remote database as an atomic unit.
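As a rough illustration of delivering a transaction as one message, the sketch below collects several object versions into a single message and applies them to a receiving store on an all-or-nothing basis. It is a toy under stated assumptions: the in-memory `deque` merely stands in for a store-and-forward queue, and the `VersionedStore` class, message layout and object names are hypothetical rather than the interface of any MOM product.

```python
from collections import deque


class VersionedStore:
    """Toy receiving database keyed by (object_id, version)."""

    def __init__(self):
        self.versions = {}

    def apply_message(self, message):
        """Apply every object version in the message, or none of them."""
        staged = dict(self.versions)  # work on a copy
        for object_id, version, data in message["object_versions"]:
            key = (object_id, version)
            if key in staged:
                raise ValueError(f"version already exists: {key}")
            staged[key] = data
        self.versions = staged  # single swap makes the whole unit visible


outbox = deque()  # first in, first out, like an ordered message queue

# One message per transaction, each carrying a set of object versions.
outbox.append({"object_versions": [
    ("job-42", "1.1", "status=done"),
    ("pole-1", "1.3", "height=12"),
]})
# A faulty message: the same object version appears twice.
outbox.append({"object_versions": [
    ("pole-1", "1.4", "height=13"),
    ("pole-1", "1.4", "height=14"),
]})

office = VersionedStore()
rejected = []
while outbox:
    message = outbox.popleft()
    try:
        office.apply_message(message)
    except ValueError:
        rejected.append(message)  # nothing from this message was applied
```

Because each message names explicit object versions, the receiver can tell exactly which past state the sender was working from, even if the message arrives long after it was written.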

Conclusion
Taking the step from relational or even object-oriented data to a distributed database of fully versioned objects is not a small one, and the costs in terms of development time, data volumes and application performance should not be underestimated. There are commercial OODBMS products with some level of object versioning support which may provide a useful foundation, but at the time of writing there are no fully satisfactory commercial off-the-shelf products available.

There are two particular functionality requirements for which a versioned database is a good technical solution:
  • Different users in different places often edit the same data, and the machines they work on are not always connected to a network.
  • All users need to be able to see up-to-date data incorporating changes made by other physically remote users no more than a few seconds after new data is received.
If both of these are high up on your list of product requirements (and probably not otherwise), then you should think seriously about designing a system around versioned objects.

References
The following books provide a good starting point, and both contain excellent bibliographies:
  • Leon, A., 2000, A Guide to Software Configuration Management
  • Meyer, B., 1997, Object-Oriented Software Construction (Second Edition)
There is an interesting comparative review of OODBMS products at: http://www.dacs.dtic.mil/techs/oodbms2/oodbms2.pdf
In a world increasingly awash with data, versioning features are beginning to appear independently in many new contexts outside software development, such as the “WebDAV” project (“Web-based Distributed Authoring and Versioning”): http://www.webdav.org/