GITA 2001


System Architecture



Practical Object Versioning for Distributed Mobile Databases

Robert Wills
Severn Trent Systems, Alexander House
19 Fleming Way, Swindon SN1 2NG, United Kingdom


Extending the Enterprise Database to Mobile Devices
Distributing enterprise data to mobile devices for use by field personnel makes good business sense for a wide variety of applications. This is most easily achieved with a “thin client” architecture, where the data and application logic reside in the controlled environment of a central data center, and only the user interface executes on the mobile device. However, there are scenarios where the field worker must be able to run applications when no connection to the data center is available. To achieve this, a subset of the enterprise database and application code must be deployed to a “thick client” device and executed locally.

When a field worker updates the database on their disconnected mobile device, the changes cannot be immediately validated against data elsewhere in the enterprise. There needs to be some way of reconciling the overlapping edits and other data inconsistencies that can occur. If the data volumes are large, this reconciliation process needs to be automated wherever possible.

The discipline of Software Configuration Management (SCM) provides a solid theoretical framework based around the concept of “versioned objects”, which can be usefully adapted to solve some of the problems inherent in building a distributed enterprise-wide database.

Software configuration management (SCM) concepts

“Object”

Objects are collections of data which are treated as an atomic unit for the purposes of versioning. For an SCM system, objects are typically text files containing program source code. Any non-trivial software project will involve a large number of source files, which are usually organized into folders. Figure 1a shows a UML class model which describes a hierarchy of folders containing files, the essential concept behind most computer storage at the operating system level.





More complex associations and dependencies between these files are implied by the source code they contain, although it requires an appropriate compiler to figure them out. Some software development environments with SCM features make these associations more explicit by storing and versioning fragments of source code at a more granular level. Figure 1b is a simplistic model of part of the Java language, illustrating some typical fragments.

Formal programming language definitions and the full UML metamodel allow this process of decomposition to be taken much further, modeling the building blocks of individual executable statements and expressions. This level of analysis adds little value to an SCM process, and current SCM tools rarely have an object model more complex than Figure 1a above.

The principles of object versioning can be usefully applied outside the SCM context to any model, no matter how simple or complex. The objects being versioned can be traditional RDBMS table rows just as easily as program source files.

Object identity is an important issue for any system that supports versioning. The RDBMS approach is to identify an object by some of its attributes, known as its “primary key”. An OODBMS identifies objects with some sort of extra reference or “handle”, which might be a memory address or a number which acts as a unique object identifier (“OID”). For a distributed, version-aware application there are advantages to the object-oriented approach, since it allows two objects created on different remote systems but with the same primary key to exist in the same distributed database. This situation implies some form of conflict which will eventually need resolving, but the system must have somewhere to store the two objects in the meantime.
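The point about object identity can be illustrated with a minimal sketch. The class names, the `site:counter` OID format, and the `VALVE-100` key are all hypothetical and are not taken from the paper; the sketch only shows how OID-based storage lets two objects with the same primary key coexist until the conflict is resolved.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Asset:
    oid: str          # system-assigned object identifier (hypothetical format)
    primary_key: str  # business key, for example an asset tag
    description: str


class ObjectStore:
    """Stores objects by OID, so duplicate primary keys can coexist."""

    def __init__(self, site):
        self.site = site
        self._counter = itertools.count(1)
        self.objects = {}  # oid -> Asset

    def create(self, primary_key, description):
        # Prefixing the OID with the site name keeps OIDs unique across devices.
        oid = f"{self.site}:{next(self._counter)}"
        asset = Asset(oid, primary_key, description)
        self.objects[oid] = asset
        return asset

    def absorb(self, other):
        # Nothing is overwritten, because OIDs from different sites never collide.
        self.objects.update(other.objects)

    def key_conflicts(self):
        # Report every primary key that is shared by more than one object.
        by_key = {}
        for asset in self.objects.values():
            by_key.setdefault(asset.primary_key, []).append(asset.oid)
        return {key: sorted(oids) for key, oids in by_key.items() if len(oids) > 1}


central = ObjectStore("central")
device_a = ObjectStore("device-a")
device_b = ObjectStore("device-b")
device_a.create("VALVE-100", "valve recorded by crew A")
device_b.create("VALVE-100", "valve recorded by crew B")

central.absorb(device_a)
central.absorb(device_b)
conflicts = central.key_conflicts()
print(conflicts)
```

Both records survive the upload, and the duplicate key is reported for later resolution. A store keyed on the primary key would have had to reject or overwrite one of them.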

“Version”
When an object is changed, an SCM system does not throw away the old state of the data. Each state is called a “version”, and is given some form of unique version identifier (“VID”). More importantly, the SCM keeps a record of what previous version the new data is based upon, creating a directed graph of the versions of a particular object. Other housekeeping information is typically stored with each version, such as the name of the user making the change, a timestamp, and comments about why the change was made.


Figure 2

An important property of object versions is their stability. Once created, the contents of an object version should not change for the entire lifetime of the SCM project. Change always implies creating a new version.
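A version graph with stable versions might be sketched as follows. The field names, the ISO timestamp strings, and the `2.7`/`2.9` identifiers are illustrative assumptions; only the ideas of a VID, a recorded predecessor, housekeeping data, and immutability come from the text above.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Optional


@dataclass(frozen=True)  # frozen: a version cannot be modified once created
class Version:
    vid: str
    parent_vid: Optional[str]  # the version this one was based upon
    content: str
    user: str
    timestamp: str
    comment: str


class VersionedObject:
    def __init__(self, name):
        self.name = name
        self.versions = {}  # vid -> Version

    def commit(self, vid, parent_vid, content, user, timestamp, comment=""):
        if vid in self.versions:
            raise ValueError(f"version {vid} already exists")
        if parent_vid is not None and parent_vid not in self.versions:
            raise KeyError(f"unknown parent version {parent_vid}")
        version = Version(vid, parent_vid, content, user, timestamp, comment)
        self.versions[vid] = version
        return version

    def history(self, vid):
        """Follow predecessor links from vid back to the first version."""
        chain = []
        current = vid
        while current is not None:
            chain.append(current)
            current = self.versions[current].parent_vid
        return chain


f = VersionedObject("collections.cpp")
f.commit("2.7", None, "v2.7 text", "alice", "2001-01-10T09:00", "initial import")
v28 = f.commit("2.8", "2.7", "v2.8 text", "bob", "2001-01-12T14:30", "fix sort")
f.commit("2.9", "2.8", "v2.9 text", "alice", "2001-01-15T11:15", "add iterator")

try:
    v28.content = "edited in place"
except FrozenInstanceError:
    print("versions are immutable; commit a new version instead")

print(f.history("2.9"))
```

Storing the parent link with each version is what turns a pile of old copies into a directed graph that can later be searched for common ancestors.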

Simplistic SCM systems can only store multiple versions of file objects. More practical SCM systems allow both folders and files to be versioned, and this principle can be extended to a full application data model, where some or all of the different object types are versioned.

The existence of versions gives rise to two different types of object reference. If the example above is part of the version history of a file called “collections.cpp”, then a version-specific reference which specifies “collections.cpp, version 2.8” will always resolve to the same piece of data. A “bill of materials” describing the contents of a software release is one context where the stability of a version-specific reference is very useful. If a second source file contains the statement #include “collections.cpp”, then it is making a generic reference to the collections.cpp file, relying on some other mechanism, such as a “view”, to select a specific version.
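The difference between the two kinds of reference can be sketched in a few lines. The `Reference` type, the `resolve` function, and version 2.7 are hypothetical; a “view” is modeled here simply as a mapping from object name to the selected VID.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Reference:
    object_name: str
    vid: Optional[str] = None  # None marks a generic reference


def resolve(ref, repository, view):
    # A version-specific reference ignores the view; a generic one depends on it.
    vid = ref.vid if ref.vid is not None else view[ref.object_name]
    return repository[(ref.object_name, vid)]


repository = {
    ("collections.cpp", "2.7"): "contents of 2.7",
    ("collections.cpp", "2.8"): "contents of 2.8",
}
pinned = Reference("collections.cpp", "2.8")
generic = Reference("collections.cpp")

release_view = {"collections.cpp": "2.7"}
latest_view = {"collections.cpp": "2.8"}

print(resolve(pinned, repository, release_view))
print(resolve(generic, repository, release_view))
print(resolve(generic, repository, latest_view))
```

The pinned reference returns the same data under either view, while the generic reference follows whichever view is in force.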

“Baseline” or “Stripe”
In SCM terms, a “baseline” or “stripe” is something that describes a snapshot of the system, and can be used to reconstruct that snapshot later. The snapshot might represent the source files used to build a particular beta release, for example. Physically, the baseline consists of a list of version-specific object references or a rule which can be used to derive the list.

Baselines are typically used to record stable, consistent states that the objects within the system achieve as a group. In the SCM context this might mean that the source files build successfully, or that the resulting executable code does something useful. In the version-aware application context it could mean that all referential integrity and other validation constraints are fulfilled. Baselines can be used as part of the implementation of a “long transaction”, where they act in a similar way to a SQL savepoint.
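As a rough sketch, a baseline can be held as a labeled list of version-specific references. The `Workspace` class, its method names, and the file names below are assumptions made for illustration; the rollback method shows the savepoint-like use mentioned above.

```python
class Workspace:
    def __init__(self):
        self.repository = {}  # (name, vid) -> content, never overwritten
        self.current = {}     # name -> vid currently selected
        self.baselines = {}   # label -> {name: vid} recorded at baseline time

    def commit(self, name, vid, content):
        if (name, vid) in self.repository:
            raise ValueError("object versions are immutable")
        self.repository[(name, vid)] = content
        self.current[name] = vid

    def record_baseline(self, label):
        # Copy the selection so that later commits cannot alter the baseline.
        self.baselines[label] = dict(self.current)

    def snapshot(self, label):
        """Reconstruct the contents of every object as of the baseline."""
        refs = self.baselines[label]
        return {name: self.repository[(name, vid)] for name, vid in refs.items()}

    def roll_back(self, label):
        # Savepoint-like behavior: reselect the versions named by the baseline.
        self.current = dict(self.baselines[label])


ws = Workspace()
ws.commit("main.c", "1.0", "main v1.0")
ws.commit("util.c", "1.0", "util v1.0")
ws.record_baseline("BETA1")
ws.commit("main.c", "1.1", "main v1.1")

beta1 = ws.snapshot("BETA1")
print(beta1["main.c"], "|", ws.current["main.c"])
ws.roll_back("BETA1")
print(ws.current["main.c"])
```

Because versions are never overwritten, rolling back only changes which versions are selected; version 1.1 of `main.c` stays in the repository.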

“Check-in”, “Check-out”

A developer using an SCM system is often required to “check-out” a file before making any changes to it. This applies a lock to that particular version of the file, which prevents any other developers from starting work on it until it has been “checked-in” again. Most SCM systems can also be configured to lock object versions only during the brief period when a developer’s changes are being merged back to the code repository. These two approaches correspond closely to the “pessimistic” and “optimistic” locking strategies available in most RDBMS products.

In a distributed database with intermittent connections, a pessimistic lock or “check-out” may take a long time to complete. It requires an exchange of messages with the central datastore or other lock manager, which is not possible while the device is disconnected. A strategy of optimistic locking, coupled with automated tools for merging changes when conflicts are detected later, is effective for implementing most (but not all) application functionality requirements.
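The optimistic strategy can be sketched as a check made only at check-in time: the device states which version its edit was based on, and the store accepts the edit only if that version is still current. `CentralStore`, `ConflictError`, the integer VIDs, and the `pipe-7` data are hypothetical, and the merge step that would follow a detected conflict is left out.

```python
class ConflictError(Exception):
    """Raised when an edit was based on a version that is no longer current."""


class CentralStore:
    def __init__(self):
        self.head = {}  # oid -> (vid, data) for the current version

    def put_initial(self, oid, data):
        self.head[oid] = (1, data)

    def read(self, oid):
        # No lock is taken; the caller just remembers the VID it read.
        return self.head[oid]

    def check_in(self, oid, base_vid, new_data):
        current_vid, _ = self.head[oid]
        if current_vid != base_vid:
            raise ConflictError(
                f"{oid}: edit based on version {base_vid}, current is {current_vid}"
            )
        self.head[oid] = (current_vid + 1, new_data)
        return current_vid + 1


store = CentralStore()
store.put_initial("pipe-7", {"diameter_mm": 100})

# Two devices take a copy while connected, then edit offline.
base_a, data_a = store.read("pipe-7")
base_b, data_b = store.read("pipe-7")

new_vid = store.check_in("pipe-7", base_a, {"diameter_mm": 150})
try:
    store.check_in("pipe-7", base_b, {"diameter_mm": 125})
    conflict_detected = False
except ConflictError as exc:
    conflict_detected = True
    print("needs merging:", exc)
```

No message exchange is needed while a device is disconnected; the cost is that the second device learns of the conflict only when it reconnects.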

“Branch” or “Activity”
When looking at the directed graph of versions of one particular object, there may be occasions when two or more versions exist at the same point in time and are derived from the same predecessor, or “parent”. Each such line of descent is a “branch”. Some SCM systems use a form of Dewey-decimal numbering to store an object’s version graph within the names assigned to versions.

In the SCM context, branches occur in the version graphs of objects when different development activities are running in parallel. A typical example is release bug-fixing, where one group of developers makes small, carefully-controlled fixes to a code baseline which has been released to customers, while others are working on major new functionality for the next release. Most SCM tools allow branches in individual object version graphs to be associated with a label representing the global context, or “activity”, for which they were created.


Figure 3

While the version label “1.3.1.0” is specific to the object “widget.java”, the branch name “V2.2_BETA1_BUGFIX” is global and may involve changes to many objects.
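To show how a dotted label can encode the version graph, the sketch below derives a version's predecessor from its name alone. The paper does not spell out the numbering rules, so the convention used here is an assumption: the first two components identify a trunk version, each further pair is (branch number, revision on that branch), and revisions count from zero. Real SCM tools differ in these details.

```python
def parent_label(label):
    """Return the predecessor's label, or None for the first trunk version.

    Assumed convention (not taken from any particular tool):
    "1.3" is a trunk version; "1.3.1.0" is revision 0 on branch 1 off "1.3".
    """
    parts = [int(p) for p in label.split(".")]
    if len(parts) < 2 or len(parts) % 2 != 0:
        raise ValueError(f"malformed version label: {label}")
    revision = parts[-1]
    if revision > 0:
        # Same line of descent, one revision earlier.
        return ".".join(str(p) for p in parts[:-1] + [revision - 1])
    if len(parts) == 2:
        return None  # first version on the trunk
    # First revision on a branch: the parent is the branch point.
    return ".".join(str(p) for p in parts[:-2])


def ancestry(label):
    """List the label followed by all of its predecessors."""
    chain = []
    while label is not None:
        chain.append(label)
        label = parent_label(label)
    return chain


print(parent_label("1.3.1.0"))
print(ancestry("1.3.1.0"))
```

Under this assumed scheme no separate parent pointer is needed, which is the attraction of storing the graph in the names.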

In the context of a distributed mobile database application, global branches can be used to keep changes applied on one mobile device separate from those applied on another. A branch might have the title “changes made on device 21”, for example.
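A minimal sketch of per-device branches follows. The change records, object identifiers, and device names are made up for illustration; the aim is only to show that tagging each new version with a global branch label keeps each device's work separate and makes overlapping edits easy to find.

```python
from collections import defaultdict

# Each record: (object id, global branch label, parent vid, new vid)
changes = [
    ("valve-12", "device-21", "1.3", "1.3.1.0"),
    ("pipe-7", "device-21", "2.0", "2.0.1.0"),
    ("valve-12", "device-34", "1.3", "1.3.2.0"),
    ("pump-3", "device-34", "1.1", "1.1.1.0"),
]


def changes_by_branch(changes):
    """Group the new versions by the branch (device) that created them."""
    grouped = defaultdict(list)
    for oid, branch, _parent, new_vid in changes:
        grouped[branch].append((oid, new_vid))
    return dict(grouped)


def overlapping_edits(changes):
    """Find versions of an object that were branched from on several devices."""
    branches = defaultdict(set)
    for oid, branch, parent, _new_vid in changes:
        branches[(oid, parent)].add(branch)
    return {key: sorted(b) for key, b in branches.items() if len(b) > 1}


per_device = changes_by_branch(changes)
overlaps = overlapping_edits(changes)
print(per_device["device-21"])
print(overlaps)
```

Only `valve-12` needs reconciliation here; the changes to `pipe-7` and `pump-3` were each made on a single branch and do not overlap with any other device's edits.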
