GISdevelopment.net ---> GITA 2003 ---> System Architecture

Long transactions in an RDBMS

S. Agarwal, G. Arun, R. Chatterjee, B. Speckhard, R. Vasudevan
Oracle Corporation
One Oracle Drive
Nashua, NH 030360


Abstract
Engineering design systems based on RDBMSs (relational database management systems), such as geographical information systems (GIS), require support for long-duration transactions from the RDBMS. These applications require long-transaction (LT) management to be tightly integrated with database features common in RDBMSs, such as triggers, constraints, indexes, data model evolution (DDL), import/export and bulk loading. In this paper, we will describe a Workspace Management system, which addresses the above issues.

With the blending of LT capabilities with traditional RDBMS functions, it is now easier to embrace and extend spatial data and operations across the enterprise. A complete system for supporting long transactions should include support for all the well understood long transaction features such as versioning of database rows, version trees of arbitrary depth and fan-out, hierarchical multi-user long transactions, long-transaction locks, conflict detection and resolution, privilege management, and ability for partial/complete refresh, publish and rollback. In addition, it should extend support for the aforementioned RDBMS features to versioned data.

1. Introduction
The need to manage complex long-duration transactions (LT) is common in today’s GIS, web publishing, e-commerce, and CAD applications [2,3,4,5,8]. Conventional database systems are designed to handle short-duration transactions such as those in financial applications. The short transaction framework is inadequate for handling applications that must support long-duration transactions. Consider the following example:

A group of engineers designing modifications to a road network will need to manipulate digital representations of a number of parcels, telegraph poles, and other objects (commonly represented as spatial geometries), which may take hours, days or even weeks. While the modifications to the road network are in progress, the database changes being done cannot be made visible to other users of the production database, because the long transaction is incomplete. Conventional database systems achieve this kind of isolation for short transactions using locks. But this pessimistic mode of locking becomes impractical for long-duration transactions, as it limits concurrency to a great degree. Thus, the group of engineers, in some sense, need to have a private “version” (copy) of the data for the duration of the transaction, which can be made available to other users when they have finished their work. In addition, the group of engineers need to be able to see each other’s changes made as part of the same long transaction as the changes are happening. Regular short transactions do not allow for this kind of collaborative data manipulation [3].

Repository [6,7] and Source Code control systems have traditionally provided comprehensive support for versioning, which can be used to address the problem of Long Transactions. Typically, these frameworks provide a checkout/checkin model with different levels of functionality to merge versions, refresh versions, resolve conflicts, etc.

Long Transaction support in an RDBMS is different from what the above systems provide in the following manner:
  1. RDBMSs add significant value to relational data by providing comprehensive support for features like constraints, indexes, import/export, bulk loading, replication, triggers, data model evolution, virtual private databases etc. Accordingly, Long Transaction management systems in an RDBMS have to support these features in the presence of versioned relational data.
  2. The data being accessed in an LT is relational, on a much larger scale, and adhoc. RDBMSs efficiently support adhoc querying and modifications wherein apriori knowledge of the data being accessed is not very common. Hence, systems that provide support for LTs in an RDBMS have to accommodate this need by providing a copy-on-write approach where the data need not be checked out prior to modification.
Thus, existing checkout/checkin based multi-version systems address a problem different from that of support for LTs in an RDBMS. In this paper, we will talk about Oracle Workspace Manager (WM), a framework to support Long Transactions in an RDBMS, which is tightly integrated with the database and extends support for all the features that the database natively provides

The rest of the paper is organized as follows. Section 2 describes the overall architecture of the system. Section 3 explains the long transaction functionality of the system. Section 4 summarizes the support provided by the system for RDBMS features on versioned relational data. Finally, Section 5 presents conclusions and future work.

2. Architecture

WM provides a long-transaction framework built on a multi-version system. It uses a series of short transactions and multiple versions of data to complete a long transaction. WM has a rich set of features including versioning of database rows using copy-on-write scheme, version trees of arbitrary depth and fan-out, hierarchical multi-user long transactions, long-transaction locks, conflict detection and resolution, privilege management, and ability for partial/complete refresh, publish and rollback (long transaction operations). WM also supports RDBMS features such as referential and unique constraints, DML triggers, import/export of versioned data, data model evolution, replication, virtual private databases, and support for direct loading. Figure 1 shows a high-level view of WM architecture.


Figure 1: WM Architecture

A table T is enabled to participate in long transactions by transforming it into a versioned relational data source (VRDS). This process replaces T with a view over a set of tables that contain data from T and some additional versioning information. The view filters data based on the current point in the version hierarchy. This version view is formally defined as follows:

Version View (S, V) for a VRDS S from a version V is the set of the most recent rows that are in the ancestry of the version V; that is, it is the set of rows from the VRDS S as seen from version V. This is similar to other long transaction systems that implement versioning by augmenting the primary key with a version identifier.

Once a collection of tables in a database has been version-enabled, users can start a Long Transaction by creating a workspace. A workspace is a logical work area for a set of versions. WM contains a single system-wide version tree; the version tree evolves by explicit version creation operations. Users can set their working workspace and execute regular SQL statements against a VRDS without any syntax changes. Changes to versioned objects are made in the latest version of the users’ working workspace. Figure 2 shows a workspace hierarchy with the underlying version tree:


Figure 2: Workspace Hierarchy

3. Versioning and long transaction features

3.1 Workspace Operations
A workspace or a collection of workspaces can form a long transaction (say, a collaborative design project). Each workspace can have multiple user sessions and multiple versions of data from many version-enabled tables. All the work done by the individual user sessions in a workspace is done by conventional short transactions. A workspace typically has one parent workspace (except in special cases where multiple parents are allowed for a given workspace as described later in the Section 3.6) and can have multiple child workspaces. The workspace hierarchy can be many levels deep. WM provides a set of operations that can be performed on these workspaces. Some of the important workspace operations are:

Refresh Workspace
The Refresh Workspace operation on a workspace W makes all the changes made in the parent workspace of W, since the creation of W, visible in W. WM achieves this refresh operation by altering the version hierarchy information rather than copying data from the parent workspace. Figure 3 reproduces a portion of Figure 2 to explain the refresh of WORKSPACE B.


Figure 3 Refresh Workspace

In Figure 3, WORKSPACE B is refreshed from WORKSPACE A. The version hierarchy between the two workspaces is altered so that version 9 is now derived from version 8 (instead of version 6). This makes all the changes made in versions 7 and 8 of WORKSPACE A visible in WORKSPACE B. Either all the changes or a partial subset of changes from the parent workspace can be refreshed into the child workspace. Data conflicts have to be resolved before a refresh workspace operation can proceed. Data conflicts arise between workspaces when the same data is changed in both the parent and child workspaces. Data conflicts can be avoided with the use of version locks (explained in the next section).

Post Workspace
This operation copies all the changes made in a child workspace into its parent workspace. In Figure 2, if a post workspace operation is executed on WORKSPACE B, all the data in WORKSPACE B that has changed since the previous post operation is copied into version 8 of WORKSPACE A. A post workspace operation allows a parent workspace to stay abreast of changes made in a child workspace. Either all the changes or a partial subset of changes from the child workspace can be posted to the parent workspace. As with the refresh workspace operation, data conflicts between the two workspaces need to be resolved before a post operation can proceed.

Commit Workspace
This operation executes a post workspace operation and then removes the workspace.

Remove Workspace
This operation deletes all the data in a workspace and then removes the workspace. It also removes the versions associated with the workspace from the version tree.

Compress Workspace
This operation is useful for garbage collection; it compresses the version hierarchy and the data in version-enabled tables. It can be executed on a range of versions in a workspace, the entire workspace or on a tree of workspaces. When a compress workspace operation is executed, all redundant versions (versions that have been created because of savepoints) are removed and the version tree is collapsed. In addition, all non-latest copies of database records in the compression range are deleted and the remaining records in the range are moved to the minimum version of the range. Figure 4 describes a compress workspace tree operation on WORKSPACE A. The redundant versions in WORKSPACE A and WORKSPACE B are removed and the version hierarchy is collapsed.


Figure 4 Compress Workspace

3.2 Conflicts
Data conflicts arise for a workspace when the same data is changed in both the parent and child workspaces. WM provides a comprehensive framework to detect and resolve conflicts. Conflicts have to be resolved before a workspace can be refreshed or merged. It is desirable to periodically examine and resolve potential conflicts during the course of a long duration project as opposed to resolving all conflicts during a merge or refresh operation. WM provides dynamic conflict views that can be queried at any point to show all conflicting records. Some long transaction models use a fat client [9] that caches versioned records as opposed to all versioned records being stored centrally in the database server. In the former model, the problem of dynamically detecting potential conflicts becomes harder as versioned records are distributed across multiple client machines. So, such models resort to other expensive and imprecise methods to support this functionality.

3.3 Lock Management
In addition to the locking support provided by the underlying database system, WM provides record level version locks, which can be used to enforce a pessimistic mode of operation wherein long transactions modifying the same record will get serialized. These locks persist for the duration of a workspace (long transaction). The primary use of these locks is to prevent data conflicts between parent and child workspaces.

3.4 Privilege Management
WM provides support for a set of privileges that help in the administration and security of the system. In conventional databases, no privileges are provided for short transaction related operations such as begin, commit, and rollback, since short transactions are single-user operations. In a collaborative, long transaction framework, privileges need to be associated with the starting, committing and rolling back of transactions. All the workspace operations such as refresh, post, commit and rollback have explicit privileges associated with them. User sessions need to be granted these privileges before they can execute the operations. Privileges can be granted on a named workspace basis or on a database system-wide basis. In addition, each of these privileges can be granted with a grant option, which enables the grantee to grant that privilege to other users.

3.5 Multiple views of data
Applications in the GIS domain have a requirement to see different views of data. Some of these include:
  • the “as built” view which shows the state of data that is currently built without interference from versions that have not yet been merged.
  • the “in-progress design” view which shows data that is in a non-root version and has not yet been merged into the root version.
  • the “future as-built” view which shows a view of data that represents the expected as-built view when certain specified non-root versions get merged into the root version. This view allows users to perform analysis to measure the impact of proposed future changes.
  • the “mutiple versions” view which shows data changed in multiple specified versions tagged with the version identifier. This view allows users to see other versions that may overlap with changes they are making in their current version.
WM provides support for these views of data by creating appropriate database views for each versionenabled table.

3.6 Multi-Parent workspaces (jobs on jobs)
A requirement that is common in the GIS domain is to create a design that is dependent on multiple designs that have not yet been completed (and hence have not yet been merged into the root workspace). WM provides a notion of multi-parent workspaces that can be used to model this ‘jobs dependent on multiple jobs’ requirement. A workspace can be created as a child of multiple parent workspaces thereby creating a graph of workspaces as opposed to a tree. The parent workspaces are required to have versioned a mutually exclusive set of records from version-enabled tables.

3.7 History
When a table is version enabled, it can be done so with the history option. If the history option is chosen, WM adds additional columns to the table which track the create time and the retire time of each version of a database record. The retire time of a version of a database record is the time when this version of the record becomes hidden by a later version of the same record. WM also creates views, which help in running temporal queries and tracking lineage of database records across different versions. In addition WM implements a gotoDate procedure which given an instant takes the session to the state of the database at that instant and as seen from the current workspace. This is achieved by dynamically rewriting the data-filtering views mentioned earlier.

4. RDBMS integration issues
Long Transaction support is provided in commercial systems by a number of vendors. Many of these systems use a proprietary data store, with native support for versioning and long transaction functionality, to store spatial data. While this model has suggested benefits [9], it lacks many of the benefits provided by a framework that stores spatial data in a commercial RDBMS and extends it to provide long transaction support. Some of the advantages of the latter approach are:
  • Commercial RDBMSs provide a robust and time-tested framework to store and retrieve data. Spatial data is simply another data type (albeit more complex than other scalar data types) that should be stored and manipulated in an RDBMS.
  • RDBMSs provide a rich set of features such as high availability, scalability, security, integrity constraints, triggers, utilities (such as import/export, bulk loading, etc), replication, etc. Storing spatial data in an RDBMS extends these capabilities to such data.
  • Businesses with a need to store and manage spatial data want all their data stored in one repository. This allows for consolidated system management, consolidation of IT personnel, and facilitates easy integration of spatial data with other business data. A quote from IDC [10] reinforces this trend - “most businesses want to handle location information as simply and cheaply as possible, just like they handle any other data.....".
Once spatial data is stored in an RDBMS, the RDBMS has to be extended to provide support for long transactions. In addition to supporting the commonly required versioning and long transaction functionality, a complete system has to extend support for the above-mentioned RDBMS features to versioned data. In this section, we discuss issues involved in supporting some of these features/utilities for versioned data and how WM solves these problems.

Unique Constraints
RDBMSs define a unique constraint for a table as follows:

A Unique key integrity constraint requires that every value in a column or set of columns (key) be unique; that is, no two rows of the table have duplicate values in the specified key column(s). In order to extend the model for unique key constraints in the presence of versioning, we define the following term:

A set of columns of a VRDS S is said to be Version Unique with respect to a version V if no two rows of the Version View (S, V) have duplicate values for the columns. Now, we define a Unique Key Constraint on a column or a set of columns in a VRDS in a multiversioned RDBMS as follows:

A Unique key integrity constraint on a set of columns in a VRDS S requires that the key columns be Version Unique with respect to every version in the versioning system; that is, no two rows of the VRDS S have duplicate values in the specified key column(s) as seen from any given version in the system.

The above definition of unique constraints is enforced by WM during DMLs and Long Transaction operations.

Referential Integrity Constraints
When the data in a table (called the child table) depends upon the data in another table (called the parent table, which may be same as the child table), a referential integrity constraint is said to exist between the tables. The child table in turn could be a parent table for other referential constraints, giving rise to multilevel referential constraints.

If one or both of the tables involved in a referential integrity constraint are being versioned by WM, the constraint needs to be enforced between correct versions of the rows. This enforcement needs to be done both for DMLs and long transaction operations such as publish, refresh and rollback. For example, suppose both the tables are being versioned and the cascade option is chosen. Since there are multiple versions of rows in both the child and parent tables, when a row is deleted from the parent table, it will be incorrect to delete all matching rows from the child table because a child table row may depend on a different version of the parent table row. As a result, the existing algorithm for referential integrity enforcement explained earlier needs to be extended for long transactions. WM uses the following algorithm for enforcing referential integrity constraints. There are three cases to consider:

1. The child table is versioned and the parent table is not versioned:
  1. When a row is being deleted from the parent table in a workspace S, for cascade option WM deletes all dependent rows in all the versions in the child table; while for restrict option, WM disallows the delete from the parent table if there exists a dependent row in any version of the child table.
  2. When a row is updated in the parent table in a workspace S, if there exists a dependent row in any version of the child table, WM disallows the update on the parent table.
  3. When a child table row is inserted or updated in a workspace S, WM allows the operation only if a matching row exists in the parent table.
This ensures that for every version in the version tree, the referential integrity constraint holds between the child table data seen from the version and the parent table data.

2. Both tables are versioned:
  1. When a parent table row is being deleted in a workspace S, for cascade option, WM deletes all dependent child table rows as seen from the workspace S; while for restrict option, WM disallows the delete from the parent table if a dependent child table row is seen from the workspace S.
  2. When a parent table row is updated in a workspace S, WM allows the operation only if no dependent child table rows are seen from the workspace S.
  3. When a child table row is inserted or updated in the a workspace S, WM allows the operation only if a matching parent table row is seen from the workspace S.
This ensures that for every version of the version hierarchy, the referential integrity constraint holds between parent and child table data as seen from the version.

3. Parent table is versioned and the child table is not versioned: A referential integrity constraint is a many (in the child table) to one (in the parent table) relationship. However, in this case, for a row in the child table there could be multiple matching rows in different versions of the parent table. Thus the many-toone relationship is not clear any more. Hence, WM disallows this case and stops this case from occurring.

Triggers
DML triggers are widely used in databases for auditing, data integrity checks, etc. When a table is transformed into a VRDS to participate in a long transaction, each insert/update/delete on the VRDS translates into a number of DML operations on the tables underlying the VRDS. DML triggers on the VRDS need to be associated with these logical DML operations instead of the physical DML operations on the underlying tables. These DML triggers may also need to be executed for data modified through long-transaction operations such as publish and refresh.

WM transforms the DML triggers to use column values of the VRDS row instead of column values of the underlying tables. These transformed triggers are invoked from appropriate points in the execution of the DML operation on the VRDS. WM passes context (operation type, invoking user, etc) about the operation to the application, which can be used for conditional execution inside the triggers. For example, operation type (application generated DML, refresh, publish etc.) can be used to execute certain triggers only during long transaction operations and not during DML operations directly issued by the application.

Data Model Evolution
Since a VRDS is implemented as a view over a set of tables, data definition language operations (DDL) such as adding/dropping columns, triggers, constraints and indexes on the VRDS are difficult, as these operations (1) cannot be directly done on a view and (2) need to be transformed to account for versioning.

Most Long Transaction systems either do not support DDL operations, which means that the data model is frozen forever and cannot evolve over time, or provide proprietary procedural interfaces for performing DDL operations. WM supports DDL operations through SQL on a VRDS rather than through proprietary procedural interfaces. It also meets the important requirement that support for DDL should not necessitate the removal of data that is stored in versions other than the root version. The data model evolution algorithm consists of four main steps: materialization, performing DDL operations, transformation and application.

The first step is to automatically materialize the structure of the VRDS as an empty table called skeleton table. The skeleton table has the same columns, constraints, triggers and indexes as the VRDS. Now DDL operations can be performed on the skeleton table using SQL. Materialization facilitates the expression of DDL operations in SQL rather than through procedural interfaces, thus unleashing the full power of SQL. Materialization is done by WM when the user starts a DDL session on a VRDS.

In the second step, the user performs the desired DDL operations for the VRDS on the skeleton table using regular SQL.

The transformation and application steps are done by the system when the user ends the DDL session on the VRDS. The system computes the sequence of DDL operations done on the skeleton table by consulting the data dictionary, transforms each step to account for versioning, and then applies the transformed sequence to the VRDS. The transformed sequence of DDL operations is stored persistently.

This enables recovery if any operation fails while being applied to the VRDS. During the first three steps, the VRDS can continue to participate in all data manipulation operations such as publish, refresh, rollback, insert, update, and delete. In the final application step, all long transaction operations not involving the VRDS are allowed.

Thus, the above algorithm can handle DDL through SQL without requiring any special purpose SQL parser, and without having to discard data that is stored in versions other than the root version; hence, it can be easily used in existing relational database management systems.

Import/export for Backup/Recovery.
Most commercial RDBMSs provide support for exporting data from the database into flat files and vice versa. This mechanism is used by many systems to take periodic backups, which can be used for recovery in future by importing an exported image into the same database. Typically, various modes of Import/Export are supported such as Full database, Schema level, and Table level. But, in a multi-version system, a VRDS contains version information that can only be interpreted in the context of the complete version tree stored in system tables. Hence, a VRDS exported without the associated version tree information cannot be imported back as the version tree might have changed since the VRDS was exported.

WM solves this problem by using an algorithm consisting of the following steps:
  1. During export, WM captures a snapshot of the relevant version meta-data tables and writes them to the export dump file.
  2. After the export, all long transaction operations, which affect any of the exported VRDSs, are captured in an operations log.
  3. During import, WM applies all the operations from the operations log that were performed since the table was exported. These operations are applied to the imported snapshot of the VRDS using the current and the imported snapshot of the version tree meta-data, thereby bringing the exported snapshot of the VRDS in sync with the current state of the version tree.
Bulk/Direct loading
Most commercial RDBMSs provide support for bulk loading of data from files into tables. WM extends this functionality to allow data to be bulk loaded into a VRDS in any workspace. Frequently, bulk loading is done using direct path for performance (i.e., SQL processing is skipped). This means that versioning meta-data cannot be generated for data directly loaded into a VRDS. Usually, database systems allow columns to be populated with constant or null values during direct load. WM uses this mechanism to load data into a data table underlying the VRDS and tags the loaded data with a reserved version that is not exposed through the VRDS view. After the direct load, a post processing procedure is invoked. This procedure generates versioning meta-data for newly loaded data and synchronizes newly loaded data with existing versioned data. It also performs constraint checking on the newly loaded data using the algorithms mentioned above.

5. Conclusions
There are a wide variety of applications, such as GISs, that need support for long transactions in a commercial RDBMS. These applications require that all the features commonly available in an RDBMS, such as constraints, indexes, triggers, data model evolution, import/export, bulk loading, etc., be fully supported in a long transaction framework.

WM provides a complete system for supporting long transactions in an RDBMS. It includes support for all the well understood long transaction features such as versioning of database rows, version trees of arbitrary depth and fan-out, hierarchical multi-user long transactions, long-transaction locks, conflict detection and resolution, privilege management, and ability for partial/complete refresh, publish and rollback. In addition, it is tightly integrated with a commercial RDBMS and provides full support for all the common database features. In doing so, WM has defined new semantics for some features (such as referential integrity constraints and unique constraints) and has extended the existing semantics for others (such as triggers, import/export, and bulk loading) to accommodate multiple versions of relational data.

References
  • Oracle9i Server Concepts: Oracle Corporation, Part Number A88856-02, July 2001.
  • Marian H. Nodine and Stanley B. Zdonik. Cooperative Transaction Hierarchies: A Transaction Model to Support Design Applications. In Proceedings of 16th VLDB Conference, Brisbane, Australia, 1990.
  • M. Rusinkiewicz etal. Towards a Cooperative Transaction Model - The Cooperative Activity Model. In Proceedings of 21st VLDB Conference, Zurich, Switzerland, 1995.
  • H. Korth, W. Kim, and F. Bancilhon. On long-duration CAD transactions. Information Systems, 13, 1987.
  • Hong-Tai Chou, Won Kim. A Unifying Framework for Version Control in a CAD Environment. In Proceedings of 12th VLDB, Kyoto, Japan, 1986.
  • Thomas Bergstraesser, Philip A. Bernstein, Shankar Pal, David Shutt. Versions and Workspaces in Microsoft Repository. In Proceedings of ACM SIGMOD International Conference on Management of Data, Pennsylvania, USA, 1999.
  • Philip A. Bernstein, Umeshwar Dayal. An Overview of Repository Technology. In Proceedings of 20th VLDB Conference, Santiago, Chile, 1994.
  • Randy H. Katz. Toward a Unified Framework for Version Modeling in Engineering Databases. In Proceedings of ACM Computing Surveys, Vol. 22, No. 4, December 1990.
  • Peter M. Batty. Version Management Revisited. In Proceedings of GITA Annual Conference, Florida, USA, 2002.
  • IDC. Spatial Information Management: Competitive Analaysis and Trends. October 2002, Volume 1.
© GISdevelopment.net. All rights reserved.