Logo GISdevelopment.net

GISdevelopment > Proceedings > GITA > 1997


GITA 2002 | GITA 2001 | GITA 2000 | GITA 1999 | GITA 1998 | GITA 1997
Sessions

Advanced Technical Topics

Building & Supporting Applications

Business Evolution & Platform Migration

Expanding the User Base -- Non-Traditional Applications

From the office to the Field

Fundamental & Economic Issues of AM/FM/GIS

Lessons Learned

Major Technology Trends and their Impacts

Project Planning, Implementation and Management

Re-Engineering and Integration Issues

Scada and Real-Time Systems

User Project Presentations

Best of the Rest

Invited Presentation


GITA 1997


Advanced Technical Topics


Database Integration: Criteria and Techniques


Integration Criteria
The integration criteria that we have identified are data currency, performance, cost, encapsulation, data synthesis, data access, administration, side-effects, data integrity, and scalabi Iity.
  • The data currency criterion focuses on how “fresh” the data from a source database is in a target database. Depending on the integration technique, there can be time delays in making the data in one database (the source) available to users and applications of another database (the target). The duration of the time delay can range from as short as minutes or hours to as long as weeks or months. From some users and applications, the time delay is a critical issue.
  • The performance criterion focuses on the execution speed of the integration technique. It examines how quickly queries, updates and other data operations perform when those operations affect data in multiple databases.
  • The cost criterion includes not only the cost of the integration products but also the cost of implementation. At a minimum, every integration product will require some configuration and database administration effort. Depending on the integration technique and the utility’s existing computer environment, there can also be hardware costs, networking costs, and software development costs for new software or modifying existing software.
  • The encapsulation criterion considers how well the integration solution hides the physical location of the data from the users and the applications. If the data is poorly encapsulated, then more of a burden is placed on the users and the applications. Poor encapsulation makes it more difficult for the users to use the database in an ad-hoc fashion. To browse the data, produce ad-hoc reports, perform on-line analysis, and so on, the user has to know the location of the data. In a simi tar fashion, poor encapsulation creates dependencies in the applications. The application has to know the location of the data to work with it and has to know the business rules and constraints that apply to worki ng with that data. In the long run, these dependencies increase the application’s software development/maintenance costs and make database administration more difficult.
  • The data synthesis criterion examines how userswork with the data from the multiple databases. The term data synthesis is used for those operations that “combine” data from multiple databases. “Combining” data can include statistical operations, calculations, joins and comparisons. This criterion is used to identify where those operations occur. For example, the data synthesis can occur in a database, in an application, or in middleware.
  • The data access criterion focuses on what kind of access the integration solution implements for the users and applications. Data access includes create, read, update and delete access as wel I as whether the access is hi-directional: users and applications of one database may have access to a second database, but the integration product may not provide usersand applications on the second database access to the first database.
  • The administration criterion includes the effect the integration solution has on database administration. As mentioned above, there is an inverse relationship between encapsulation and administration: the better the encapsulation, the easier the administration. In addition to the impact of applications on database administration, this criterion also looks at the impact an integration technique can have on other database administration concerns such as backup and recovery or security.
  • The “side-effects” criterion is used to classify the integration techniques’ impact on other applications and other parts of the util ity’s computer environment. Some techniques require substantial changes in existing applications or in the computer environment.
  • The data integrity criterion examines how well the integration technique manages transactions that apply to multiple databases. Different integration techniques wil I have different recovery, commit and rol Iback capabilities.
  • The scalability criterion evaluates how well an integration technique performs as the number of databases being integrated increases.
Integration Techniques

Integration by Batch Copy
The first integration technique is shown in Figure 1. With this approach, data is extracted from the external system and loaded into the AM/FM/GIS. This approach can vary in the sophistication of its implementation. In less sophisticated implementations, the data is manually extracted, transferred by tape or other media and manual Iy loaded. In more sophisticated implementations, this is the basis for data warehousing and the data is automatical Iy extracted, transferred and loaded on a regular and frequent basis.

This technique has several advantages. The first is cost. This technique tends to be the least expensive form of integration that can be implemented. Often special integration products are not required and the DBMS tools for exporting and importing data can be used. Minimal changes to the databases are required and most of the implementation work is configuri ng the DBMS tools or writi ng the sotlware to extract and load the data. Frequently, the network infrastructure to digital Iy transfer the data is already available, but can be implemented for relatively low cost if it is not. Another advantage is that, since the data is copied to the relational database management system (RDBMS) component of the AM/FM/GIS system, many of the criteria do not apply or are satisfied by this technique.

The criteria that are satisfied by this technique are performance, encapsulation, data synthesis, administration and side-effects. Users and applications get the same performance and encapsulation for the copied data as they do for their AM/FM/GIS data. Data synthesis is performed by the RDBMS and is no different from the synthesis performed on the AM/FM/GIS data. Similarly, since the data is copied from the external system, this integration technique does not affect the admi nitration of the external system or affect the external system’s applications. In the AM/FM/GIS system, the copied data is treated Iike al I of the other data in the RDBMS and does not require changes in applications or database administration.


Figure 1: Integration by batch copy

However, there are several disadvantages. Typically, utilities use the simpler forms of this technique. As a result, one of the disadvantages is that data currency becomes a key issue. Even when the batch copy is fully automated, it is usually done on an overnight or longer basis. The more that critical or expensive business decisions are based on stale data, the more critical this issue becomes. The other disadvantages have to do with data access, data integrity and scalability. Since the AM/FM/GIS is working with a snapshot copy, data access for users and applications must be read-only to guarantee data integrity. While theoretical Iy possible, it is extremel y difficult to implement a merge capabil ity that will allow multiple users to make changes in parallel on the same data in multiple systems. Any kind of merge functionality will have to be custom-built and posesan unacceptable risk to a utility’s data. Also, as the figure shows, data access is one-way for users and applications. The AM/FM/GIS wil I have access (via the snapshot copy) to the external system’s data, but the external system wil I not have access to the AM/FM/GIS data. If the external system also needs access to the AM/FM/GIS data, another interface must be implemented to copy the AM/FM/GIS data to the external system. This data access Iimitation creates a scalability limitation when this is the only technique used. While full integration between two systems only requires two copies, the number of copies jumps to six for three systems, 12 for four systems and so on.

Page 2 of 4
| Previous | Next |

Applications | Technology | Policy | History | News | Tenders | Events | Interviews | Career | Companies | Country Pages | Books | Publications | Education | Glossary | Tutorials | Downloads | Site Map | Subscribe | GIS@development Magazine | Updates | Guest Book