|
|
|
Advanced Technical Topics
|
Database Integration: Criteria and Techniques
Integration Criteria
The integration criteria that we have identified are data currency, performance, cost, encapsulation, data
synthesis, data access, administration, side-effects, data integrity, and scalabi Iity.
- The data currency criterion focuses on how “fresh” the data from a source database is in a target
database. Depending on the integration technique, there can be time delays in making the data in one
database (the source) available to users and applications of another database (the target). The duration
of the time delay can range from as short as minutes or hours to as long as weeks or months. From
some users and applications, the time delay is a critical issue.
- The performance criterion focuses on the execution speed of the integration technique. It examines
how quickly queries, updates and other data operations perform when those operations affect data in
multiple databases.
- The cost criterion includes not only the cost of the integration products but also the cost of
implementation. At a minimum, every integration product will require some configuration and
database administration effort. Depending on the integration technique and the utility’s existing
computer environment, there can also be hardware costs, networking costs, and software development
costs for new software or modifying existing software.
- The encapsulation criterion considers how well the integration solution hides the physical location of
the data from the users and the applications. If the data is poorly encapsulated, then more of a burden
is placed on the users and the applications. Poor encapsulation makes it more difficult for the users to
use the database in an ad-hoc fashion. To browse the data, produce ad-hoc reports, perform on-line
analysis, and so on, the user has to know the location of the data. In a simi tar fashion, poor
encapsulation creates dependencies in the applications. The application has to know the location of
the data to work with it and has to know the business rules and constraints that apply to worki ng with
that data. In the long run, these dependencies increase the application’s software
development/maintenance costs and make database administration more difficult.
- The data synthesis criterion examines how userswork with the data from the multiple databases. The
term data synthesis is used for those operations that “combine” data from multiple databases.
“Combining” data can include statistical operations, calculations, joins and comparisons. This criterion
is used to identify where those operations occur. For example, the data synthesis can occur in a
database, in an application, or in middleware.
- The data access criterion focuses on what kind of access the integration solution implements for the
users and applications. Data access includes create, read, update and delete access as wel I as whether
the access is hi-directional: users and applications of one database may have access to a second
database, but the integration product may not provide usersand applications on the second database
access to the first database.
- The administration criterion includes the effect the integration solution has on database administration.
As mentioned above, there is an inverse relationship between encapsulation and administration: the
better the encapsulation, the easier the administration. In addition to the impact of applications on
database administration, this criterion also looks at the impact an integration technique can have on
other database administration concerns such as backup and recovery or security.
- The “side-effects” criterion is used to classify the integration techniques’ impact on other applications
and other parts of the util ity’s computer environment. Some techniques require substantial changes in
existing applications or in the computer environment.
- The data integrity criterion examines how well the integration technique manages transactions that
apply to multiple databases. Different integration techniques wil I have different recovery, commit and
rol Iback capabilities.
- The scalability criterion evaluates how well an integration technique performs as the number of
databases being integrated increases.
Integration Techniques
Integration by Batch Copy
The first integration technique is shown in Figure 1. With this approach, data is extracted from the external
system and loaded into the AM/FM/GIS. This approach can vary in the sophistication of its
implementation. In less sophisticated implementations, the data is manually extracted, transferred by tape
or other media and manual Iy loaded. In more sophisticated implementations, this is the basis for data
warehousing and the data is automatical Iy extracted, transferred and loaded on a regular and frequent
basis.
This technique has several advantages. The first is cost. This technique tends to be the least expensive
form of integration that can be implemented. Often special integration products are not required and the
DBMS tools for exporting and importing data can be used. Minimal changes to the databases are required
and most of the implementation work is configuri ng the DBMS tools or writi ng the sotlware to extract and
load the data. Frequently, the network infrastructure to digital Iy transfer the data is already available, but
can be implemented for relatively low cost if it is not.
Another advantage is that, since the data is copied to the relational database management system (RDBMS)
component of the AM/FM/GIS system, many of the criteria do not apply or are satisfied by this technique.
The criteria that are satisfied by this technique are performance, encapsulation, data synthesis,
administration and side-effects. Users and applications get the same performance and encapsulation for the
copied data as they do for their AM/FM/GIS data. Data synthesis is performed by the RDBMS and is no
different from the synthesis performed on the AM/FM/GIS data. Similarly, since the data is copied from the
external system, this integration technique does not affect the admi nitration of the external system or affect
the external system’s applications. In the AM/FM/GIS system, the copied data is treated Iike al I of the other
data in the RDBMS and does not require changes in applications or database administration.

Figure 1: Integration by batch copy
However, there are several disadvantages. Typically, utilities use the simpler forms of this technique. As a
result, one of the disadvantages is that data currency becomes a key issue. Even when the batch copy is
fully automated, it is usually done on an overnight or longer basis. The more that critical or expensive
business decisions are based on stale data, the more critical this issue becomes.
The other disadvantages have to do with data access, data integrity and scalability. Since the AM/FM/GIS is
working with a snapshot copy, data access for users and applications must be read-only to guarantee data
integrity. While theoretical Iy possible, it is extremel y difficult to implement a merge capabil ity that will
allow multiple users to make changes in parallel on the same data in multiple systems. Any kind of merge
functionality will have to be custom-built and posesan unacceptable risk to a utility’s data. Also, as the
figure shows, data access is one-way for users and applications. The AM/FM/GIS wil I have access (via the
snapshot copy) to the external system’s data, but the external system wil I not have access to the AM/FM/GIS
data. If the external system also needs access to the AM/FM/GIS data, another interface must be
implemented to copy the AM/FM/GIS data to the external system. This data access Iimitation creates a
scalability limitation when this is the only technique used. While full integration between two systems only
requires two copies, the number of copies jumps to six for three systems, 12 for four systems and so on.
|
|
|
|