Database Integration: Criteria and Techniques
Tom Lonski Senior Technical Consultant, UGC Consulting, 6200 S. Syracuse Way, Suite 222,Englewood, CO 80111 (303) 773-6166,(303) 773-6618 (fax) Email: Tom. Lonski@ugc.com Abstract Database integration is a key challenge for AM/FM/GIS projects since almost all projects have existing legacy systemswith data needed by the AM/FM/GIS. A large and confusing array of combinations, approaches and products exist for accomplishing this integration. The paper wil I define six general techniques that encompass al I of these combinations, address various integration issuesand analyze these techniques using ten specific criteria that affect a project’s performance, capabilities and cost. Introduction Only a short time ago AM/FM/GIS systemswere almost solely used by the engineering or drafting organizations within a utility. However, with the re-regulation and increasing competition in the utility industry, uti Iities are under increasing pressure to gai n an advantage by operating more intelligent y as a business. As a result, many utilities are faced with a common challenge on their AM/FM/GIS projects. To gain that competitive edge, utilities must fully realize the benefits of their AM/FM/GIS project by making the faciIity and geographic information available throughout the company to both internal organizations, such as customer service, marketing, construction, operations, and system planning and, external organizations, such as subcontractors or customers, who haven’t used or had access to the AM/FM/GIS information in the past. Likewise, information from those other organizations is needed within the AM/FM/GIS. The effect of this goal to ful Iy distribute the AM/FM/GIS information is that utilities now frequently face the prospect of integrating their AM/FM/GIS system with other computer systems. In practical terms, “sharing information” from the AM/FM/GIS system with other internal or external organizations at a utility means integrating the database(s) in the AM/FM/GIS system with the database(s) in the computer systems used by those other organizations. A functional Iisting of some of the computer systems at a uti Iity that may be integrated with an AM/FM/GIS includes: Customer Information System, Materials Management System, Planned Maintenance System, Computer-Aided Dispatch, One Call, Human Resources Information System, Outage Management System, Work Management System, Property Accounting, Supervisory Control and Data Acquisition, Engineering Analysis/System Planning, Accounts Payable, Non Utility Billing Internet or Intranet Web Server, and Land Management Systems. Note that this is a functional listing and that the various systems above may be combined or divided as actual Iy implemented at a particular utility. The Integration Jungle As utilities tackle these integration requirements, our experience has been that they often find themselves in a quandary over how to identify the best integration solution and implement it. One reason the integration problem can seem daunting is that many factors conspire to compl icate the integration problem. Some of these complicating factors are:
However, our experience has been that this bottom-up approach does not usually work well. Typically numerous products are avai Iable to integrate databases and this approach can turn into a prolonged research effort unless arbitrary Iimits are placed on what vendors and products are considered. Another drawback to this approach is that the integration products frequently use different database standards and different network protocols. By focusing on the integration products, it is difficult to determine what database standards and what network protocols will provide a coherent environment for integrating the util ity’s computer systems. Whi Ie a solution designed by a bottom-up approach may initial Iy work, there usually isn’t a high degree of confidence that a solution designed this way is a good one, i.e. that it meets al I of the functional needs, that it minimizes the cost, that the integration productltechnology can be reused and that the integration productitechnolo~ is flexible/general/expandable enough to support future requirements. These kinds of difficulties are common whenever a bottom-up approach is used for a problem in computer science. Instead, the conventional wisdom in computer science advocates the use of a top-down approach. In the course of wrestling with integration problems on a number of projects, we have defined ten general integration criteria to help focus on the functionality required in the database integration and six general integration techniques that can be used to classify the possible integration solutions. Using these criteria and techniques provides a top-down approach to the integration problems. By focusing on the integration functionality, planning and designing a solution is simplified. The integration criteria can be used to identify the appropriate integration technique as wel I as evaluate specific integration products. Once an integration technique is chosen, then the number of products that need to be examined in the search for candidate solutions is reduced. Another way the integration problem is simpl ified is, once an integration solution is identified, the network and communication part of the problem is typically simplified because the products used for the integration often specify the required protocols and networking. Integration Criteria The integration criteria that we have identified are data currency, performance, cost, encapsulation, data synthesis, data access, administration, side-effects, data integrity, and scalabi Iity.
Integration by Batch Copy The first integration technique is shown in Figure 1. With this approach, data is extracted from the external system and loaded into the AM/FM/GIS. This approach can vary in the sophistication of its implementation. In less sophisticated implementations, the data is manually extracted, transferred by tape or other media and manual Iy loaded. In more sophisticated implementations, this is the basis for data warehousing and the data is automatical Iy extracted, transferred and loaded on a regular and frequent basis. This technique has several advantages. The first is cost. This technique tends to be the least expensive form of integration that can be implemented. Often special integration products are not required and the DBMS tools for exporting and importing data can be used. Minimal changes to the databases are required and most of the implementation work is configuri ng the DBMS tools or writi ng the sotlware to extract and load the data. Frequently, the network infrastructure to digital Iy transfer the data is already available, but can be implemented for relatively low cost if it is not. Another advantage is that, since the data is copied to the relational database management system (RDBMS) component of the AM/FM/GIS system, many of the criteria do not apply or are satisfied by this technique. The criteria that are satisfied by this technique are performance, encapsulation, data synthesis, administration and side-effects. Users and applications get the same performance and encapsulation for the copied data as they do for their AM/FM/GIS data. Data synthesis is performed by the RDBMS and is no different from the synthesis performed on the AM/FM/GIS data. Similarly, since the data is copied from the external system, this integration technique does not affect the admi nitration of the external system or affect the external system’s applications. In the AM/FM/GIS system, the copied data is treated Iike al I of the other data in the RDBMS and does not require changes in applications or database administration. ![]() Figure 1: Integration by batch copy However, there are several disadvantages. Typically, utilities use the simpler forms of this technique. As a result, one of the disadvantages is that data currency becomes a key issue. Even when the batch copy is fully automated, it is usually done on an overnight or longer basis. The more that critical or expensive business decisions are based on stale data, the more critical this issue becomes. The other disadvantages have to do with data access, data integrity and scalability. Since the AM/FM/GIS is working with a snapshot copy, data access for users and applications must be read-only to guarantee data integrity. While theoretical Iy possible, it is extremel y difficult to implement a merge capabil ity that will allow multiple users to make changes in parallel on the same data in multiple systems. Any kind of merge functionality will have to be custom-built and posesan unacceptable risk to a utility’s data. Also, as the figure shows, data access is one-way for users and applications. The AM/FM/GIS wil I have access (via the snapshot copy) to the external system’s data, but the external system wil I not have access to the AM/FM/GIS data. If the external system also needs access to the AM/FM/GIS data, another interface must be implemented to copy the AM/FM/GIS data to the external system. This data access Iimitation creates a scalability limitation when this is the only technique used. While full integration between two systems only requires two copies, the number of copies jumps to six for three systems, 12 for four systems and so on. Application Level Integration The second technique performs the integration at the application level. This is shown in Figure 2. When the applications have the capability to directly communicate with multiple databases, they can be modified to retrieve information directly from the other systems. This technique rates well against the data currency, administration and side-effects criteria. Since the applications access the data directly in their “native” databases, the data is always current. The integrated applications should behave like any other applications so typically there are no changes in administration. Since this technique requires little or no changes to the database, the side-effects are minimal and existing users and applications of the external systemsare not affected by the applications integrated with multiple databases. The disadvantages to this technique include the encapsulation, data synthesis, data integrity and scalabiltiy criteria. This technique has very poor encapsulation and every application wi II have embedded dependencies on the location of data. Since the integration is at the application level, any data synthesis has to be implemented within the application. Similarly, if the integrated application is updating multiple databases, the application has the added complexity of managing transactions that apply across databases. Finally, this approach does not scale well. It works well for a few databases, but quickly becomes cumbersome and expensive as the number of applications and databases increases. ![]() Figure 2: Integration at the application level The costs for this technique are back-loaded unlike the other techniques where the costs are front-loaded. The initial implementation costs are low because many AM/FM/GIS products and database client application tools such as PowerBuilder, Delphi and SQLWindows can simultaneous y communicate with multiple databases. The initial implementation cost is usually somewhat higher than the previous technique because extra work may need to be done in the applications for them to access the external systems. Another increased cost is often in the network and communications. Since the applications are communicate ng directly with the other databases, more network infrastructure is required for this technique than for the file transfer level of communication needed in the previous technique. However, while the implementation costs are second-lowest of the techniques, the costs for this technique increase over time because the problems in encapsulation, data synthesis and data integrity all add complexity to the applications which significantly increases their development and maintenance costs. In the long run, this can be the most expensive technique to implement. Data access and performance are rated as average for this technique. While the integrated applications wi II have create, read, update, and delete access to the data, access to data in multiple databases is limited to only those integrated applications. Performance is also rated as average and is primari Iy dependent on how the applications work with the multiple databases. Performance can be quite fast for simple operations Ii ke queries, but because the application has to handle al I the data synthesis, performance can be quite slow for functionality such as summary reports and on-line analysis. Applications that do a lot of data synthesis are also going to require more client resources. Application Level Integration with Middleware The third technique also integrates the databases at the appl ication level. It differs from the previous approach by adding the use of middleware. Depending on the DBMSes used by the external systems, sometimes applications may not be able to communicate directly with an external system. Middleware provides a mechanism for appl ications to communicate with a database that are not inherently supported by the application. This technique behaves very similarly to the previous one since it only differs in the use of middleware. It is Iisted separately in order to focus on the effect middleware can have on performance, cost, encapsulation and data synthesis. Since applications are now one more layer removed from directly working with the data, there is some performance penalty. With some middleware products, this performance penalty can be quite significant. ![]() Figure 3: Integration at the appl ication level using middleware Obviously there will bean increased cost with the purchase of middleware products. The increased implementation costs may be offset by reduced long-term costs sirice, depending on the middleware product, there can bean effect on encapsulation and data synthesis as well. Some of the middleware products on the market are quite sophisticated and can actually provide a single interface to multiple databases (see Figure 3). Further, some middleware products can perform some data synthesis. While middleware wil I not ei iminate the encapsulation and data synthesis problems inherent in integration at the application level, by shifting some of the encapsulation and data synthesis problems from the applications to the middleware, it can reduce some of the software development and maintenance costs. Database Level Integration The next integration technique (shown in Figure 4) differs significantly from the previous two by implementing the interface at the database level instead of the application level. With this approach, the AM/FM/GIS system presents the external data to the AM/FlvVGIS applications as if the data were stored locally in the AlvVFM/GIS. Depending on the databases in the different systems, they maybe able to communicate directly, or middleware may be required for the databases to communicate. This technique offers many advantages in data currency, encapsulation, side-effects, data synthesis, data integrity, and scalability. Since the data is accessed directly from the external systems, it is never stale. Encapsulation is good since data location is transparent to the applications and the applications access the external data as if it were stored in the AM/FM/GIS database. Typical Iy there are few side-effects on existing applications. Implementing this interface requires database changes, but these usually do not affect the presentation of data to the legacy applications This technique rates well on the data synthesis and data integrity criteria because, by using the database to perform the integration, this technique is able to leverage the DBMSes’ built-in data synthesis and transaction management capabil ities. Scalability for this technique is much better than that of the previous techniques since the number of interfaces is not affected by the number of appl ications and increases Iinearly with the number of databases (the number of interfaces equals the summation of the number of databases minus one) instead of geometrical Iy. When evaluated by the performance and data access criteria, this technique also usualIy rates well. However, these criteria are highly dependent on the database and middleware products involved. There is some performance penalty to the application because the local database adds a layer between the application and the external data, but the interface between the databases (whether middleware or built-in to the database) can also add significant performance delays. Data access is the capabil ity that varies most by vendor. Depending on the vendor, there can be limitations in create, read, update and delete access and many times the access is not hi-directional. ![]() Figure 4: Integration at the database level directly between databases The two disadvantages to this technique are cost and administration. Middleware products are often required to implement integration at the database level and are much more expensive than the middleware products used with applications. Once the middleware products have been purchased, they typically require a significant amount of database administration effort to implement. Middleware Infrasmaure Inkgration The fifth integration technique is shown in Figure 5. With this approach, a middleware toolkit is purchased from a vendor and used to construct a single interface which all applications will use to access all of the databases. This approach can vary tremendously in its complexity. In the simpler implementations, the middleware infrastructure provides only a single interface. In the more complex implementations, the middleware infrastructure implements a logical database where the middleware infrastructure has intelligence and data processing capabilities similar to those of the databases. By creating a single interface for all of the applications and by being built on top of multiple databases, this technique provides a good solution to the data currency, data access, encapsulation, data synthesis, data integrity and scalability criteria. The most current data is available to the applications since it is accessed directly in the external system databases. All applications have access to any of the data available through the middleware infrastructure. Encapsulation is excel lent since al I appl ications have a single i nterface for all data. Depending on the middleware products used, data synthesis can usually be done in either the database or the middleware infrastructure. Transaction management is handled by the middleware infrastructure to guarantee data integrity. Lastly, this approach has the best scalabil ity since the number of database integration interfaces equals the number of databases. This technique’s evaluation on the performance criteria is not quite as good. The middleware infrastructure wi II cause some performance delays when compared to direct access to the databases. How significant the performance delays are is dependent on the middleware toolkit used and the databases involved. ![]() Figure 5: Integration at the database level using a middleware infrastructure The drawbacks to this technique are its side-effects, administration and cost. This approach has very significant side-effects since al I existing applications wil I need to be changed to use the middleware infrastructure. Implementing the middleware infrastructure typical Iy requires significant changes to the databases and the middleware infrastructure will require database administration work in addition to the admi nitration work for the databases. Both of these drawbacks add to the cost of this technique. In addition, the middleware toolkit products used for creating a middleware infrastructure are significant y more expensive than the middleware used by applications. Further, the implementation effort required to build the middleware infrastructure adds to the cost. As a result, this technique is one of the most expensive ones. Migration To provide a complete set of integration techniques, the sixth technique eliminates the integration problem by migrating the various computer systemsthat need to be integrated to a common database. This can be a massive undertaking depending on the number of databases and applications involved. As a result, the technique is only considered when there are other reasons to use migration. Some utilities may commit to a large-scale information technology (IT) modernization pmgmrn or the utility may have already decided to standardize on a single database as part of that utility’s IT strategy so the migration technique does get used. By migrating to a single database, the data currency, data access, administration, side-effects, performance, encapsulation, data synthesis, data integrity and scalability issuesare either handled by the DBMS or do not apply. The big hurdle for this technique is the cost in migrating applications and migrating data. Summary These criteria and techniques provide tools to think about the database integration problem, to identify requirements, to design integration solutions and to evaluate integration products. One general guideline that can be derived is that, whenever possible, the integration should occur at the database level (batch copy, database level integration, middleware infrastructure or migration), not at the application level. Unfortunately, no one technique can be used for every integration problem. Typically, even on a single project, different techniques must be combined to solve the problems that will be encountered with the different databases. Similarly, no single integration product can be used for every integration problem. Even if an integration product works at one site, it may not work at another. For example, one project successfully used a particular product to integrate a UNIX RDBMS with an IBM mainframe but that same product did not work for another project because the second project used a different SNA solution on its UNIX server. As the example suggests,while these criteria and techniques can provide the tools to plan and design a solution, it’s only the tip of the iceberg. Numerous technical hurdles wil I need to be overcome to successfulIy integrate databases. N a result, it is critical to incremental Iy implement interfaces and test at every step. Fortunate y, the work involved in database integration natural Iy lends itself to an incremental approach. Acknowkigmnts I’m indebted to my colleague Tom Helmer for al I the work we’ve done together to develop and refine these ideas. Products mentioned may be copyrighted by or trademarks of their respective companies. Such information is subject to change without notice. | ||
|
|