GISdevelopment.net ---> GITA 2002 ---> Data Development & Evolution-Providing Data to the Masses

Providing data to the masses in stages

Elaine M. Pettersen
Advantica Stoner
P.O. Box 86
Carlisle, PA 17013-0086


Abstract
Conversion of landbase, distribution, transmission, and customer data can be one of the most costly and time-consuming tasks in implementing a Geographic Information System (GIS). It need not be a 'wait for everything' approach. To provide a utility with optimum and immediate use of their critical data, data sets can be added into a 'live' GIS in stages. While there are sound business reasons for transitioning data in 'chunks' into a GIS, there can be many challenges if this path is left uncharted. This paper will discuss the importance of designing and implementing a plan so that utilities can continue to use and maintain their data while conversion is in progress. It will look at some of the complexities that can be encountered when utilities attempt to transition large sets of data into an already functioning GIS, including: how to keep backlogs of data updates to a minimum while providing already converted data to users and the conversion group; how to append each new data set into the GIS; and how to resolve issues resulting from edge matching the data sets.

Introduction
Ideally, a utility would like to convert all of its data before a GIS is operational. However, data conversion can be the most time consuming and costly aspect of implementing a GIS. As an alternative, there are many benefits for utilities that choose to do conversion in increments, converting the most critical data first, then adding the remainder as funds and time permit. For this type of conversion effort, it is imperative that a course of action be developed at the start. This paper will describe the benefits of implementing a data conversion project in stages as well as review some of the issues related to transitioning data into a functioning GIS. It will also cover several different approaches for managing data appends into the live environment. Some of the key issues this paper will discuss are:
  • What is the importance of the various datasets being converted?
  • How do the datasets interact with each other?
  • What is the plan for handling data updates while continuing conversion?
  • How much data will need to be returned to the user?
  • What is the timescale for converting each dataset?
  • How will additional datasets be transitioned into the functioning GIS?
  • What is the impact on the user when appending batches of data?
When a strategy is developed at the start of the project, converting data in stages becomes less risky and affords the utility immediate use of its most critical data. This paper is based on an approach currently used by a GIS conversion project that is being carried out in stages, and it reflects a general trend in the GIS industry to phase implementation and conversion projects. One of the goals of this project was to get all of the utility's assets into a single database for ease of maintenance. The landbase was converted first, followed by the critical clean water distribution data. The GIS went live once a substantial subset of the clean water data was delivered. All subsequent deliveries were made in predefined batches and appended to the enterprise system when it became necessary for the end user to have the data online. In addition to over 2 million clean water assets, the utility's 2 million wastewater assets, transmission assets, and customer data have all been delivered incrementally and appended in stages. The original conversion project was completed on time and within budget. The utility has recently initiated a new project to convert features not in the original project scope.

Business benefits for converting data in stages
For most utilities, the budget is the driving factor for the amount of data that is converted at a given time. There are also sound business reasons for converting the data in stages; the benefits will be realized at both the corporate and the user level. The business will begin to reap the rewards as soon as the data is brought online. As users and managers alike gain confidence in the data and the applications, the perceived risk associated with a large conversion project will be minimized. Once paper-based data from a record office goes online, resources can be scaled back, and it will become evident to the business that this was a move in the right direction.

Users and the GIS Application
Once a substantial set of data is loaded, the users can 'test' the GIS application in a true-to-life environment with real data. The live environment can vary from the 'data validation' or 'testing' environment and thus provides an opportunity to resolve any technical problems with the data or conversion process at an early stage. When a user first looks at a GIS, they can be hesitant in accepting the inevitable. Getting the data to them as quickly as possible allows time for the users to get comfortable with the application tools and gain confidence in the quality of the data in their new system.

Long Term Updates
A large data conversion project can take years to complete. During that time, record updates need to be 'frozen' even though work for the utility goes on as normal. With large areas out of reach to the records group, there will always be a backlog of data updates waiting for the records management group to process when the data goes live. In addition, there will be peaks and valleys in the update process due to rehabilitation projects or new construction. Converting data in stages gives the business the option to outsource the update work, which in turn limits the potentially 'enormous' backlog of updates when the system goes live and provides an external avenue to help balance updates during peak periods. Outsourcing updates or conversion of additional data can be more beneficial to the business than spending funds on temporary help that will require training in the application(s) and specifications, and on purchasing additional hardware and software.

Minimizing Update Backlogs
As soon as conversion work gets underway, it will be necessary for the business to put all updates on hold for the area being converted. This will generate a backlog of updates. One of the advantages of converting in stages is that the business can continue to maintain its records up until the source data is turned over to the conversion group.

Understanding the needs of the business during the conversion period should help the utility define which data is considered crucial. Data that is considered critical from a business standpoint should be brought online as quickly as possible. For instance, if the business is planning a rehabilitation project for a section of its territory, the need to have this data available for updates means this is critical data.

A utility needs to understand its data before it can classify the datasets for conversion. There are several ways data can be classified. One way is by the type of data, such as distribution data, transmission data, or wastewater data. Another way to define data is geographically, such as by districts or by managing records office. Each of the datasets should be viewed as a separate piece of conversion work. This allows the utility to weigh the importance of the various datasets. When converting in stages, information that is not critical can be converted at a later time. Classifying data can be done not only for an entire dataset, but at the asset or attribute level as well. Rather than converting all of the water assets at once, specific assets or attributes that are less important to the business can be converted after the crucial data conversion is completed. For example, a feature such as a 'transformer' is a critical asset that a utility needs to have in the GIS from the start, whereas a 'down guy' is not as important and can be converted at a later stage in the project.

Identifying conversion issues
Once the decision to convert data in stages is made and the hierarchy for converting the various datasets has been established, it will be necessary to look at the relationships between the datasets. After the first set of data is delivered to the user and the GIS is functioning, all other data to be appended must be viewed in relation to the data already in the 'live' GIS.

Defining an Approach
There are two scenarios that will define the approach required for conversion work and for transitioning the data into the current GIS. If the data being converted merely overlaps in geographic location but not in connectivity, such as data being converted to a new or unused GIS layer, there will be no impact to the user during conversion. For example, when appending wastewater data into a database that holds only clean water asset data, connectivity is not an issue because each dataset has its own unique layer in the database. This assumes that white-space management is not an issue between layers. If white-space management were an issue, then it would be useful to have a copy of the user's database available during conversion since the end user could have multiple layers on screen at the same time. However, if the continuing conversion effort has connectivity with or requires making edits to the live data, it will be necessary to devise a plan to convert data while allowing the records group to maintain their live data. This plan should include defining the batches of data to deliver, creating a data locking utility, acquiring a copy of the user's database in which to perform the conversion, tracking work being performed, and delivering data to the user.

Allowing the Records Update Process to Continue
There are several approaches that can be taken for conversion that allow the records update process to continue. When the conversion effort takes place on-site and directly in the 'live' GIS, the application should be configured to 'lock' data as it is being updated. Locking data refers to changing the status of a feature so that it cannot be edited. Even though more than one user can work in the database, there will still need to be some coordination between the records management group making the updates and the conversion group. The conversion work will continue to be defined in batches and the work assignments for record updates will have to avoid areas where the conversion is occurring.

Another option that will let record update work continue is to allow an off-site conversion group remote access to the live database. Like the above approach, record updates will have to be performed in areas where the conversion work is not taking place. With remote access to the database, the business opens itself to the possibility of data corruption; however, this is more a perceived risk than reality due to the development of several programs that control data users and system access. These two approaches are the quickest means for getting the converted data online while keeping it available to the records group and users.

When the business is unable to support one of these approaches and the conversion work needs to be done off-site, the 'batches' of data should be predefined and a routine for locking data in batches or along edges should be used. Locking the data ensures edits cannot be made. It can be done spatially by developing a utility that looks for all data within a grid or other specified area. Locking data allows the conversion to continue in one area at the same time as the records group updates are being made to another area without fear of overwriting data within conversion grids. The user's data is then unlocked as each batch is re-delivered and appended into the live GIS. The area for future data deliveries will need to remain locked. In Exhibit 1, below, the darkened area provides a visual look at an area that will be locked as a batch of data is converted.

Exhibit 1 – Example of Locking Data by Grid
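
The spatial locking utility described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Feature` record, the `locked` flag, and the function names are hypothetical and are not taken from the project's GIS application, and a batch is assumed to be a simple rectangle with point features.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """Hypothetical asset record: a point location plus a lock flag."""
    feature_id: int
    x: float
    y: float
    locked: bool = False


def lock_features_in_grid(features, xmin, ymin, xmax, ymax):
    """Lock every feature inside the rectangle and return the locked ids."""
    locked_ids = []
    for f in features:
        if xmin <= f.x <= xmax and ymin <= f.y <= ymax:
            f.locked = True
            locked_ids.append(f.feature_id)
    return locked_ids


def edit_feature(feature, new_x, new_y):
    """Move a feature, refusing the edit if the feature is locked."""
    if feature.locked:
        raise PermissionError(
            f"feature {feature.feature_id} is locked for conversion"
        )
    feature.x, feature.y = new_x, new_y


# Assumed sample data: three assets, with the batch covering one grid.
features = [Feature(1, 10.0, 10.0), Feature(2, 55.0, 40.0), Feature(3, 90.0, 90.0)]
locked = lock_features_in_grid(features, xmin=50, ymin=0, xmax=100, ymax=50)
```

In this sketch the records group can still edit features 1 and 3, while any attempt to edit feature 2 is rejected until the batch is re-delivered and the lock is cleared.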


A reverse lock can also be applied to the copy of the database used by the conversion group. If the locked grids were viewed as an image, then the reverse lock creates a negative image of the data; i.e., data that is unlocked at the utility is locked to the conversion group. This is depicted in Exhibit 2, below, where the gray area represents assets that are locked either in the conversion database or the user's database.

Exhibit 2 – Example of Reverse Locking
[Figure labels: "Conversion Group has access to this data"; "This data is locked to the utility"]

This prevents the conversion group from inadvertently working in a location that might also have live update work being done, and reduces the need for extensive validation and version resolution prior to appending the data back into the live GIS. As with the other two approaches, the records management group will need to manage the backlog of update assignments for the locked area. When this method is used, the utility's users will still be able to see and 'read', or reference, the data within the locked area but the records group will not be able to perform updates to that data until conversion is complete. Update backlogs can be kept to a minimum by making deliveries at regularly scheduled intervals.
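
The reverse lock can be thought of as a set complement: whatever is locked at the utility is open to the conversion group, and vice versa. The sketch below shows that idea only; the function names and the use of plain feature ids are assumptions made for illustration, not part of any particular GIS product.

```python
def reverse_lock(all_ids, batch_ids):
    """Return (locked_at_utility, locked_for_conversion) for one batch.

    The utility cannot edit features in the batch; the conversion group
    cannot edit anything outside it. The two sets are complements.
    """
    all_ids = set(all_ids)
    locked_at_utility = set(batch_ids) & all_ids
    locked_for_conversion = all_ids - locked_at_utility
    return locked_at_utility, locked_for_conversion


def can_edit(feature_id, group, locked_at_utility, locked_for_conversion):
    """Check whether the 'utility' or 'conversion' group may edit a feature."""
    if group == "utility":
        return feature_id not in locked_at_utility
    if group == "conversion":
        return feature_id not in locked_for_conversion
    raise ValueError(f"unknown group: {group}")


# Assumed sample: six features, two of which fall inside the batch.
utility_locks, conversion_locks = reverse_lock(range(1, 7), batch_ids=[2, 3])
```

Because every feature is editable by exactly one group at a time, the converted batch and the live updates cannot conflict, which is why little version resolution is needed before the append.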

Tracking the Work
When the conversion effort is being done off-site, the conversion group will need to work in a current copy of the live GIS database. The subset of data that has been specified as a batch (or delivery area) is the only data that will be delivered back to the user. A method for tracking edits, moves, deletes, and additions against the original source data can be used so that only data that has been changed or added is delivered to the live database. One way of doing this is to reserve or save a copy of the original live database as a static database and compare it against the conversion database. Using SQL statements to run through the attribute fields in the database tables, a list of features that have been modified can easily be compiled and used to extract the data for appending into the live environment. The table in Exhibit 3, below, shows an example of a database table that was built to track changes in the database. It shows a comparison of old and new values at the attribute level.

Exhibit 3 – Results of a Database Comparison
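
A comparison of this kind might look like the following sketch, which uses Python's built-in `sqlite3` module in place of the project's actual database. The table names (`static_mains`, `conversion_mains`), the columns, and the sample rows are all invented for illustration; the paper does not describe the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE static_mains
        (feature_id INTEGER PRIMARY KEY, diameter INTEGER, material TEXT);
    CREATE TABLE conversion_mains
        (feature_id INTEGER PRIMARY KEY, diameter INTEGER, material TEXT);
    INSERT INTO static_mains VALUES (1, 100, 'CI'), (2, 150, 'DI'), (3, 200, 'PVC');
    INSERT INTO conversion_mains VALUES (1, 100, 'CI'), (2, 150, 'PVC'), (4, 300, 'DI');
""")


def compare_attribute(conn, column):
    """Return (feature_id, attribute, old_value, new_value) rows that differ.

    The column name is interpolated into the SQL text, so it must come
    from a trusted, fixed list rather than from user input.
    """
    sql = f"""
        SELECT s.feature_id, '{column}', s.{column}, c.{column}
        FROM static_mains AS s
        JOIN conversion_mains AS c ON c.feature_id = s.feature_id
        WHERE s.{column} IS NOT c.{column}
        ORDER BY s.feature_id
    """
    return conn.execute(sql).fetchall()


# Attribute-level changes on features present in both databases.
changes = []
for col in ("diameter", "material"):
    changes.extend(compare_attribute(conn, col))

# Features that exist only in the conversion copy (additions) ...
added = [row[0] for row in conn.execute(
    "SELECT feature_id FROM conversion_mains "
    "WHERE feature_id NOT IN (SELECT feature_id FROM static_mains) "
    "ORDER BY feature_id")]

# ... and features that exist only in the static copy (deletions).
deleted = [row[0] for row in conn.execute(
    "SELECT feature_id FROM static_mains "
    "WHERE feature_id NOT IN (SELECT feature_id FROM conversion_mains) "
    "ORDER BY feature_id")]
```

The `changes` list plays the role of the old-value/new-value table, and together with `added` and `deleted` it identifies the features to extract for the append. SQLite's `IS NOT` is used instead of `<>` so that a change to or from NULL is also reported.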


Managing the data append
There are several points to consider that will make appending data effortless.
  • Is the converted data independent of data in the live database?
  • Does the batch being appended share a common border with the data in the live database?
  • Were updates made to data that was already in the live database?
Each of these conversion scenarios requires a different method of appending data into the live environment.

Appending Data Without Connectivity
When the batch of data does not share connectivity with the live data and conversion work is carried out independent of record updates, then the only consideration is timing the data append so it minimizes disruption to the user.

Appending Data With Common Borders
Converting a batch of data that has a common 'edge' with data in the live environment requires that edge to remain locked to users during conversion. As each batch is delivered the shared edge is unlocked and the data is appended into the database. The data that will border the next batch has to be relocked. Exhibit 4, below, shows an example of batches that share an edge and how the data is locked (gray area) after each batch is appended into the live database. This method of appending data continues until all of the data is converted.

Exhibit 4 – Locking Data on a Shared Edge
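
The unlock, append, and relock cycle can be illustrated with a simplified model in which the service territory is divided into grid cells and a live cell is treated as an 'edge' when it shares a side with a cell in the next batch. The cell representation and function names are assumptions for this sketch; a real implementation would lock individual features near the boundary rather than whole cells.

```python
def neighbours(cell):
    """Return the four grid cells that share a side with `cell` (col, row)."""
    col, row = cell
    return {(col + 1, row), (col - 1, row), (col, row + 1), (col, row - 1)}


def edge_locks(live_cells, next_batch):
    """Live cells that border the next batch and so must stay locked."""
    batch = set(next_batch)
    return {cell for cell in live_cells if neighbours(cell) & batch}


def append_batches(live_cells, batches):
    """Append batches in order and record the lock set held for each one."""
    live = set(live_cells)
    history = []
    for batch in batches:
        # Edge cells stay locked while this batch is being converted.
        history.append(edge_locks(live, batch))
        # Delivery: the edge is unlocked and the batch joins the live data.
        live |= set(batch)
    return live, history


# Assumed layout: one live column of cells, then two batches to the east.
live, history = append_batches(
    live_cells={(0, 0), (0, 1)},
    batches=[{(1, 0), (1, 1)}, {(2, 0), (2, 1)}],
)
```

Here the lock moves eastward with each delivery: column 0 is locked while the first batch is converted, and once that batch is appended the lock shifts to column 1, the new border with the second batch.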




Appending Data With Updates
As discussed earlier, converting data for batches that involve updates to data and attributes of features that still exist in the live data environment requires carefully tracking all updates made to the live data. The live environment must be 'prepared' before appending this data.

One method of doing this is to select all the features in the live environment that correspond to the changed features in the conversion environment, and then delete the updated assets from the live environment before appending the subset that was converted, keeping in mind that the user's dataset should have been locked. Using this approach for appending data ensures that connectivity is maintained and that updates performed by the records group do not get overwritten.
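
The delete-then-append step might be sketched as below. The dictionaries stand in for the live database and the converted subset, and the lock check is an added safeguard assumed for this illustration; none of the names come from the project itself.

```python
def append_with_updates(live, converted_changes, locked_ids):
    """Replace changed features in `live` with their converted versions.

    live:              feature_id -> attributes (the live database)
    converted_changes: feature_id -> attributes (changed or new features only)
    locked_ids:        ids that were locked to the records group
    """
    # An existing feature may only be replaced if it was locked, otherwise
    # a records-group edit made during conversion could be overwritten.
    unlocked = [
        fid for fid in converted_changes
        if fid in live and fid not in locked_ids
    ]
    if unlocked:
        raise ValueError(f"features changed without a lock: {sorted(unlocked)}")

    result = dict(live)
    for fid in converted_changes:
        result.pop(fid, None)          # delete the outdated live copy
    result.update(converted_changes)   # append the converted subset
    return result


# Assumed sample data: feature 2 was updated and feature 4 is new.
live_db = {1: {"material": "CI"}, 2: {"material": "DI"}, 3: {"material": "PVC"}}
delivered = {2: {"material": "PVC"}, 4: {"material": "DI"}}
merged = append_with_updates(live_db, delivered, locked_ids={2})
```

Features 1 and 3, which the conversion group never touched, pass through unchanged, so any edits the records group made to them remain in place.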

Concluding Remarks
The purpose of this discussion was to introduce issues that might encourage utilities to consider converting data in stages as an alternative to the commonly used approach of converting all of the data prior to a GIS going live. Getting the data into the hands of the business as quickly as possible will build confidence in the conversion effort, provide an early learning opportunity, and deliver an early win on a high-risk project. A utility should begin by identifying the benefits that will be realized at both the corporate and user level. Then, by defining the needs of the business and establishing an understanding of critical data, the project team will be able to develop a conversion strategy that fits those needs. An important factor that will make converting data in stages work to the advantage of a utility is to define the process for transitioning the datasets into a live GIS. This should include determining how the datasets interact with each other, defining each 'batch' of data to be delivered, and deciding on a schedule for each data append. Designing and implementing a sound plan will make it possible to realize the benefits of providing data to a utility in stages.
