Direction for Data (GITA 2001)

A process must be established for identifying and organizing existing data records for use in the new GIS

Bradley Grabowski
Convergent Group, 6399 South Fiddler's Green Circle
Suite 600, Englewood, CO 80111
Presented at GITA Conference XXIV with:

Karl M. Weber
Louisiana Gas Service


Introduction
Many considerations must be taken into account when planning a utility's transition to a GIS, to a new GIS platform, or to a merged GIS that combines the systems of merged utility companies. This paper focuses on the transition to a new GIS and to merged GIS systems. The technology that drives our industry, in this case GIS technology, changes rapidly. Vendors make major changes to their software every one to two years, which in turn causes us to take another look at the data that resides both in and out of the GIS, and at the other systems that are directly supported by this data or linked directly to it. Technologies such as outage management systems (OMS), work management systems (WMS), work integration managers (WIM), distribution planning systems (DPS), and maintenance and inspection (M&I) are playing an increasing role in the utility industry. How do these technologies integrate or interface with one another, and what data does each one need to support its role? What data does an OMS need, and where does it come from? What is the integration between M&I and GIS, and what makes sense?

To make these difficult decisions, companies need a strong corporate vision of where they want to position themselves in the deregulated marketplace. This vision will drive them forward and help them use these technologies, and the integration of those technologies, to support their business focus. Data, then, is only one part of the driving force behind these large projects. Other factors not covered here include internal and external resources, project funding, business cases, and project management needs, but this is a good start to understanding the implications of such an undertaking.

The objective of this paper
The objective is to establish a process for identifying and organizing existing data records for use in the new GIS and possibly in the other technologies mentioned above. This includes existing digital GIS data, external databases that may or may not interface directly with the GIS, document imaging systems, paper records, and external data sources (governmental agencies, vendors).

The data source definitions
If you have been through this process before, you know that great pains were taken to establish the source documents for the GIS you were building. Since then you have tried to maintain the GIS and to keep up with the internal processes that support the GIS and the processes that the GIS supports day to day. But things have not gone as planned. Budgets have been reduced, employee count is down, and now there is a new GIS technology. Along with the technology refresh comes the merging of data from another company's GIS and systems into your company's environment.

Based on your previous experiences, this sounds overwhelming, doesn't it? It can be, but it doesn't have to be. What matters is developing a process and identifying the source data required to complete this project. Along with this understanding, the corporate strategy for the next few years will help you and your company create a successful implementation. Start the data source identification process by putting together a strategy to identify what the data sources are, who owns the data, how the owners use it, and where and how it is stored. The list you build to describe the data sources will become long and will continue to grow as the discovery process uncovers source after source.

The following terms are defined for this project:
  • Data Source - any paper or digital reference that is used to support daily operations and reporting requirements in the utility. The source must pertain to company assets, or must relate to assets that can be represented geospatially or to geospatial land information. Examples include current GIS databases, work orders that describe utility facilities (design and as-built information), inspection data that references facilities (paper or databases), and specific types of facility records such as valve databases, regulator databases, and electrical equipment databases.
  • Owners of the Data - are the owners the people who maintain the data, the people who use it, or both? For this paper, the term covers both.
  • Data Storage - is the data stored in a database, and if so, where is that database? Is the data in a paper record, and if so, where is it kept? If the records are paper, are they originals, or are they duplicated elsewhere, such as on film, as photocopies, or in a document imaging system?

If you have already been through this process, use your original lists of sources as a starting point. Most likely some of the original sources were converted to the GIS in the first or second go-around, while others stayed in their original format or changed to another type of source document. Paper records might have been filmed or scanned into an imaging system, or a database may have been created to better support internal processes. Regardless of their history, they are still sources.

The other piece of the puzzle is that the same types of data reside within the merged or acquired company. Your advantage is that you know the utility business, and all utilities need similar types of records to support their operations. Go back to your definitions and play detective to find those sources. Building alliances within the other organization will benefit the overall success of the project.

Create a database or spreadsheet to record information about the source data documents. This will be the first of many that you may need to create and manage. What should the data source database or spreadsheet contain? First, it should contain everything covered in the definitions; then, for each separate source, it should record the volumetric data for that source. If the source is a database, how many records does it contain? If it is paper or scanned images, how many documents are there? If it consists of Computer Aided Drafting and Design (CADD) files, how many files are there, and for CADD facility drawings, does one file or several files represent each geographic area? Also, how frequently are these data sets updated, and what is the backlog of updates?
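As an illustration only, one row of such a data source inventory could be modeled as a record whose fields follow the definitions and volumetric questions above. This is a minimal sketch; the field names, the `DataSource` class, the two summary functions, and the sample entries are all hypothetical, not a prescribed schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DataSource:
    """Hypothetical inventory row; fields mirror the definitions and volumetrics."""
    name: str
    organization: str       # original company or merged/acquired company
    owners: List[str]       # maintainers and users, per the definition above
    storage: str            # e.g. "database", "paper", "scanned image", "CADD"
    location: str           # where the database or paper records are kept
    volume: int             # records, documents, or files, depending on storage
    update_frequency: str   # e.g. "weekly", "as needed"
    backlog: int = 0        # updates received but not yet posted


def volume_by_storage(sources: List[DataSource]) -> Dict[str, int]:
    """Total the volumetric counts per storage type (units match within a type)."""
    totals: Dict[str, int] = {}
    for src in sources:
        totals[src.storage] = totals.get(src.storage, 0) + src.volume
    return totals


def sources_with_backlog(sources: List[DataSource]) -> List[str]:
    """Names of sources that have unposted updates."""
    return [src.name for src in sources if src.backlog > 0]


# Hypothetical example entries, for illustration only.
inventory = [
    DataSource("Valve database", "Company A", ["Gas Operations"], "database",
               "Operations server", 12000, "weekly", backlog=150),
    DataSource("Service cards", "Company B", ["Records Dept."], "paper",
               "District office files", 40000, "as needed"),
    DataSource("Facility drawings", "Company B", ["Engineering"], "CADD",
               "Engineering file share", 800, "monthly", backlog=25),
]

print(volume_by_storage(inventory))
print(sources_with_backlog(inventory))
```

Totals are grouped by storage type because a record count, a document count, and a file count are different units and should not be added together.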

When you are confident that you have established a comprehensive data source database, it's time to investigate these sources. You will need to conduct one-on-one interviews with the owners and maintainers of the selected data sources. During the interviews you should collect the data schema(s) and volumetric data, evaluate how the data is used and by whom, and gather anything else that is needed. Does the data support field personnel, reporting processes, maintenance and inspection programs, or field surveys such as leak detection, cathodic protection test point surveys, or pole inspections? It is wise to put together questionnaires to use during the interviews. When requesting data schemas and volumetric data, notify the interviewees ahead of time so that they can prepare. Remember that these interviews may take place in two or more organizations.

This process is a lot of work and may require more than one person to complete all of the necessary steps. When more than one person conducts the interviews, make sure that everyone involved asks the same questions and gathers the same data; otherwise, interviews may have to be redone. The people being interviewed have jobs to do and their time is valuable, so each interview should be done once and kept as short as possible. If follow-up questions turn out to be needed, reevaluate the questionnaire and consider adding them so that later interviews do not require follow-up work. Also, if you find that the interviews are too long or too short, adjust the length of future interviews accordingly. As a guide, start with one hour and leave a little cushion at the end for overruns and final notes.
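When several interviewers share one questionnaire, a simple completeness check can show which interviews left standard questions unanswered before anyone has to go back. The sketch below assumes each interview is stored as a dictionary of answers keyed by question; the question keys, function names, and sample answers are hypothetical.

```python
from typing import Dict, List

# Hypothetical standard questionnaire shared by all interviewers.
QUESTIONNAIRE = [
    "schema_received",
    "record_count",
    "primary_users",
    "update_frequency",
    "supports_field_work",
]


def missing_answers(interview: Dict[str, str],
                    questions: List[str] = QUESTIONNAIRE) -> List[str]:
    """Questionnaire items with no (or an empty) answer, in questionnaire order."""
    return [q for q in questions if not interview.get(q)]


def interviews_to_revisit(interviews: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each incomplete interview (keyed by data source) to its unanswered items."""
    gaps: Dict[str, List[str]] = {}
    for source, answers in interviews.items():
        missing = missing_answers(answers)
        if missing:
            gaps[source] = missing
    return gaps


# Hypothetical interview notes, for illustration only.
interviews = {
    "Valve database": {
        "schema_received": "yes", "record_count": "12000",
        "primary_users": "Gas Operations", "update_frequency": "weekly",
        "supports_field_work": "yes",
    },
    "Service cards": {"record_count": "40000", "primary_users": "Records Dept."},
}
print(interviews_to_revisit(interviews))
```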

Redundant datasets, or redundant data about the same or similar facilities, will be uncovered during this investigation. When this happens, informed decisions must be made about which data set is the master. All parties that own and use the data sets must be involved in the decision-making process. Once the master source has been chosen, it will be used from that point forward for all discussions, rules, and data modeling design templates.
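One way to surface redundancy from the inventory is to group sources by the facility type they describe and flag any type covered by more than one source that still has no agreed master. This is a minimal sketch that assumes each source has been tagged with a single facility type; the tags, source names, and functions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def find_redundant(sources: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (source name, facility type) pairs; keep types covered by 2+ sources."""
    by_type: Dict[str, List[str]] = defaultdict(list)
    for name, facility_type in sources:
        by_type[facility_type].append(name)
    return {ftype: names for ftype, names in by_type.items() if len(names) > 1}


def undecided(redundant: Dict[str, List[str]], masters: Dict[str, str]) -> List[str]:
    """Facility types whose master is missing or is not one of the candidate sources."""
    return sorted(ftype for ftype, names in redundant.items()
                  if masters.get(ftype) not in names)


# Hypothetical tagged sources from both organizations.
sources = [
    ("Valve database (Company A)", "valve"),
    ("Valve inspection log (Company B)", "valve"),
    ("Regulator database", "regulator"),
    ("Main as-built work orders", "main"),
    ("Main CADD drawings", "main"),
]
redundant = find_redundant(sources)
masters = {"valve": "Valve database (Company A)"}  # decisions agreed so far
print(undecided(redundant, masters))
```

The check only finds candidates for discussion; choosing the master remains a decision for the parties who own and use the data.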

Don't forget that many utilities acquire data from external entities such as city, county, and state agencies and developers. These are data sources as well, and the people within the company who are responsible for acquiring this data should also be included in the data source process.

Identifying the data sources is an important step in creating the new GIS. When you have completed all of the interviews, you will have spoken with everyone who touches each data source and will have learned not only the format of the data but also how it is used and why it is needed. Now it is your job, along with your team members, to compile this knowledge and decide how this data should be handled. Each data source is a candidate for conversion or migration to the new GIS. Many factors will drive your decisions, and one of the main drivers is the corporate strategy. Ask yourself the following questions:
  • First, will moving the data to the GIS effectively support the company?
  • Second, are other technologies involved in this upgrade, such as OMS, WMS, or DPS? If so, is there a more appropriate location for this data type to reside?
  • Third, what is the duration of the project, and can this data be converted or migrated within the constraints of the project schedule?
  • Fourth, how will this data be maintained once the GIS is deployed?
  • Fifth, how will users access the data after deployment?
There are more questions to answer, but these are a starting point. As the project moves forward, the prioritization of where each data source belongs will continue.
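The five questions can be recorded per candidate source so the team applies them consistently. The sketch below is illustrative only: the `Candidate` fields, the order in which the answers are checked, and the recommendation strings are assumptions, and real decisions will also weigh corporate strategy and other factors not captured here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """Hypothetical answers to the five screening questions for one data source."""
    name: str
    supports_company: bool       # Q1: will moving it to the GIS support the company?
    better_home: Optional[str]   # Q2: a more appropriate system (e.g. "OMS"), or None
    fits_schedule: bool          # Q3: can it be converted within the project schedule?
    maintenance_planned: bool    # Q4: is there a plan to maintain it after deployment?
    access_planned: bool         # Q5: is there a plan for user access after deployment?


def recommend(c: Candidate) -> str:
    """One possible ordering of the questions; the rules are illustrative, not prescriptive."""
    if c.better_home:
        return f"route to {c.better_home}"
    if not c.supports_company:
        return "leave as is"
    if not c.fits_schedule:
        return "defer to a later phase"
    if not (c.maintenance_planned and c.access_planned):
        return "resolve maintenance/access before converting"
    return "convert to GIS"


# Hypothetical candidates.
candidates = [
    Candidate("Valve database", True, None, True, True, True),
    Candidate("Outage call history", True, "OMS", True, True, True),
    Candidate("Historic leak survey sheets", True, None, False, True, True),
]
for c in candidates:
    print(c.name, "->", recommend(c))
```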

The data model
Most likely, while you were identifying the data sources, other team members were working on the data model issues. These team members review the existing GIS data model(s), compare them, and identify the gaps: what exists in one model but not in another. During the data source investigation you probably looked at these models at a very high level, but the technical assessment needed for quality comparisons and gap identification should be left to those with extensive technical backgrounds. It may be obvious, but throughout your investigation of the record sets, you and the data modeling experts should have been sharing what you discovered. In fact, the modeling team should have been involved in prioritizing the data sources for the conversion/migration effort. They must be involved now to design, build, test, and manage the data model that will support not only the existing GIS data from one or more organizations but also the other data and record sets that you have uncovered and prioritized.
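At its simplest, a gap comparison between two existing data models can be expressed with set differences: object classes present in only one model, and, for shared classes, attributes present in only one. This sketch assumes each model has been reduced to a mapping of object class names to attribute names; the model contents are hypothetical, and a real assessment would also compare domains, relationships, and symbology.

```python
from typing import Dict, List, Set

Model = Dict[str, Set[str]]  # object class name -> attribute names


def compare_models(a: Model, b: Model) -> dict:
    """Report object classes and attributes that exist in one model but not the other."""
    attribute_gaps: Dict[str, Dict[str, List[str]]] = {}
    for obj in sorted(set(a) & set(b)):
        only_a = sorted(a[obj] - b[obj])
        only_b = sorted(b[obj] - a[obj])
        if only_a or only_b:
            attribute_gaps[obj] = {"only_in_a": only_a, "only_in_b": only_b}
    return {
        "objects_only_in_a": sorted(set(a) - set(b)),
        "objects_only_in_b": sorted(set(b) - set(a)),
        "attribute_gaps": attribute_gaps,
    }


# Hypothetical simplified models from two merging companies.
company_a = {
    "Valve": {"size", "material", "install_date"},
    "Main": {"diameter", "material"},
    "Regulator": {"set_pressure"},
}
company_b = {
    "Valve": {"size", "material", "turns_to_close"},
    "Main": {"diameter", "material"},
    "CP Test Point": {"reading"},
}
print(compare_models(company_a, company_b))
```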

Up to this point we have discussed the data sources and what to do with them. Earlier there was a brief mention of how the current data is being used and by whom. Once the data is converted or migrated to the GIS, or possibly to other technologies such as OMS or WMS, there must be functionality to support the use, maintenance, and integrity of this data. Data is of no value if it lacks quality, content, currency, maintained integrity, and accessibility. Without those five components, the GIS will fail when deployed. Part of the data modeling process is to provide a data model that has been tested and is ready for prototyping the conversion and migration effort. The data model must be stable, it must have symbology and styles for all objects, and it must allow for the placement of all objects. This is by no means the final data model, but it is satisfactory for prototyping the conversion and migration of the selected data.
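Two of the readiness conditions above, symbology for every object and the ability to place every object, can be checked mechanically before prototyping starts. This minimal sketch assumes each object class is described by a symbol name and a geometry type, and treats a missing geometry type as "cannot be placed"; that interpretation, the model contents, and the function name are hypothetical.

```python
from typing import Dict, List, Optional

# object class name -> {"geometry": ..., "symbol": ...}; values may be None
ModelSpec = Dict[str, Dict[str, Optional[str]]]


def readiness_problems(model: ModelSpec) -> List[str]:
    """List object classes lacking a symbol/style or a geometry type for placement."""
    problems: List[str] = []
    for name in sorted(model):
        spec = model[name]
        if not spec.get("symbol"):
            problems.append(f"{name}: no symbology/style")
        if not spec.get("geometry"):
            problems.append(f"{name}: no geometry type, cannot be placed")
    return problems


# Hypothetical prototype model.
prototype = {
    "Valve": {"geometry": "point", "symbol": "valve_gate"},
    "Main": {"geometry": "line", "symbol": "main_steel"},
    "Regulator Station": {"geometry": "point", "symbol": None},
    "Service Line": {"geometry": None, "symbol": "service_plastic"},
}
for problem in readiness_problems(prototype):
    print(problem)
```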

At this point the prototype data model must be frozen; no other changes can be made to it until the conversion and migration prototyping is completed and the test results are analyzed. After the results are analyzed, changes will be made to the data model to support the overall conversion and migration process. When the revised model has been tested and approved for a data conversion and migration pilot, it must be frozen again. Freezing the data model establishes a baseline to test and make changes against. Once the pilot is approved and any additional changes are made to the model, the tested and frozen model can be released so that production data conversion and migration can begin.
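A frozen baseline is easier to enforce if the model definition can be fingerprinted, so any later change is detected by comparing fingerprints. This sketch assumes the model is exported as a mapping of object classes to attribute lists; it uses the standard `json` and `hashlib` modules to hash a canonical (sorted) serialization, and the model contents are hypothetical.

```python
import hashlib
import json
from typing import Dict, List


def model_fingerprint(model: Dict[str, List[str]]) -> str:
    """SHA-256 of a canonical JSON form, so ordering differences do not count as changes."""
    canonical = json.dumps({obj: sorted(attrs) for obj, attrs in model.items()},
                           sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def unchanged_since_freeze(model: Dict[str, List[str]], baseline: str) -> bool:
    """True if the model still matches the frozen baseline fingerprint."""
    return model_fingerprint(model) == baseline


# Hypothetical prototype model, frozen before conversion prototyping.
prototype = {"Valve": ["size", "material"], "Main": ["diameter"]}
baseline = model_fingerprint(prototype)

# Same content in a different order still matches the baseline.
reordered = {"Main": ["diameter"], "Valve": ["material", "size"]}
# Adding an attribute is a change that must wait for the next approved freeze.
changed = {"Valve": ["size", "material", "install_date"], "Main": ["diameter"]}

print(unchanged_since_freeze(reordered, baseline))
print(unchanged_since_freeze(changed, baseline))
```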

Freezing the conversion data model does not stop work on the final data modeling effort. The conversion data model most likely will not include all of the planned functionality or all of the objects, but what it does define will strongly support production conversion and migration.

Data Source Mapping
As the data model is being developed, a process called data source mapping takes place. Data source mapping is the process of mapping a data source to a data target at the object and attribute levels. The data source investigation took place early in the project, and the data modeling has been under way for some time; the modeling depended on that investigation, because it could not proceed without knowing what the source data is and what format it is in. All of the prioritized and selected source data must have a home in the completed data model. This mapping process is tedious and takes significant effort to complete. The person performing it must be knowledgeable about the data sources, the data model, and the GIS technology being implemented; this knowledge will expedite the mapping. Remember that the data sources mapped here are only those being absorbed by the new GIS. During your investigation you found many data sources, not all of which are being converted or migrated to the GIS. Some will be left "as is", while others may be absorbed by other databases or technologies, and those will require the same process.

How is this mapping done? It is accomplished through a data source matrix. Like the data source inventory, this matrix is either a database or a spreadsheet, and it contains all of the objects and attributes required to support the data conversion/migration effort. It includes all of the chosen data sources and their elements, each mapped to its target, which in this case is an object or attribute of the GIS data model. The mapping covers object-to-object (element) mapping and attribute-to-attribute mapping; any graphical component is mapped as well, and so are any relationships.
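The attribute-to-attribute part of a data source matrix can be represented as rows of (source, source attribute, target object, target attribute), which a conversion routine then applies to each source record. This is a minimal sketch under that assumption; the `Mapping` class, the `convert_record` function, and the source and target names are hypothetical, and graphical components and relationships would need additional columns not shown here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Mapping:
    """One hypothetical row of the data source matrix."""
    source: str          # source data set, e.g. a database table
    source_attr: str     # field in that source
    target_object: str   # GIS object class
    target_attr: str     # attribute of that object class


def convert_record(record: Dict[str, object], source: str,
                   matrix: List[Mapping]) -> Dict[str, Dict[str, object]]:
    """Apply the matrix rows for one source to a record: target object -> attributes."""
    out: Dict[str, Dict[str, object]] = {}
    for row in matrix:
        if row.source == source and row.source_attr in record:
            out.setdefault(row.target_object, {})[row.target_attr] = record[row.source_attr]
    return out


# Hypothetical matrix rows for a legacy valve table.
matrix = [
    Mapping("VALVE_DB", "VALVE_NO", "Valve", "facility_id"),
    Mapping("VALVE_DB", "SIZE_IN", "Valve", "diameter"),
    Mapping("VALVE_DB", "MATL", "Valve", "material"),
]
record = {"VALVE_NO": "V-101", "SIZE_IN": 4, "MATL": "steel", "NOTES": "check box"}
print(convert_record(record, "VALVE_DB", matrix))
```

Source fields with no matrix row (here `NOTES`) are simply not carried over, which is why the completeness of the matrix matters.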

All of the work described above is needed to support the conversion/migration pilot and production efforts. If, after the pilot is complete, the mappings are found to be incorrect or data sources are found to have been missed during discovery, the mapping process must be repeated and the pilot rerun. Depending on how the production data conversion/migration processes are organized, not all of the data sources may need to be mapped and piloted up front. This allows the data modeling effort to concentrate on the core utility facility objects and attributes, and to complete the modeling of items such as city, county, and state boundaries or district and operating boundaries later. These types of objects may not be critical to the initial production conversion and migration.
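Before and after a pilot, a coverage check on the matrix can point to likely mapping gaps: source attributes with no mapping, and target attributes that nothing feeds. This sketch assumes matrix rows are plain (source, source attribute, target object, target attribute) tuples and lets deliberately deferred target objects, such as boundaries, be excluded. All names are hypothetical, and an unmapped attribute is only a prompt for review, not necessarily an error.

```python
from typing import AbstractSet, Dict, List, Set, Tuple

Row = Tuple[str, str, str, str]  # (source, source_attr, target_object, target_attr)


def coverage_gaps(matrix: List[Row],
                  source_schema: Dict[str, Set[str]],
                  target_model: Dict[str, Set[str]],
                  deferred: AbstractSet[str] = frozenset()) -> Dict[str, List[str]]:
    """Find unmapped source attributes and target attributes without a source."""
    mapped_src = {(src, attr) for src, attr, _, _ in matrix}
    mapped_tgt = {(obj, attr) for _, _, obj, attr in matrix}
    unmapped = sorted(
        f"{src}.{attr}"
        for src, attrs in source_schema.items()
        for attr in attrs
        if (src, attr) not in mapped_src
    )
    unfed = sorted(
        f"{obj}.{attr}"
        for obj, attrs in target_model.items()
        if obj not in deferred          # skip objects deliberately modeled later
        for attr in attrs
        if (obj, attr) not in mapped_tgt
    )
    return {"unmapped_source_attributes": unmapped,
            "target_attributes_without_source": unfed}


# Hypothetical matrix, source schema, and target model.
matrix = [
    ("VALVE_DB", "VALVE_NO", "Valve", "facility_id"),
    ("VALVE_DB", "SIZE_IN", "Valve", "diameter"),
]
source_schema = {"VALVE_DB": {"VALVE_NO", "SIZE_IN", "MATL"}}
target_model = {"Valve": {"facility_id", "diameter", "material"},
                "District Boundary": {"name"}}
print(coverage_gaps(matrix, source_schema, target_model, deferred={"District Boundary"}))
```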

Summary
Dealing with large numbers of data sources is not a task to be taken lightly. It is one of the most important tasks in building the GIS. Data is of no value if it lacks quality, content, currency, maintained integrity, and accessibility. Without those five components, the GIS will fail when deployed. If done properly, it is a repeatable process that can be reused for other projects that require moving data, to support future mergers and acquisitions, or to support the implementation of other technologies such as OMS, M&I, and WMS. The three components described in this paper are only a portion of what is required to implement a GIS, but they are very significant pieces of the puzzle. As utility companies continue to grow and support their customers, we will see more and more projects that require in-depth investigations of data sources. The utility revolves around data: the customer service representative, the property accounting group, regulatory reporting personnel, and the field crew wanting to know what lies beneath them all require quality, content, currency, integrity, and accessibility to perform their jobs without fail and to return at the end of the day unharmed.

© GISdevelopment.net. All rights reserved.