A process must be established for identifying and organizing existing data records for use in the new GIS.
Create a database or spreadsheet to record information about the source data
documents. This will be the first of many that you may need to create and
manage. What should the data source database or spreadsheet contain? First,
it should contain everything that was in the definitions; then, for each
source, you should record its volumetrics (how much data it holds). If it is a
database, how many records does it contain? If the sources are paper documents
or scanned images, how many documents are there? If they are Computer Aided
Drafting and Design (CADD) files, how many are there, and if they are CADD
facility drawings, is each geographic area represented by one file or by many?
Also, how often are these data sets updated, and how large is the backlog of
pending updates?
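To make the inventory concrete, the record below sketches one possible layout for a data source entry in Python. The field names (`item_count`, `updates_per_month`, `update_backlog`, `files_per_area`), the helper `totals_by_type`, and the sample rows are all hypothetical, chosen only to mirror the questions above; a real inventory would also carry whatever the definitions require.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SourceType(Enum):
    DATABASE = "database"
    PAPER = "paper"
    SCANNED_IMAGE = "scanned image"
    CADD = "CADD"


@dataclass
class DataSourceRecord:
    """One entry in the data source inventory (hypothetical fields)."""
    name: str
    owner: str
    source_type: SourceType
    item_count: int            # records, documents, or CADD files
    updates_per_month: float   # how often the source changes
    update_backlog: int        # updates received but not yet applied
    files_per_area: Optional[int] = None  # CADD only: files per geographic area

    def backlog_months(self) -> float:
        """Backlog expressed as months' worth of updates at the current rate."""
        if self.updates_per_month <= 0:
            return float("inf") if self.update_backlog else 0.0
        return self.update_backlog / self.updates_per_month


def totals_by_type(inventory: list[DataSourceRecord]) -> dict[str, int]:
    """Sum item counts per source type to help size the conversion effort."""
    totals: dict[str, int] = defaultdict(int)
    for source in inventory:
        totals[source.source_type.value] += source.item_count
    return dict(totals)


# Hypothetical sample rows for illustration only.
inventory = [
    DataSourceRecord("Service cards", "Records", SourceType.SCANNED_IMAGE, 12000, 40, 300),
    DataSourceRecord("Valve database", "Operations", SourceType.DATABASE, 8500, 120, 60),
    DataSourceRecord("Facility drawings", "Engineering", SourceType.CADD, 450, 15, 90,
                     files_per_area=1),
]
print(totals_by_type(inventory))      # {'scanned image': 12000, 'database': 8500, 'CADD': 450}
print(inventory[0].backlog_months())  # 7.5
```

Expressing the backlog as months' worth of updates makes sources of very different sizes easier to compare than raw counts, but it is only a rough indicator.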
When you are confident that you have established a comprehensive data source
database, it's time to investigate these sources. You will need to conduct
one-on-one interviews with the owners and maintainers of the selected data
sources. During each interview you should obtain the data schema(s) and
volumetric data, evaluate how the data is used and by whom, and gather anything
else that is needed. Does the data support field personnel, reporting
processes, maintenance and inspection programs, or field surveys such as leak
detection, cathodic protection test point surveys, pole inspections, etc.? It is
wise to put together questionnaires that you can use during the interviews.
When requesting data schemas and volumetric data, notify the interviewees ahead
of time so that they can prepare. Remember that these interviews may span two
or more organizations.
This process is a lot of work and may require more than one person to complete
all of the necessary steps. When more than one person conducts the interviews,
make sure that everyone involved is asking the same questions and gathering the
same data; otherwise, interviews may have to be redone. The individuals being
interviewed have jobs to do and their time is valuable, so each interview should
be conducted only once and kept as short as possible. If follow-up questions
are needed, reevaluate the questionnaire and consider adding them so that no
follow-up work is needed after the remaining interviews are completed. Also, if
you find that the interviews are running too long or too short, adjust the
length of future interviews accordingly. As a guide, start with one hour and
leave a little cushion at the end for overruns and final notes.
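Because every interviewer should be asking the same questions, it can help to keep the questionnaire in one shared place and check each set of interview notes against it. The Python sketch below assumes a hypothetical questionnaire keyed by short IDs; the questions are drawn from the topics listed above, and the names `QUESTIONNAIRE`, `missing_answers`, and `off_script` are illustrative only.

```python
# Hypothetical shared questionnaire: every interviewer records answers under these keys.
QUESTIONNAIRE: dict[str, str] = {
    "schema": "Can you provide the data schema(s)?",
    "volume": "How many records, documents, or files does the source hold?",
    "users": "Who uses the data?",
    "uses": "Does it support field work, reporting, maintenance/inspection, or surveys?",
    "update_frequency": "How often is the data updated?",
    "backlog": "What is the current backlog of updates?",
}


def missing_answers(responses: dict[str, str]) -> list[str]:
    """Questionnaire keys with no non-blank answer in an interviewer's notes."""
    return [key for key in QUESTIONNAIRE if not responses.get(key, "").strip()]


def off_script(responses: dict[str, str]) -> list[str]:
    """Keys recorded in the notes that the shared questionnaire does not cover."""
    return sorted(set(responses) - set(QUESTIONNAIRE))


# Hypothetical notes from one interview.
notes = {
    "schema": "Received",
    "volume": "8,500 rows",
    "users": "Operations",
    "uses": "",
    "gps_accuracy": "Asked as a follow-up",
}
print(missing_answers(notes))  # ['uses', 'update_frequency', 'backlog']
print(off_script(notes))       # ['gps_accuracy']
```

Answers that fall outside the shared questionnaire are a hint that the questionnaire itself may need a new question before the next interview, which keeps later interviews from needing follow-ups.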
Redundant data sets, or redundant data about the same or similar facilities,
will be uncovered during this investigation. When this happens, informed
decisions must be made as to which data set is deemed the master. All parties
that own and use the data sets must be involved in the decision-making process.
Once the master source has been chosen, it will be used from that point forward
for all discussions, rules, and data modeling design templates.
Don't forget that many utilities acquire data from external entities such as
city, county, and state agencies, developers, etc. These are data sources as
well, and the people within the company who are responsible for acquiring this
data should also be included in the data source process.
The identification of the data sources is an important step in creating the new
GIS. When you have completed all of the interviews, you will have spoken with
everyone who touches each data source and will have learned not only about the
format of the data but also how it is used and why it is needed. Now it is your
job, along with your team members, to compile this knowledge and decide how the
data should be handled. Each data source is a candidate for conversion or
migration to the new GIS. Many factors will drive your decision-making process,
and one of the main drivers will be corporate strategy. Ask yourself the
following questions:
- Is the data going to effectively support the company by being moved to the
GIS?
- Are other technologies involved in this technology upgrade, such as OMS,
WMS, DPS, etc.? If so, is there a more appropriate location for this data type
to reside?
- What is the duration of this project? Can this data be converted/migrated
within the constraints of the project schedule?
- How will this data be maintained once the GIS is deployed?
- How will users access the data after deployment?
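One way to keep this screening consistent across many sources is to record the answer to each question and apply the same rules every time. The Python sketch below is hypothetical: the field names and especially the order of the rules in `recommend` are assumptions for illustration, not a prescribed policy, and a real project would weigh these answers against its corporate strategy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScreeningAnswers:
    """Answers to the screening questions for one candidate source (hypothetical)."""
    source_name: str
    supports_company_in_gis: bool    # Will moving it to the GIS support the company?
    better_home: Optional[str]       # Another system (e.g. "OMS"), or None if the GIS fits
    fits_schedule: bool              # Can it be converted/migrated within the schedule?
    maintenance_plan: Optional[str]  # How it will be maintained after deployment
    access_method: Optional[str]     # How users will access it after deployment


def recommend(a: ScreeningAnswers) -> str:
    """Apply an illustrative, fixed-order rule set to the answers."""
    if a.better_home:
        return f"route to {a.better_home}"
    if not a.supports_company_in_gis:
        return "do not migrate"
    if not a.fits_schedule:
        return "defer to a later phase"
    if not (a.maintenance_plan and a.access_method):
        return "hold: maintenance or access undefined"
    return "migrate to GIS"


# Hypothetical example source.
valves = ScreeningAnswers("Valve database", True, None, True,
                          "Edited by Operations in the GIS", "Web map viewer")
print(recommend(valves))  # migrate to GIS
```

Because the rules are checked in a fixed order, a source that belongs in another system is routed there before schedule or maintenance questions are considered; reorder the checks if your project weighs them differently.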
There are more questions to be answered, but these are a starting point. As
the project moves forward, the prioritization of where each data set belongs
will continue to be refined.
The data model
Most likely, while you were identifying the data sources, other team members
were working on data model issues. These team members are reviewing the
existing GIS data model(s), comparing them, and identifying the gaps between
them, that is, what exists in one model but not in another. During the data
source investigation you probably looked at these models at a very high level,
but the detailed technical assessment needed to make quality comparisons and
identify gaps should be left to those with extensive technical backgrounds.
It may be obvious, but during the time that you spent investigating record
sets, you and the data modeling team should have been sharing what you
discovered. In fact, the data modelers should also have been involved in
prioritizing the data sources for the conversion/migration effort. They must be
involved now to design, build, test, and manage a data model that will support
not only the existing GIS data from one or more organizations but also the
other data and record sets that you have uncovered and prioritized.
Up to this point we have discussed the data sources and what to do with them.
Earlier we briefly mentioned how the current data is being used and by whom.
Once the data is converted/migrated to the GIS, or possibly to other technology
locations such as OMS, WMS, etc., there must be functionality to support the
use, maintenance, and integrity of this data. Data is of no value if it lacks
quality, content, currency, maintained integrity, and accessibility. Without
those five components, the GIS will fail when deployed.
Part of the data modeling process is to deliver a data model that has been
tested and is ready for prototyping the conversion and migration effort. The
data model must be stable, it must have symbology and styles for all objects,
and it must allow for the placement of all objects. This is by no means the
final data model, but it should be mature enough to prototype the conversion
and migration of the selected data.