Quality Data: How Do I Recognize It?
Data Quality Considerations
When defining the data quality specifications for a GIS, there are three main areas that need to
be balanced: the cost of data acquisition, the availability of data sources, and the application
functionality of the GIS. All three areas need to be considered before making any decisions on
data requirements.
A prime example of how these three areas are intertwined is an electric utility designing a job
to install 12 new customers on a line extension. To complete this design, the designer needs to
know which circuit phases are present and the existing load on each phase, in order to determine
which phase or phases will feed the new customers. The desired application is for the system to
trace each phase within the circuit and determine which existing customers are fed from each
phase. Application functionality comes into play because if the GIS data model does not account
for individual phases, or if the customer’s feed location is not incorporated into the data model,
no single-phase trace can be completed. Data availability comes into play because, even when the
application and the supporting functionality are present, the trace will be wrong if the data is not
populated in the correct database fields or if incorrect data is populated in those fields. Finally,
data cost plays a role: if acquiring this data (for example, through a field survey) costs more than
the utility can justify, chances are the data will never be captured and populated into the GIS. As
this example shows, all three areas must be considered when defining a set of GIS data quality
specifications.
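The phase trace in this example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any GIS product's trace tool: the circuit is assumed to be a list of spans, each tagged with the phases it carries, and each customer is assumed to carry a feed node, a phase, and a load in kW. The names `SPANS`, `CUSTOMERS`, `trace_phase`, and `load_by_phase` are hypothetical.

```python
from collections import deque

# Hypothetical circuit: each span joins two nodes and carries a set of phases.
SPANS = [
    ("SUB", "N1", {"A", "B", "C"}),
    ("N1", "N2", {"A"}),
    ("N1", "N3", {"B", "C"}),
    ("N3", "N4", {"C"}),
]

# Hypothetical customers: (customer id, feed node, phase, load in kW).
CUSTOMERS = [
    ("C1", "N2", "A", 12.0),
    ("C2", "N2", "A", 8.0),
    ("C3", "N3", "B", 5.0),
    ("C4", "N4", "C", 20.0),
]


def trace_phase(source, phase, spans):
    """Return the set of nodes reachable from source over spans carrying `phase`."""
    adjacency = {}
    for a, b, phases in spans:
        if phase in phases:
            adjacency.setdefault(a, []).append(b)
            adjacency.setdefault(b, []).append(a)
    reached = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in reached:
                reached.add(nxt)
                queue.append(nxt)
    return reached


def load_by_phase(source, spans, customers, phases=("A", "B", "C")):
    """Sum the load of customers that each phase trace actually reaches."""
    totals = {}
    for phase in phases:
        reached = trace_phase(source, phase, spans)
        totals[phase] = sum(
            kw for _cid, node, cust_phase, kw in customers
            if cust_phase == phase and node in reached
        )
    return totals


if __name__ == "__main__":
    loads = load_by_phase("SUB", SPANS, CUSTOMERS)
    print(loads)                      # {'A': 20.0, 'B': 5.0, 'C': 20.0}
    print(min(loads, key=loads.get))  # B: least-loaded phase for the new customers
```

A customer whose feed node or phase is missing or wrong simply drops out of the totals, which is how the data availability problem in this example would show up in practice.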
Defining GIS data quality is far more complex than simply saying, “I want my data to be 99.5%
accurate.” The data needs to be broken down into its prime components, with accuracy standards
applied to each component. These GIS data components are:
- Database Design Conformance
- Connectivity
- Database Attributes
- Spatial Placement
- Map Aesthetics
- Age of Data
- Data Completeness
Even within these main components there are subcategories that must be considered. The
following paragraphs describe each of these components and the minimum acceptable data
standard for each.
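The arithmetic below illustrates why a single overall figure is not enough. It assumes hypothetical QA sample counts for each component; only the 100% targets for database design conformance and connectivity come from this document, and the other components are deliberately left without a target here. With these counts, the pooled score is 99.5% even though connectivity fails its standard.

```python
# Hypothetical QA sample: component -> (features checked, features that passed).
QA_RESULTS = {
    "Database Design Conformance": (1000, 1000),
    "Connectivity": (1000, 975),
    "Database Attributes": (1000, 1000),
    "Spatial Placement": (1000, 1000),
    "Map Aesthetics": (1000, 1000),
    "Age of Data": (1000, 1000),
    "Data Completeness": (1000, 990),
}

# Only the targets stated in this document; other components are left unset here.
REQUIRED_PCT = {
    "Database Design Conformance": 100.0,
    "Connectivity": 100.0,
}


def component_scores(results):
    """Percentage of checked features that passed, per component."""
    return {name: 100.0 * passed / checked for name, (checked, passed) in results.items()}


def overall_score(results):
    """Single pooled percentage across every component."""
    checked = sum(c for c, _ in results.values())
    passed = sum(p for _, p in results.values())
    return 100.0 * passed / checked


def failing_components(results, required):
    """Components whose score falls below their stated minimum."""
    scores = component_scores(results)
    return [name for name, minimum in required.items() if scores[name] < minimum]


if __name__ == "__main__":
    print(overall_score(QA_RESULTS))                     # 99.5
    print(failing_components(QA_RESULTS, REQUIRED_PCT))  # ['Connectivity']
```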
Database Design Conformance
Database design conformance within a GIS refers to how closely the structure of the digital files
matches the physical and logical data models. When accepting data files to be loaded into the
GIS, it is imperative that those files adhere to the database design specifications of both the
physical and logical data models. All data fields must be correctly structured and named, and all
associations between database tables must be maintained. If the data tables are not structured
correctly or the data files are in the wrong format, the deliverable files will not load into the
GIS; if they are force-loaded, they will certainly corrupt the integrity of the database. For this
reason, the minimum acceptable data quality standard for database design conformance should
be 100%.
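A conformance check of delivered files against the design specification might look like the sketch below. It assumes that both the specification and the delivery can be reduced to a mapping of table names to field names and types, and that table associations can be checked as foreign-key lookups. `SPEC`, `DELIVERED`, `schema_issues`, and `orphaned_rows` are hypothetical names, not part of any particular GIS product.

```python
# Hypothetical design specification: table -> {field name: field type}.
SPEC = {
    "Transformer": {"FacilityID": "int", "Phase": "str", "KVA": "float"},
    "ServicePoint": {"FacilityID": "int", "TransformerID": "int"},
}

# Hypothetical delivery with one wrong type and one misnamed field.
DELIVERED = {
    "Transformer": {"FacilityID": "int", "Phase": "str", "KVA": "str"},
    "ServicePoint": {"FacilityID": "int", "XfmrID": "int"},
}


def schema_issues(spec, delivered):
    """List every difference between the delivered structure and the specification."""
    issues = []
    for table, fields in spec.items():
        if table not in delivered:
            issues.append(f"missing table {table}")
            continue
        got = delivered[table]
        for field, ftype in fields.items():
            if field not in got:
                issues.append(f"{table}: missing field {field}")
            elif got[field] != ftype:
                issues.append(f"{table}.{field}: expected {ftype}, got {got[field]}")
        for field in got:
            if field not in fields:
                issues.append(f"{table}: unexpected field {field}")
    for table in delivered:
        if table not in spec:
            issues.append(f"unexpected table {table}")
    return issues


def orphaned_rows(child_rows, fk_field, parent_ids):
    """Rows whose foreign key does not match any existing parent record."""
    return [row for row in child_rows if row.get(fk_field) not in parent_ids]


if __name__ == "__main__":
    for issue in schema_issues(SPEC, DELIVERED):
        print(issue)
    # Transformer.KVA: expected float, got str
    # ServicePoint: missing field TransformerID
    # ServicePoint: unexpected field XfmrID

    service_points = [
        {"FacilityID": 1, "TransformerID": 101},
        {"FacilityID": 2, "TransformerID": 999},
    ]
    print(orphaned_rows(service_points, "TransformerID", {101, 102}))
    # [{'FacilityID': 2, 'TransformerID': 999}]
```

Because the standard is 100%, any reported issue or orphaned row would be grounds for rejecting the delivery rather than loading it.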
Connectivity
Connectivity within a GIS refers to how the objects depicted in the GIS are connected to
accurately model the required network configuration. Connectivity can be broken down into
three main categories, each with its own requirements: object connectivity, database
connectivity, and device connectivity. The way the data model has been designed will determine
how these categories are prioritized. As a basic rule, connectivity within the GIS network must be
maintained well enough to support the majority of expected system uses.
Object Connectivity
Object connectivity deals with how the individual objects are physically connected in the
database. All linear objects should be connected to either other linear objects or point objects.
There should be no “open points” except where they depict actual open points in the field
network. This type of connectivity is maintained by having the appropriate snapping routines in
place to ensure that there are no erroneous gaps or overshoots in the graphic objects. Linear
objects should be snapped end point to end point. When a point object is connected to a linear
object, its insertion point should be snapped to the end point of that linear object. Polygonal
objects should be snapped closed.
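The snapping rules above translate into simple geometric checks. This sketch assumes lines are stored as coordinate lists, point objects as single insertion points, and that a properly snapped end shares exactly the same coordinate as the object it connects to; the tolerance value and all names are hypothetical. Each line end is labeled as snapped, as a gap (another end lies within tolerance but does not coincide), or as open (nothing nearby).

```python
import math

# Hypothetical features: lines as coordinate lists, point objects as insertion points.
LINES = {
    "L1": [(0.0, 0.0), (10.0, 0.0)],
    "L2": [(10.0, 0.0), (20.0, 0.0)],
    "L3": [(20.05, 0.0), (30.0, 0.0)],  # starts 0.05 units from L2's end: unsnapped
}
DEVICES = {"BKR1": (0.0, 0.0), "XFMR1": (30.0, 0.0), "SW1": (5.0, 5.0)}
POLYGONS = {
    "P1": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 0.0)],
    "P2": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)],
}
SNAP_TOLERANCE = 0.1  # hypothetical, in map units


def classify_line_ends(lines, devices, tolerance):
    """Label each line end 'snapped', 'gap' (near miss within tolerance) or 'open'."""
    results = {}
    for line_id, coords in lines.items():
        # Valid targets: ends of every other line plus every device insertion point.
        targets = [pt for other_id, other in lines.items() if other_id != line_id
                   for pt in (other[0], other[-1])]
        targets += list(devices.values())
        for label, end in (("start", coords[0]), ("end", coords[-1])):
            if end in targets:
                status = "snapped"
            elif any(math.dist(end, t) <= tolerance for t in targets):
                status = "gap"
            else:
                status = "open"
            results[(line_id, label)] = status
    return results


def unsnapped_devices(lines, devices):
    """Point objects whose insertion point is not on any line end."""
    ends = {pt for coords in lines.values() for pt in (coords[0], coords[-1])}
    return [dev_id for dev_id, pt in devices.items() if pt not in ends]


def unclosed_polygons(polygons):
    """Polygons whose ring does not return to its starting vertex."""
    return [pid for pid, ring in polygons.items() if ring[0] != ring[-1]]


if __name__ == "__main__":
    ends = classify_line_ends(LINES, DEVICES, SNAP_TOLERANCE)
    print({k: v for k, v in ends.items() if v != "snapped"})
    # {('L2', 'end'): 'gap', ('L3', 'start'): 'gap'}
    print(unsnapped_devices(LINES, DEVICES))  # ['SW1']
    print(unclosed_polygons(POLYGONS))        # ['P2']
```

An end labeled open is not automatically an error: it should be compared against the open points deliberately modeled in the network. An overshoot would typically surface as an open end or a gap as well, since the line extends past the object it should terminate on.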
Database Connectivity
Database connectivity refers to how the non-graphic database attributes are modeled to depict the
live network. Attribute fields may be populated to represent phase configuration, pressure, wire
and/or pipe size, and other relevant information that is depicted non-graphically. Attributes can
even be used to represent the opening or closing of devices for network traces, as long as those
traces are run against the selected attribute values. Database connectivity is maintained by
ensuring that only objects with like or compatible attribute values are adjoined.
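A check that only like or compatible objects are adjoined can be expressed as a rule applied to each connection. The rules here are assumptions chosen for illustration, not a utility standard: adjoined lines must share the same operating voltage, and a downstream line may carry only phases present on the line feeding it. The attribute names `phases` and `kv` and the function `incompatible_joins` are hypothetical.

```python
# Hypothetical attribute records for linear objects.
FEATURES = {
    "OH1": {"phases": {"A", "B", "C"}, "kv": 12.47},
    "OH2": {"phases": {"A"}, "kv": 12.47},
    "OH3": {"phases": {"C"}, "kv": 12.47},
    "UG1": {"phases": {"B", "C"}, "kv": 4.16},
}

# Hypothetical adjoinments as (upstream feature, downstream feature).
CONNECTIONS = [("OH1", "OH2"), ("OH1", "UG1"), ("OH2", "OH3")]


def incompatible_joins(features, connections):
    """Flag adjoined objects whose attributes are not alike or compatible."""
    problems = []
    for up, down in connections:
        a, b = features[up], features[down]
        # Assumed rule 1: adjoined lines must share the same operating voltage.
        if a["kv"] != b["kv"]:
            problems.append((up, down, "voltage"))
        # Assumed rule 2: downstream phases must be a subset of upstream phases.
        if not b["phases"] <= a["phases"]:
            problems.append((up, down, "phase"))
    return problems


if __name__ == "__main__":
    print(incompatible_joins(FEATURES, CONNECTIONS))
    # [('OH1', 'UG1', 'voltage'), ('OH2', 'OH3', 'phase')]
```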
Device Connectivity
Device connectivity is one of the most difficult types of connectivity to model in a GIS. To
model device connectivity, switching devices must be modeled with true open and closed
statuses. When device connectivity is modeled correctly, the only connection between the objects
on either side of a device is through that device. With true device connectivity, networks within
the GIS can more accurately duplicate the networks in the field.
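A trace that honors device status can be sketched as a breadth-first search in which a switch contributes a connection only when it is closed. The network, the switch records, and the function `energized` are hypothetical; the point is that when the switch is the only connection between the objects on either side, its status alone decides whether the trace passes.

```python
from collections import deque

# Hypothetical network: conductors always connect their two nodes.
CONDUCTORS = [("SUB", "A"), ("B", "C"), ("D", "E")]
# Switching devices: id -> (node on one side, node on the other side, status).
SWITCHES = {"SW1": ("A", "B", "closed"), "SW2": ("C", "D", "open")}


def energized(source, conductors, switches):
    """Nodes reachable from source; a switch is traversable only when closed."""
    adjacency = {}

    def link(a, b):
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    for a, b in conductors:
        link(a, b)
    for a, b, status in switches.values():
        if status == "closed":
            link(a, b)

    reached = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in reached:
                reached.add(nxt)
                queue.append(nxt)
    return reached


if __name__ == "__main__":
    print(sorted(energized("SUB", CONDUCTORS, SWITCHES)))
    # ['A', 'B', 'C', 'SUB']
    closed_sw2 = {**SWITCHES, "SW2": ("C", "D", "closed")}
    print(sorted(energized("SUB", CONDUCTORS, closed_sw2)))
    # ['A', 'B', 'C', 'D', 'E', 'SUB']
```

Changing a single switch record extends or cuts off the energized set without touching any geometry, which is the behavior true device connectivity is meant to provide.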
In all categories of connectivity, it is crucial to have 100% accuracy. Without 100% connectivity,
the GIS will not be able to perform many of its most basic applications.