Thematic Accuracy
Thematic GIS information is generated by collecting properties of spatial data and assigning them to stored objects or areas. This process can lead to two kinds of error: first, misclassification errors, and second, errors that arise because several different data classes occur within the same spatial object. In some cases, favoring one theme is necessary to make the presentation meaningful at all (for example, the detection of water reservoirs (oases) in a desert area).
Table 3: Example of classification error matrix (rows: encoded class; columns: actual class)

| | Water | Soil | Veg | Total |
|---|---|---|---|---|
| Water | 25 | 2 | 3 | 30 |
| Soil | 0 | 38 | 2 | 40 |
| Veg | 1 | 4 | 25 | 30 |
| Total | 26 | 44 | 30 | 100 |
Thematic accuracy (6) is the accuracy of the attribute values encoded in a database. The metrics used depend on the measurement scale of the data. Quantitative data (e.g., precipitation) can be treated like a z-coordinate (elevation) and assessed with the metrics normally used for vertical error, such as the RMSE. Qualitative data (e.g., land use/land cover) is normally assessed by cross-tabulating encoded and "actual" classes at a sample of locations, which produces a classification error matrix.
The element in row i, column j of the matrix is the number of sample locations assigned to class i but actually belonging to class j. The sum of the main diagonal divided by the number of samples is a simple measure of overall accuracy. An error of omission is a sample that has been omitted from its actual class; an error of commission is a sample that has been included in a class to which it does not belong. Every error of omission is therefore also an error of commission: a misclassified sample is omitted from its actual class and, at the same time, committed to another one.
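The definitions above can be applied directly to the counts in Table 3. The following is a minimal sketch, assuming the row/column convention stated in the text (rows = encoded class, columns = actual class); the function names are illustrative, not taken from any particular GIS package. It computes overall accuracy and the per-class omission and commission counts, and confirms that the two totals agree.

```python
# Classification error matrix from Table 3.
# Convention (from the text): MATRIX[i][j] = samples encoded as class i
# that actually belong to class j.
CLASSES = ["Water", "Soil", "Veg"]
MATRIX = [
    [25, 2, 3],
    [0, 38, 2],
    [1, 4, 25],
]


def overall_accuracy(m):
    """Sum of the main diagonal divided by the total number of samples."""
    total = sum(sum(row) for row in m)
    correct = sum(m[k][k] for k in range(len(m)))
    return correct / total


def omission_counts(m):
    """Per actual class j: samples in column j that were encoded as another class."""
    n = len(m)
    return [sum(m[i][j] for i in range(n)) - m[j][j] for j in range(n)]


def commission_counts(m):
    """Per encoded class i: samples in row i that actually belong to another class."""
    return [sum(row) - row[i] for i, row in enumerate(m)]


if __name__ == "__main__":
    print(f"overall accuracy: {overall_accuracy(MATRIX):.2f}")
    for name, om, cm in zip(CLASSES, omission_counts(MATRIX), commission_counts(MATRIX)):
        print(f"{name}: omitted={om}, committed={cm}")
    # Each misclassified sample is counted once as omission and once as commission.
    print(sum(omission_counts(MATRIX)) == sum(commission_counts(MATRIX)))
```

For Table 3 this gives an overall accuracy of 88/100. Dividing the omission counts by the column totals, and the commission counts by the row totals, turns them into per-class error rates (commonly reported as producer's and user's accuracy, respectively, after subtracting from 1).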
For raster representations, the unit for evaluating thematic accuracy is the pixel itself; for vector representations, it is the object as delimited by its boundary, i.e., the polygon (or, more exactly, its points). Another way to present thematic accuracy to the user is to attach an attribute accuracy value to each object, or even to each pixel.
Resolution
Resolution (or precision) refers to the amount of detail that can be discerned in space, time or theme. Resolution is always finite because no measurement system is infinitely precise, and because databases are intentionally generalized to reduce detail [ 2c ]. Resolution is an aspect of the database specification that determines how useful a given database may be for a particular application. Resolution is linked with accuracy, since the level of resolution affects the database specification against which accuracy is assessed. Two databases with the same overall accuracy levels but different levels of resolution do not have the same quality; the database with the lower resolution has less demanding accuracy requirements. For example, thematic accuracy will tend to be higher for general land use/land cover classes like “urban” than for specific classes like “residential”. Resolution is distinct from the spatial sampling rate, although the two are often confused with each other. Sampling rate refers to the distance between samples, while resolution refers to the size of the sample units.
Spatial resolution of raster data refers to the linear dimension of a cell, whereas for vector data it is the minimum mapping unit size. Temporal resolution is the length of the sampling interval, and it determines the minimum duration of an event that can be discerned. For example, the shorter the exposure time of a camera, the higher its temporal resolution (other factors being equal). Thematic resolution refers to the precision of the measurements or categories for a particular theme. For categorical data, resolution is the fineness of category definitions (e.g., "urban" vs. "residential" and "commercial"). For quantitative data, thematic resolution is analogous to spatial resolution in the z-dimension (i.e., the degree to which small differences in the quantitative attribute can be discerned).
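The link between thematic resolution and thematic accuracy can be made concrete with an error matrix. The sketch below reuses the Table 3 counts and merges classes into a coarser scheme; the grouping (Soil and Veg collapsed into a hypothetical "Land" class) is an illustrative assumption, not part of the original example. Because confusions between merged classes become agreements, coarsening the categories can only keep overall accuracy the same or raise it.

```python
# Table 3 error matrix: rows = encoded class, columns = actual class.
CLASSES = ["Water", "Soil", "Veg"]
MATRIX = [
    [25, 2, 3],
    [0, 38, 2],
    [1, 4, 25],
]


def overall_accuracy(m):
    """Diagonal sum divided by the total number of samples."""
    total = sum(sum(row) for row in m)
    return sum(m[k][k] for k in range(len(m))) / total


def merge_classes(m, classes, groups):
    """Collapse fine classes into coarser ones.

    groups maps each coarse class name to the list of fine class names it absorbs.
    Returns (coarse_names, coarse_matrix).
    """
    coarse_names = list(groups)
    index_of = {}
    for g, members in enumerate(groups.values()):
        for name in members:
            index_of[classes.index(name)] = g
    size = len(coarse_names)
    out = [[0] * size for _ in range(size)]
    for i, row in enumerate(m):
        for j, count in enumerate(row):
            # Counts move to the cell of their coarse (encoded, actual) pair.
            out[index_of[i]][index_of[j]] += count
    return coarse_names, out


if __name__ == "__main__":
    # Hypothetical coarser scheme: Soil and Veg merged into a single "Land" class.
    names, coarse = merge_classes(
        MATRIX, CLASSES, {"Water": ["Water"], "Land": ["Soil", "Veg"]}
    )
    print("fine classes:", overall_accuracy(MATRIX))
    print("coarse classes:", names, coarse, overall_accuracy(coarse))
```

With this grouping the 6 Soil/Veg confusions land on the diagonal, so overall accuracy rises from 0.88 to 0.94, even though the underlying data are unchanged. This is why the text warns that accuracy figures are only comparable between databases at the same resolution.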
Consistency
Consistency refers to the absence of apparent contradictions and is a measure of the internal validity of a database. Spatial consistency includes topological consistency, or conformance to topological rules, e.g., all one-dimensional objects must intersect at a zero-dimensional object [ 2b ]. Temporal consistency is related to temporal topology, e.g., the constraint that only one event can occur at a given location at a given time [ 3 ]. Thematic consistency refers to a lack of contradictions in redundant thematic attributes. For example, attribute values for population, area, and population density must agree for all entities. Attribute redundancy is one way in which consistency can be assessed. The absence of inconsistencies does not necessarily imply that the data are accurate. Logical consistency covers both topological aspects and the validity ranges of the values in the data set, and it applies to spatial, thematic, and temporal parameters. As a measure of topological consistency, one can, for example, check whether polygons are correctly formed.
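Two of the checks mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the district records and polygon rings are hypothetical, the 1% relative tolerance is an arbitrary choice, and the polygon check only tests closure and non-zero area (it does not detect self-intersections, which a full topology engine would).

```python
import math


def density_consistent(population, area_km2, density, rel_tol=0.01):
    """Thematic consistency: stored density must agree with population / area."""
    if area_km2 <= 0:
        return False
    return math.isclose(population / area_km2, density, rel_tol=rel_tol)


def ring_area(ring):
    """Signed area by the shoelace formula; ring is a closed list of (x, y)."""
    return 0.5 * sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:]))


def ring_well_formed(ring):
    """Simple topological checks on a polygon boundary (self-intersection not tested)."""
    closed = len(ring) >= 4 and ring[0] == ring[-1]
    return closed and ring_area(ring) != 0.0


# Hypothetical records: district B stores a density that contradicts its other attributes.
DISTRICTS = [
    {"id": "A", "population": 12000, "area_km2": 40.0, "density": 300.0},
    {"id": "B", "population": 5000, "area_km2": 25.0, "density": 250.0},
]

if __name__ == "__main__":
    bad = [d["id"] for d in DISTRICTS
           if not density_consistent(d["population"], d["area_km2"], d["density"])]
    print("thematically inconsistent:", bad)
    square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
    open_ring = [(0, 0), (1, 0), (1, 1)]
    print(ring_well_formed(square), ring_well_formed(open_ring))
```

Note that passing both checks only shows the records do not contradict each other; as the text points out, consistent data can still be inaccurate.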
Completeness
Completeness refers to a lack of errors of omission in a database. It is assessed relative to the database specification, which defines the desired degree of generalization and abstraction (selective omission). There are two kinds of completeness [ 2a ]. "Data completeness" is a measurable error of omission observed between the database and the specification. Even highly generalized databases can be "data complete" if they contain all of the objects described in the specification. A database is "model complete" if its specification is appropriate for a given application. Completeness informs the user about the spatial, thematic, and temporal coverage of the data relative to the predefined purposes. The two measures, omission and commission, are considered sufficient to describe how well a data set meets the user's requirements.
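At the level of whole features, data completeness can be checked by comparing what the specification requires with what the database contains. The sketch below assumes that features in both lists share a common identifier; real data often have to be matched geometrically instead, and the building IDs used here are hypothetical.

```python
def completeness_report(database_ids, reference_ids):
    """Compare the features in a database with those required by the specification."""
    db = set(database_ids)
    ref = set(reference_ids)
    omitted = ref - db      # required by the specification but missing (omission)
    committed = db - ref    # present in the database but not required (commission)
    return {
        "omitted": sorted(omitted),
        "committed": sorted(committed),
        "omission_rate": len(omitted) / len(ref) if ref else 0.0,
        "commission_rate": len(committed) / len(db) if db else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical building IDs: b3 was never captured, b9 is an extra feature.
    reference = ["b1", "b2", "b3", "b4"]
    database = ["b1", "b2", "b4", "b9"]
    print(completeness_report(database, reference))
```

Here one of four required features is missing and one of four stored features is surplus, giving omission and commission rates of 0.25 each.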
Quality Assurance for the GIS Life Cycle
The following section discusses the general stages of GIS database creation, from its start as an existing map product to its final stage as a seamless, continually maintained database. At each stage the integration of the QA plan within the process is discussed.
Data Preparation Phase
- Map Preparation: The first step in creating quality GIS databases from paper maps is map preparation, sometimes referred to as map scrub. It is the most cost-effective phase in which to detect and correct errors.
- Edge-match Review: Edge features (those that cross the map sheet edge as well as those that lie near it) must be reviewed against logical and physical consistency requirements and checked for positional accuracy and duplication. The temporal factor must also be taken into account.
- Control Review: Establishing coordinate control for the database is the most important step in the data conversion process. Whether benchmarks, corner tic marks, or other surveyed locations are used, they must be visible and identifiable on the source map. Each control point should be reviewed to make sure it has a known real-world location.
- Multi-source Conflict Resolution: Often a GIS layer has to be compiled from multiple map sources, and conflicts between the original map data are then almost inevitable. For example, when an electrical layer is compiled from two maps, an overhead map and an underground map, reviewing the result for duplicated features, conflicting positions, and conflicting feature attributes is essential to reduce error.
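The edge-match review lends itself to a simple automated pre-check before visual inspection. The sketch below is an illustration under assumptions not in the text: each sheet supplies the endpoints of features that touch the shared sheet boundary as `(feature_id, (x, y), attribute)` tuples, the tolerance is in map units, and pairing is greedy nearest-neighbour rather than an optimal assignment. Unmatched endpoints point to positional or missing-feature problems; matched pairs with different attributes point to logical inconsistencies.

```python
import math


def edge_match(sheet_a, sheet_b, tolerance):
    """Pair edge endpoints from two adjacent sheets (greedy nearest neighbour).

    Each sheet is a list of (feature_id, (x, y), attribute) tuples for line
    endpoints lying on or near the shared sheet boundary.
    Returns (matches, unmatched_a, unmatched_b, attribute_conflicts).
    """
    unused_b = list(sheet_b)
    matches, unmatched_a, conflicts = [], [], []
    for fid_a, pt_a, attr_a in sheet_a:
        best, best_dist = None, tolerance
        for cand in unused_b:
            d = math.dist(pt_a, cand[1])
            if d <= best_dist:
                best, best_dist = cand, d
        if best is None:
            unmatched_a.append(fid_a)   # no partner within tolerance
            continue
        unused_b.remove(best)           # each endpoint may be used only once
        matches.append((fid_a, best[0], best_dist))
        if attr_a != best[2]:
            conflicts.append((fid_a, best[0]))
    unmatched_b = [fid for fid, _, _ in unused_b]
    return matches, unmatched_a, unmatched_b, conflicts


if __name__ == "__main__":
    # Hypothetical endpoints along a shared vertical sheet edge at x = 100.
    sheet_a = [("a1", (100.0, 10.0), "road"), ("a2", (100.0, 55.0), "river"),
               ("a3", (100.0, 80.0), "road")]
    sheet_b = [("b1", (100.2, 10.1), "road"), ("b2", (100.0, 54.6), "road")]
    print(edge_match(sheet_a, sheet_b, tolerance=1.0))
```

In this example a3 has no partner on the adjacent sheet, and a2/b2 meet geometrically but disagree on their attribute, so both would be flagged for review.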
Data Entry Phase
- Digitising: Digitising is the process of capturing spatial features (points, lines, and areas) by converting them into X, Y coordinates. A single coordinate pair represents a point, a string of coordinates represents a line, and one or more lines outline an area. Digitising may be carried out either manually or automatically.
- Data Conversion: Data conversion usually introduces two kinds of error into the database: random and systematic. Random error is always part of any form of data, whether analog or digital; it can be reduced by tight controls and automated data-entry procedures. Systematic error usually stems from a procedural problem, and correcting that procedure usually clears up the systematic error. The key to correcting both random and systematic error is a tightly integrated plan that checks the data both automatically and visually at various stages of the conversion cycle. A short feedback loop between the quality assurance and conversion teams speeds the correction of these problems. Registering paper maps or images to known coordinate locations introduces registration error, so each feature digitised into the database carries an introduced error equivalent to the RMS (root mean square) error of the registration. Standards must be set and adhered to during data conversion to keep the RMS error as low as possible. In some cases, high RMS errors point to a systematic error such as poor scanner or digitiser calibration.
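The registration RMS error mentioned above is computed from the control points: after the map or image has been transformed, each control point's registered position is compared with its known real-world coordinates. The sketch below assumes the transformation has already been fitted; the coordinates and the acceptance threshold `MAX_RMS` are hypothetical project values, and the combined x/y form shown here is one common convention (some tools also report per-axis RMS).

```python
import math


def rms_error(known, transformed):
    """RMS of the residuals between known control coordinates and the
    registered (transformed) positions of the same control points."""
    if len(known) != len(transformed) or not known:
        raise ValueError("need matching, non-empty lists of control points")
    squared = [(xt - xk) ** 2 + (yt - yk) ** 2
               for (xk, yk), (xt, yt) in zip(known, transformed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical control points (map units) and a hypothetical acceptance threshold.
KNOWN = [(1000.0, 2000.0), (1500.0, 2000.0), (1500.0, 2500.0), (1000.0, 2500.0)]
TRANSFORMED = [(1000.3, 2000.4), (1499.6, 2000.3), (1500.0, 2499.5), (1000.5, 2500.0)]
MAX_RMS = 0.75

if __name__ == "__main__":
    rms = rms_error(KNOWN, TRANSFORMED)
    status = "accept" if rms <= MAX_RMS else "re-register / check calibration"
    print(f"RMS = {rms:.3f} map units -> {status}")
```

Each residual here has length 0.5, so the RMS is 0.5 map units; every feature digitised on this registration inherits an error of about that size.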
Data Editing Phase
Modern computer systems provide data editing capabilities that allow errors to be detected and corrected during data entry. After digitising has been completed, a GIS requires the user to perform an operation that builds topology. This operation should allow the user to identify the types of entity errors in the coverage. Some errors will be pointed out automatically; others must be inferred from database statistics on the numbers and types of entities, or found by inspecting the graphics on screen for errors the GIS is not designed to detect. As a complementary procedure after digitising, the user should check the following:
- All entities that should have been entered are present, in the right place, and of the correct shape and size;
- No extra entities have been digitised;
- All entities that are supposed to be connected to each other actually are;
- All polygons have exactly one label point to identify them;
- All entities lie within the outer boundary identified by the registration marks.
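The last two checks in this list can be automated. The sketch below counts label points per polygon with an even-odd ray-casting point-in-polygon test and flags vertices that fall outside the registration extent. It assumes, for simplicity, that the outer boundary defined by the registration marks is an axis-aligned rectangle `(xmin, ymin, xmax, ymax)`; the polygon and label data are hypothetical, and points lying exactly on a polygon edge are not handled specially.

```python
def point_in_polygon(pt, ring):
    """Even-odd ray casting; ring is a list of (x, y) vertices (closed or open)."""
    x, y = pt
    inside = False
    n = len(ring)
    for k in range(n):
        (x1, y1), (x2, y2) = ring[k], ring[(k + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def label_point_errors(polygons, labels):
    """Return {polygon_id: label_count} for polygons without exactly one label."""
    errors = {}
    for pid, ring in polygons.items():
        count = sum(point_in_polygon(pt, ring) for _, pt in labels)
        if count != 1:
            errors[pid] = count
    return errors


def outside_extent(points, extent):
    """Return the points lying outside the (xmin, ymin, xmax, ymax) extent."""
    xmin, ymin, xmax, ymax = extent
    return [p for p in points if not (xmin <= p[0] <= xmax and ymin <= p[1] <= ymax)]


# Hypothetical coverage: P2 received two label points during digitising.
POLYGONS = {
    "P1": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "P2": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
LABELS = [("L1", (5, 5)), ("L2", (15, 5)), ("L3", (16, 6))]

if __name__ == "__main__":
    print("label errors:", label_point_errors(POLYGONS, LABELS))
    print("outside extent:", outside_extent([(5, 5), (21, 5)], (0, 0, 20, 10)))
```

A count of 0 would flag an unlabeled polygon, and a count above 1 (as for P2 here) flags duplicate labels.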
Data Validation Phase
Validity is a measure of the attribute accuracy of the database. Each attribute must have a defined domain and range. Database validation is the process of determining whether database values are reasonably accurate, complete, and logically consistent with respect to the intended use of the data. Validation often consists of several steps, including logical checks, accuracy assessments, and error analysis. Spatial and thematic accuracy are usually measured against a known standard, whereas error analysis evaluates the data with regard to measurement uncertainty and covers source errors, use errors, and process errors.
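The requirement that each attribute have a defined domain and range translates directly into a logical check. The following is a minimal sketch in which the schema, attribute names, allowed codes, and numeric bounds are all hypothetical; in practice they would come from the database specification.

```python
# Hypothetical attribute schema: each attribute has either a coded domain
# (set of allowed values) or a numeric range (min, max), inclusive.
SCHEMA = {
    "landuse": {"domain": {"urban", "forest", "water", "agriculture"}},
    "elevation_m": {"range": (-500.0, 9000.0)},
}


def validate_record(record, schema):
    """Return a list of human-readable validation errors for one record."""
    errors = []
    for attr, rule in schema.items():
        if attr not in record or record[attr] is None:
            errors.append(f"{attr}: missing value")
            continue
        value = record[attr]
        if "domain" in rule and value not in rule["domain"]:
            errors.append(f"{attr}: {value!r} not in domain")
        if "range" in rule:
            lo, hi = rule["range"]
            if not isinstance(value, (int, float)):
                errors.append(f"{attr}: {value!r} is not numeric")
            elif not (lo <= value <= hi):
                errors.append(f"{attr}: {value!r} outside range [{lo}, {hi}]")
    return errors


if __name__ == "__main__":
    records = [
        {"id": 1, "landuse": "urban", "elevation_m": 120.0},
        {"id": 2, "landuse": "urbn", "elevation_m": 12000.0},  # typo + out of range
        {"id": 3, "landuse": "forest"},                         # missing elevation
    ]
    for r in records:
        print(r["id"], validate_record(r, SCHEMA))
```

Checks like these cover only the logical part of validation; accuracy assessment against a known standard and error analysis still require reference data.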