Product quality assurance for GIS life-cycle

Mrinal Kanti Ghose
Scientist, Regional Remote Sensing Service Centre
ISRO, Dept. of Space, Kharagpur
mkghose2000@yahoo.com

Abstract
GIS (Geographic Information System) databases are ever-evolving entities, from their humble beginnings as paper maps, through the digital conversion process, to the data maintenance phase. GIS technology must rest on geographic data that is specific and reliable and that represents the spatial world we live in as closely as possible; if this is neglected, the usefulness of the technology is short-lived. To maximise the quality of GIS databases there should be a well-designed Quality Assurance (QA) plan that is strategically integrated through the entire life cycle of the GIS project. Until quite recently, little attention was paid to the problems caused by error, inaccuracy, and imprecision in spatial data sets. This situation has changed substantially in recent years. It is now generally recognised that error, inaccuracy, and imprecision can “make or break” many types of GIS project. The key point is that even though error can disrupt GIS analyses, it can be kept to a minimum through careful planning, and there are methods for estimating its effects on GIS solutions. Awareness of the problem of error has also had the useful benefit of making GIS practitioners more sensitive to the limitations of GIS, which cannot be expected to reach impossibly accurate and precise solutions. The main purpose of this paper is to alert GIS analysts and potential users to some methods that are especially suited to assessing the quality of GIS databases and digital maps/coverages. An attempt is also made to present a set of guidelines intended to establish the minimum acceptable level of quality that all projects and users should adhere to throughout the life cycle of a GIS.
Introduction
Today, Geographical Information Systems have reached an operational level and are moving from an era of promotion to one of opportunities for commercial development of application services. GIS databases are ever-evolving entities. From their humble beginnings as paper maps, through the digital conversion process, to the data maintenance phase, GIS data never really stops changing. It is now generally recognised that errors, inaccuracies, and imprecision left unchecked can make the results of a GIS analysis almost worthless. Unfortunately, every time a new data set is imported, the GIS also inherits its errors. These may combine and mix with the errors already in the database in unpredictable ways. The key point is that even though error can disrupt GIS analyses, it can be kept to a minimum through careful planning, and there are methods for estimating its effects on GIS solutions. Awareness of the problem of error has also had the useful benefit of making GIS practitioners more sensitive to the limitations of GIS, which cannot be expected to reach impossibly accurate and precise solutions.

The key to developing and implementing a successful GIS project is a well-designed Quality Assurance (QA) plan that is integrated with both the data conversion and maintenance phases of the project. The fundamentals of Quality Assurance never change: completeness, validity, logical consistency, physical consistency, referential integrity and positional accuracy are the cornerstones of the QA plan. To maximise the quality of GIS databases there should be a well-designed Quality Assurance plan that is strategically integrated with all facets of the GIS project.

Objective of the paper
This paper attempts a systematic study of the various quality parameters of a GIS and of how they are measured in a real-life environment.
It also presents a set of guidelines intended to establish the minimum acceptable level of accuracy assessment that all projects and users should adhere to. The main purpose of the paper is to present an overview of some methods that are especially suited to assessing the quality of GIS databases and digital maps/coverages. The issues involved in developing and implementing an integrated GIS Quality Assurance plan are also discussed.

Quality in context of GIS
Quality is commonly used to indicate the superiority of a manufactured good or a high degree of craftsmanship or artistry. Quality is a desirable goal achieved through management and control of the production process (statistical quality control). Many of the same issues apply to the quality of GIS databases, since a database is the result of a production process, and the reliability of that process imparts value and utility to the database [4].

Spatial Data Quality
Data quality is the degree of excellence in a database. It can be defined simply as the fitness for use of a specific data set. It depends on the scale, accuracy, and extent of the data set, as well as on the quality of the other data sets to be used with it. The conventional view is that geographical data is simply “spatial”; a better definition should include the three dimensions of Space, Time and Theme (where-when-what). These three dimensions are the basis for all geographical observation [1]. Data quality also has several components, such as accuracy, precision, consistency and completeness. The result is a matrix as defined below.

Table 1: Matrix showing geographical dimensions & Quality
The three components of space, theme, and time are covered by the first three Primary Parameters. The last two indicate, on the one hand, whether the data set is complete with respect to the queries it is meant to answer and, on the other hand, whether the representation of the data is consistent within itself. If every possible accuracy value had to be evaluated, the cost of the accuracy information would be too high to be affordable [1].

Quality Parameters
The following sections take a closer look at each of the five Primary Parameters of GIS quality and their associated sub-parameters.

Accuracy
Accuracy is the degree to which information on a map or in a digital database matches actual (true) or accepted values. The discrepancy between the encoded and actual value of a particular attribute for a given entity is defined as an “error”. Accuracy therefore concerns the quality of the data and the number of errors contained in a data set or map. In discussing a GIS database, it is possible to consider horizontal and vertical accuracy with respect to geographic position, as well as attribute, conceptual, and logical accuracy. The level of accuracy required varies greatly between applications, and highly accurate data can be very difficult and costly to produce and compile. Accuracy is always a relative measure, since it is measured relative to the specification. To judge fitness for use, one must judge the data relative to the specification and also consider the limitations of the specification itself [1].

Table 2: Example of E-A-V model.
The definition of accuracy is based on the entity-attribute-value model (Table 2):
Entities = real-world phenomena
Attributes = relevant properties
Values = quantitative/qualitative measurements

Spatial Accuracy
Spatial accuracy is the accuracy of the spatial component of the database. The metrics used depend on the dimensionality of the entities under consideration. For points, accuracy is defined in terms of the distance between the encoded location and the “actual” location. Error can be defined in various dimensions: x, y, z, horizontal, vertical, total. Metrics of error are extensions of classical statistical measures such as mean error, root mean squared error (RMSE), inference tests, confidence limits, etc. For lines and areas the situation is more complex, because error is a mixture of positional error (error in locating well-defined points along the line) and generalization error (error in the points selected to represent the line) [3]. The epsilon band is usually used to define a zone of uncertainty around the encoded line, within which the “actual” line exists with some probability. However, there is little agreement on the shape of the band, either planimetrically or in cross-section. The spatial position of an arbitrary object defined within a GIS data layer has a positional error that can be described by one of the Primary Parameters, Positional Accuracy.

Temporal accuracy
Temporal accuracy is the agreement between the encoded and “actual” temporal coordinates for an entity. Temporal coordinates are often only implicit in geographical data, e.g., a time stamp indicating that the entity was valid at some time. Often this is applied to the entire database. More realistically, temporal coordinates are the temporal limits within which the entity is valid. Temporal accuracy is not the same as “currentness” (or up-to-dateness), which is actually an assessment of how well the database specification meets the needs of a particular application.
Temporal accuracy becomes relevant when the GIS data set has a temporal dimension, so that each spatial record takes the form (x, y, z, t). The error model must then examine whether this additional coordinate depends on the other three, so that any existing correlation is taken into account.

Thematic Accuracy
Thematic GIS information is generated by collecting properties of spatial data and assigning them to stored objects or areas. This can introduce errors in two ways: first, through misclassification; second, through several different data classes occurring in the same spatial object. In some cases one theme must be favoured for the presentation to be meaningful at all (for example, the detection of water reservoirs (oases) in a desert area).

Table 3: Example of classification error matrix
Thematic accuracy [6] is the accuracy of the attribute values encoded in a database. The metrics used depend on the measurement scale of the data:
Quantitative data (e.g., precipitation) can be treated like a z-coordinate (elevation) and assessed using metrics normally used for vertical error (such as the RMSE).
Qualitative data (e.g., land use/land cover) is normally assessed using a cross-tabulation of encoded and “actual” classes at a sample of locations. This produces a classification error matrix.
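The two thematic metrics just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the precipitation values and the land-cover samples are hypothetical and do not come from the paper, and overall accuracy (the share of samples on the diagonal of the error matrix) is one common summary among several.

```python
import math

def rmse(encoded, actual):
    """Root mean squared error between encoded and reference values."""
    if len(encoded) != len(actual) or not encoded:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(e - a) ** 2 for e, a in zip(encoded, actual)]
    return math.sqrt(sum(squared) / len(squared))

def error_matrix(encoded, actual, classes):
    """Cross-tabulate encoded classes (rows) against actual classes (columns)."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for e, a in zip(encoded, actual):
        matrix[index[e]][index[a]] += 1
    return matrix

def overall_accuracy(matrix):
    """Share of samples on the diagonal, i.e. correctly classified."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0

# Hypothetical quantitative attribute: precipitation (mm) at four check sites
precip_rmse = rmse([100.0, 120.0, 90.0, 110.0], [102.0, 118.0, 93.0, 110.0])

# Hypothetical qualitative attribute: land cover at six check sites
classes = ["forest", "water", "urban"]
encoded = ["forest", "forest", "water", "urban", "urban", "water"]
actual = ["forest", "water", "water", "urban", "forest", "water"]
matrix = error_matrix(encoded, actual, classes)

print("RMSE:", round(precip_rmse, 3))
for name, row in zip(classes, matrix):
    print(name, row)
print("Overall accuracy:", round(overall_accuracy(matrix), 3))
```

Off-diagonal cells show which classes are confused with which, so the matrix is usually more informative than the single accuracy figure derived from it.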
Modern computer systems have data editing capabilities that allow errors to be detected, and where possible corrected, during data entry. After digitisation has been completed, the GIS requires the user to perform an operation that builds topology. This operation should allow the user to identify the types of entity errors in the coverage. Some of them will be pointed out directly; others must be inferred from database statistics on the numbers and types of entities, or found by inspecting the on-screen graphics for errors that the GIS is not designed to detect. A complementary procedure after digitising is to look for the following:
Validity is a measure of the attribute accuracy of the database. Each attribute must have a defined domain and range. Database validation is the process of determining whether database values are reasonably accurate, complete, and logically consistent with respect to the intended use of the data. Validation often consists of several steps, including logical checks, accuracy assessments, and error analysis. Spatial and thematic accuracy are usually measured against a known standard, whereas error analysis evaluates the data with regard to measurement uncertainty and covers source errors, use errors, and process errors.

Primary Validation
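As an illustration of the requirement that each attribute have a defined domain and range, the sketch below checks feature records against a small rule table. The layer, attribute names and rules are hypothetical and chosen only for the example; an actual project would derive them from its own database design.

```python
def validate_record(record, rules):
    """Return a list of (attribute, problem) pairs for one feature record."""
    problems = []
    for attr, rule in rules.items():
        value = record.get(attr)
        if value is None:
            problems.append((attr, "missing value"))
            continue
        # Domain: the set of permitted coded values
        if "domain" in rule and value not in rule["domain"]:
            problems.append((attr, "value outside domain"))
        # Range: inclusive lower and upper numeric limits
        if "range" in rule:
            low, high = rule["range"]
            if not (low <= value <= high):
                problems.append((attr, "value outside range"))
    return problems

# Hypothetical rules for a road layer
rules = {
    "surface": {"domain": {"paved", "gravel", "earth"}},
    "lanes": {"range": (1, 8)},
}

# Hypothetical feature records
records = [
    {"id": 1, "surface": "paved", "lanes": 2},
    {"id": 2, "surface": "asphalt", "lanes": 2},
    {"id": 3, "surface": "gravel", "lanes": 0},
    {"id": 4, "surface": "earth"},
]

report = {r["id"]: validate_record(r, rules) for r in records}
for feature_id, problems in report.items():
    print(feature_id, problems if problems else "ok")
```

A check of this kind covers only validity; it does not show that a value inside its domain is also the correct one, which is the task of the accuracy assessment.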
Quality Assurance plans can broadly be classified into two categories, Visual QA and Automated QA, which are discussed below.

Visual QA: Visual QA is meant to detect not only random error, such as a misspelled piece of text, but also systematic error, such as an overall shift in the data caused by an unusually high RMS value. The existence or absence of data, as well as positional accuracy, can only be checked by visual inspection. Hard-copy plotting of the data is the best method of checking for missing features, misplaced features and registration to the original source. On-screen views are an excellent way to verify that edits to the database were made correctly. Visual inspection should occur during initial data capture, at feature attribution, and again at final data delivery. At initial data capture the data should be inspected for missing or misplaced features, as well as for alignment problems that could point to a systematic error. In either case each error type needs to be evaluated together with the process that created the data, in order to determine the root cause and an appropriate solution.

Automated QA: Visual inspection of GIS data is reinforced by automated QA methods. GIS databases can be checked automatically for adherence to the database design, attribute accuracy, logical consistency and referential integrity. Automated QA must be carried out in conjunction with visual inspection. Its goal is to inspect very large amounts of data quickly and to report inconsistencies in the database that may not appear during visual inspection. Both random and systematic errors are detected by automated QA procedures. Once again, the feedback loop has to be short so that any flawed data conversion process can be corrected.

Data Acceptance
Defining acceptance criteria is probably one of the most troublesome parts of a GIS project, because there are no standards for acceptable error or for rejection criteria.
Because GIS coverages are application specific, acceptance criteria are best defined on the basis of the existing data model and database design, together with user needs and application requirements. Project schedule, budget and human resources all play a role in determining data acceptance. Accepting data can also be confusing without strict acceptance rules. A GIS data set may have ‘m’ features with ‘n’ attributes each. A feature with a single incorrect attribute can be counted in more than one way, which leads to error-count conditions such as:
Once the acceptable percentage of error and the weighting scheme have been chosen, methods of error detection should be established. The methods of error detection for data acceptance are the same as those employed during the data conversion phase. Check plots should be compared with the original sources, and automated database checking tools should be applied to the delivered data. Very large databases may require random sampling for data acceptance.

Data Maintenance
Maintenance involves additions, deletions and updates to the database in a tightly controlled environment, in order to retain the database’s integrity. It gives the user only one point of entry into the database, which improves the consistency and security of the database. Maintenance applications are usually supported by a database management system consisting of permanent and local (temporary) storage. Data is checked out from permanent storage into local storage for update and then posted back to permanent storage to complete the update. Pre-posting QA checks are required to ensure database integrity. The database schema is maintained so that table structures and spatial data topologies are not destroyed. Automated validation of attribute values, as well as visual check plots when large amounts of data are added or deleted, are also useful. Periodic validation of large multi-user databases can identify some very important and potentially costly errors. Errors or last-minute changes in business rules, bugs in the maintenance application and inconsistent editing methods can all be detected during periodic validation.

Conclusions
The main purpose of this paper is to present an overview of some methods that are especially suited to assessing the quality of GIS databases and digital maps/coverages.
The various quality parameters of a GIS and their measurement in a real-life environment are presented, and a set of guidelines intended to establish the minimum acceptable level of accuracy, to be adhered to by all projects and users, is highlighted. The issues involved in developing and implementing an integrated GIS Quality Assurance plan are also discussed.

References
The author is indebted to Sri. S Adiga, Director, NNRMS/RRSSC, Bangalore, for his kind approval. Thanks are due to Dr. A Jeyram, Head, RRSSC, Kharagpur, for his views and suggestions for preparation of the paper.