Knowledge discovery from GIS in 'Natural Resources Targeting'
Process of Knowledge Discovery
A few data layers in a GIS, or a set of bands of remote sensing data, do not convey information by themselves. Processing tools are needed to extract information from these datasets. Auxiliary information, such as ground truth and the user's intuition, helps in selecting a suitable tool for extracting information from them.
The general process of knowledge discovery is the extraction of spatial association, discrimination and deviation/evolution rules describing temporal changes of a prominent cluster. The figure below shows the general process of knowledge discovery.
The background knowledge of the user is stored in a knowledge base. The database carries information on spatial components and their attributes. Data is fetched from the database through a database interface. Object and attribute extraction tools determine which part of the data is relevant to the pattern recognition task. Rules and patterns are then extracted using mathematical tools drawn from information theory and statistics. The interestingness or significance of these patterns is assessed in the final evaluation step, so that obvious or redundant knowledge can be identified. The user has control over the system at every step, and expert judgement is also fed into the system at each step. The general methods in the process of knowledge discovery are as follows:
Generalization based Knowledge Discovery
Data and objects in a database often contain detailed information at a primitive concept level. It is often desirable to summarize a large dataset and present it at a higher concept level. Generalization based mining is the abstraction of data, drawn from several sources of evidence, from one concept level to a higher concept level, followed by knowledge extraction on the generalized data (Mitchell, 1982). It assumes the existence of background knowledge in the form of concept hierarchies, which are either data-driven or assigned explicitly from expert knowledge. The data can be expressed as a generalized relation or data cube, on which many other operations can be performed to transform the generalized data into different forms of knowledge. Multivariate statistical techniques such as principal components analysis, discriminant analysis, characteristic analysis, correlation analysis, factor analysis and cluster analysis are used for generalization based knowledge discovery (Shaw and Wheeler, 1994).
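As an illustration of climbing a concept hierarchy, the following minimal Python sketch replaces primitive values with their parent concepts and counts the result. The hierarchy, the class names and the function name generalize are hypothetical and chosen only for this example; they do not come from any particular GIS package.

```python
from collections import Counter

# Hypothetical concept hierarchy (an assumption for illustration):
# primitive land-cover class -> higher-level concept.
CONCEPT_HIERARCHY = {
    "wheat": "agriculture",
    "rice": "agriculture",
    "pine": "forest",
    "oak": "forest",
    "lake": "water",
}


def generalize(records, hierarchy):
    """Map each primitive value to its parent concept and count occurrences.

    Values missing from the hierarchy are grouped under "other".
    """
    return Counter(hierarchy.get(value, "other") for value in records)


# Primitive-level observations, e.g. one land-cover label per map unit.
records = ["wheat", "rice", "pine", "lake", "oak", "wheat"]
generalized = generalize(records, CONCEPT_HIERARCHY)

for concept, count in sorted(generalized.items()):
    print(concept, count)
```

The resulting counts form a simple generalized relation; further statistics can then be computed on the higher-level concepts rather than on the primitive values.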
Spatial Associations
A spatial structure consists of points, lines, polygons or pixels. Multi-dimensional trees are used to build indices for these structures. Spatial operations such as union, intersection and overlay, and spatial orientation predicates such as close_to, far_off, left_of and right_of, are among the operations generally used to discover spatial characteristics in a GIS.
A spatial association is of the form A → B (p%), which means that p% of A are associated with B, where A and B are sets of predicates and p% is the confidence of the rule. During spatial analysis, the user may obtain a large number of such rules. To limit the number of discovered rules, the concepts of minimum support and minimum confidence are used.
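The confidence and support of a rule A → B can be sketched as follows. This is a minimal Python example that assumes the conventional definitions (support is the fraction of all objects satisfying both A and B; confidence is the fraction of objects satisfying A that also satisfy B). The predicate strings, the sample objects, the thresholds and the function name rule_stats are hypothetical.

```python
def rule_stats(objects, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    Each object is represented by the set of predicates that hold for it.
    """
    n_total = len(objects)
    # Objects for which every predicate in A holds.
    n_a = sum(1 for preds in objects if antecedent <= preds)
    # Objects for which every predicate in A and in B holds.
    n_ab = sum(1 for preds in objects if (antecedent | consequent) <= preds)
    support = n_ab / n_total if n_total else 0.0
    confidence = n_ab / n_a if n_a else 0.0
    return support, confidence


# Hypothetical spatial objects described by predicates.
objects = [
    {"is_a(town)", "close_to(river)"},
    {"is_a(town)", "close_to(river)"},
    {"is_a(town)", "close_to(road)"},
    {"is_a(mine)", "close_to(river)"},
]

support, confidence = rule_stats(objects, {"is_a(town)"}, {"close_to(river)"})

# Keep the rule only if it passes both user-chosen thresholds (assumed values).
MIN_SUPPORT, MIN_CONFIDENCE = 0.25, 0.6
keep_rule = support >= MIN_SUPPORT and confidence >= MIN_CONFIDENCE
print(f"support={support:.2f} confidence={confidence:.2f} keep={keep_rule}")
```

Raising either threshold prunes more rules, which is how the number of discovered rules is kept manageable.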
Approximation and Aggregation
Approximation and aggregation describe the characteristics of a cluster in terms of the features that are close to it. The aggregate proximity gives the maximum and minimum distances from points in the cluster to a feature, the average distance, and the percentage of points located at a distance less than a specified threshold.
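The aggregate proximity statistics listed above can be sketched in Python as follows. For simplicity the sketch assumes planar coordinates and a feature represented by a single point; real features may be lines or polygons, which would need a different distance function. The coordinates, the threshold and the function name aggregate_proximity are hypothetical.

```python
import math


def aggregate_proximity(cluster_points, feature_point, threshold):
    """Summarize distances from the points of a cluster to one point feature."""
    distances = [math.dist(p, feature_point) for p in cluster_points]
    within = sum(1 for d in distances if d < threshold)
    return {
        "min": min(distances),
        "max": max(distances),
        "mean": sum(distances) / len(distances),
        # Percentage of points strictly closer than the threshold.
        "pct_within": 100.0 * within / len(distances),
    }


# Hypothetical cluster of points and a feature located at the origin.
cluster = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (0.0, 1.0)]
stats = aggregate_proximity(cluster, (0.0, 0.0), threshold=5.0)
print(stats)
```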
Spatial Data Integration
Assessing resources and computing favourability measures for the occurrence of a specific type of resource are problems that require integrating various layers of information or knowledge as evidence, using various statistical models. Multivariate statistical methods used in resource appraisal include principal component analysis, canonical correlation analysis, logistic regression analysis, weighted and targeted multivariate criterion, factor analysis, modified canonical correlation analysis and cluster analysis (Pan et al., 1992).
Quantitative approaches to data integration can be grouped into two categories: data-driven (objective) and knowledge-driven (subjective). Integration of data for computing probabilities or favourability measures uses weights of evidence modelling (based on Bayes' theorem), indicator probability theory and evidential belief function theory. The major problem with these methods is testing and assessing the conditional independence of the predictive maps given the hypothesis. Extended weights of evidence modelling, a more recently proposed method, handles categorical explanatory variables in quantitative data integration and analysis for resource appraisal. Knowledge-driven approaches usually take the form of a forward-chaining expert system, in which the favourability measure is propagated through the inference network by Bayesian updating, fuzzy logic or belief functions to compute posterior favourability values given the evidence.
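As a rough illustration of weights of evidence modelling, the sketch below computes the positive and negative weights for a single binary evidence map and updates a prior probability on the log-odds scale. It uses the commonly cited formulation W+ = ln[P(B|D) / P(B|not D)] and W- = ln[P(not B|D) / P(not B|not D)], which is an assumption here since the text does not spell out the formulas. The cell counts and function names are hypothetical, and combining several maps by summing their weights is only valid under the conditional independence assumption discussed above.

```python
import math


def weights_of_evidence(n_total, n_deposits, n_evidence, n_evidence_and_deposits):
    """Return (W_plus, W_minus) for one binary evidence map from unit-cell counts."""
    n_non_deposits = n_total - n_deposits
    # P(evidence present | deposit) and P(evidence present | no deposit).
    p_b_d = n_evidence_and_deposits / n_deposits
    p_b_nd = (n_evidence - n_evidence_and_deposits) / n_non_deposits
    w_plus = math.log(p_b_d / p_b_nd)
    w_minus = math.log((1 - p_b_d) / (1 - p_b_nd))
    return w_plus, w_minus


def posterior_probability(prior, weights):
    """Add weights to the prior log-odds and convert back to a probability."""
    logit = math.log(prior / (1 - prior)) + sum(weights)
    return 1 / (1 + math.exp(-logit))


# Hypothetical counts: 1000 unit cells, 20 containing deposits; the evidence
# pattern covers 200 cells, 16 of which contain deposits.
w_plus, w_minus = weights_of_evidence(1000, 20, 200, 16)
prior = 20 / 1000

post_present = posterior_probability(prior, [w_plus])   # cell with evidence
post_absent = posterior_probability(prior, [w_minus])   # cell without evidence
print(f"W+={w_plus:.3f} W-={w_minus:.3f}")
print(f"posterior (present)={post_present:.4f} (absent)={post_absent:.4f}")
```

With a single evidence map the posterior reduces to the observed deposit frequency on (or off) the pattern, which gives a quick sanity check on the weights.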