Introducing correctness coefficient as an accuracy measure for sub pixel classification results

Hassan Emami
Email: hemamy@yahoo.com
Maybod Azad University

S.B. Fatemi
Email: sbfatemi@yahoo.com
K.N.T Univ. of Technology

M.Mojarradi
Email: Mojaradi @yhoo.com
K.N.T Univ. of Technology



Abstract
After every classification, the results must be evaluated and their accuracy assessed. Depending on the type of result (thematic map or fraction map), an adequate accuracy assessment strategy must be chosen. Accuracy assessment methods for traditional pixel-based classifications are not fully suitable for sub pixel classifications, because training and ground truth data are pixel-based and cannot be used directly to assess the accuracy of sub pixel classification results (fraction maps).

In general, there is no common, standard method for assessing the accuracy of sub pixel classification results. Very few methods and measures, such as entropy and cross entropy, have been proposed for this purpose, and they have limitations. Cross entropy requires a fuzzy ground truth data set, which is not readily available. We therefore introduce the correctness coefficient for sub pixel accuracy assessment. The correctness coefficient expresses the matching rate between the results of sub pixel classification (fraction maps) and the ground truth data. It is an effort to make sub pixel accuracy assessment flexible and consistent with respect to the type of available data and the classification method. The proposed method makes it possible to inspect the classes individually; in addition, each class can be investigated with respect to its commission and omission errors. An experiment using real data has been carried out to show the ability of the new accuracy measure.

1. Introduction
Classification is one of the most important methods for extracting information from remotely sensed images. Traditionally, classification is defined as a mapping function from the image space into a nominal space in which each pixel has one label. The result is usually a thematic map in which each pixel is allocated to a specific class. In some cases classification tries to delineate objects in the real world, which is done by preprocessing the image (e.g. segmentation). On the other hand, some classification methods estimate the per-pixel fraction of each class and thereby allow a more accurate estimation of the class areas.

Many classification methods have been proposed so far, and some of them, such as Maximum Likelihood (MLC) and Minimum Distance, are available in state-of-the-art software. Traditional classifiers are usually oriented toward generating a thematic map, but this leads to imprecise area estimates for the classes. Because the sensor grid does not match the real object boundaries, some mixed pixels (mixels) appear in the image (Fisher 1997).

The gray value of such a mixel is a composition of the radiometric properties of several classes (objects) and therefore causes confusion in classification procedures. Traditional classifiers such as MLC assign each pixel to only one class, so mixels are usually labeled erroneously. After each classification, the results must be examined and their accuracy reported. Depending on the result type (thematic map, fraction map, ...), an adequate strategy for accuracy assessment can be chosen. Finally, some parameters, tables and maps are calculated and generated to show the accuracy of the result. In this paper we discuss some aspects of the accuracy assessment of maps produced by sub pixel classifiers, and we propose an adapted version of the traditional accuracy measures for the accuracy assessment of sub pixel classifiers.

2. Sub-pixel classification methods
The main problem and limitation of traditional hard (pixel-based) image classification procedures lies in the classification of mixed pixels. Mixed pixel classification is a process that tries to extract the proportions of the pure components of each mixed pixel. There are different approaches to resolving the mixed pixel problem. Some of the most important soft classification methods are: (i) deterministic approaches; (ii) fuzzy set theory based approaches; (iii) neural network based approaches; and (iv) the linear mixture modeling approach (Emami 2002). Among these approaches, we chose linear mixture modeling to produce some (semi) fuzzy results and used them to test the accuracy assessment approaches.

There are two mixture models for mixed pixel classification: the nonlinear mixture model and the linear mixture model. The nonlinear mixture model considers not only the pixel of interest but also its neighboring pixels, i.e. each photon that reaches the sensor may have interacted with several class types through multiple scattering. In the linear mixture model, each pixel is modeled as a linear combination of a number of pure materials, or endmembers. Applying the linear mixture model is known as spectral unmixing, a method that allows the user to obtain information at the sub pixel level and to study the decomposition of mixed pixels. The basic idea behind the linear mixture model is that each photon that reaches the sensor has interacted with just one class (Mather 1999).


Figure 1. Non-linear mixture model                 Figure 2. The linear mixture model

In this research, the linear mixture model is adopted and linear unmixing is used to classify a hyperspectral image. The linear mixture model can be described mathematically as a set of linear vector-matrix equations.
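A common way of writing this model is sketched below. The symbols are assumptions chosen for illustration: x is the observed pixel spectrum with n bands, M is the n-by-c matrix whose columns are the endmember spectra, f is the vector of the c class fractions, and e is the residual error vector.

```latex
\mathbf{x} = \mathbf{M}\,\mathbf{f} + \mathbf{e}
\qquad\Longleftrightarrow\qquad
x_k = \sum_{i=1}^{c} m_{ki}\, f_i + e_k , \quad k = 1, \dots, n
\tag{1}
```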




Solving equation 1 without any constraint gives unconstrained unmixing. The resulting fractions may be negative and are not constrained to sum to unity. To avoid this, a sum-to-unity constraint is added to the equations of the unmixing process. Requiring that all resulting fractions sum to unity is referred to as partially constrained unmixing; however, fraction values that are negative or greater than one are still possible. Fully constrained unmixing adds the condition that every endmember fraction must lie between 0 and 1. It should be noted that the final results of the unmixing algorithm depend on the type and number of endmembers, so any change to the reference endmembers changes the resulting fraction maps.

A solution of the linear unmixing problem must satisfy two requirements: the fraction coefficients must sum to one, which ensures that the whole pixel area is represented in the model, and each coefficient must be nonnegative, to avoid negative sub pixel areas. The first requirement can be modeled by a constraint equation; for the second requirement, the coefficients need to be constrained by:
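In an assumed notation, with f_i the fraction of class i in the pixel and c the number of endmembers, the two requirements are usually written as a sum-to-one constraint and a bound on each coefficient:

```latex
\sum_{i=1}^{c} f_i = 1 , \qquad 0 \le f_i \le 1 , \quad i = 1, \dots, c
```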


Together, the mixing equations and the constraints describe a model that must be solved for each pixel to be decomposed, i.e. given the pixel spectrum and the endmember spectra, we have to determine the class fractions and the residual error in equation 1 (Mather 1999).
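The effect of the constraints can be illustrated with a minimal Python sketch for the special case of two endmembers. This is not the solver used in this study; the function name and the spectra are hypothetical. With only two endmembers, the sum-to-one constraint leaves a single free fraction, so the least-squares solution has a closed form, and clipping it to [0, 1] gives the fully constrained solution.

```python
def unmix_two_endmembers(pixel, end_a, end_b, fully_constrained=True):
    """Least-squares fractions (f_a, f_b) of two endmembers with f_a + f_b = 1."""
    # Substituting f_b = 1 - f_a turns the model into a one-parameter fit:
    #   pixel - end_b ~= f_a * (end_a - end_b)
    diff = [a - b for a, b in zip(end_a, end_b)]
    resid = [p - b for p, b in zip(pixel, end_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("endmembers must differ in at least one band")
    f_a = sum(r * d for r, d in zip(resid, diff)) / denom
    if fully_constrained:
        # With a single free parameter, clipping the least-squares solution
        # to [0, 1] yields the constrained least-squares solution.
        f_a = min(1.0, max(0.0, f_a))
    return f_a, 1.0 - f_a


# Hypothetical 3-band endmember spectra (illustrative values only).
vegetation = [0.05, 0.08, 0.50]
soil = [0.20, 0.25, 0.30]

# A synthetic mixed pixel: 70% vegetation, 30% soil, no noise.
mixed = [0.7 * v + 0.3 * s for v, s in zip(vegetation, soil)]
f_veg, f_soil = unmix_two_endmembers(mixed, vegetation, soil)
print(round(f_veg, 3), round(f_soil, 3))
```

With more than two endmembers the fully constrained problem no longer has such a simple closed form and requires a constrained least-squares solver.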

3. Accuracy assessment of the classification results
Accuracy assessment is an essential post-classification stage. The accuracy of the results is expressed in various forms, depending on the classification results and method. The result of common classification methods (e.g. MLC) is a land cover/use map, and its accuracy is usually assessed by comparing it with a ground truth map. The ground truth or reference map is usually stored in digital form and defines well-known land cover types for some pixels of the scene. A pixel-by-pixel comparison of the two maps yields an error (or confusion) matrix.

Several error and accuracy measures are derived from the error matrix, each of which shows a different aspect of the quality of the final results. One of the most popular parameters calculated from the error matrix is the overall accuracy, which equals the ratio of the sum of the diagonal elements of the error matrix (the correctly classified pixels) to the total number of pixels in the matrix. An accuracy parameter is also defined for each category (class). In each row, the ratio of the diagonal element to the sum of the pixels in that row is called the user's accuracy. Analogously, this ratio calculated for each column is called the producer's accuracy.
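These definitions can be sketched in a few lines of Python. The sketch assumes that rows of the error matrix hold the classified labels and columns the ground truth labels, consistent with the row and column definitions above; the function names and the sample labels are hypothetical, and pixels without ground truth are marked with None.

```python
def error_matrix(classified, reference, n_classes):
    """Rows: classified label, columns: ground truth label."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for c, r in zip(classified, reference):
        if r is not None:  # skip pixels without ground truth
            matrix[c][r] += 1
    return matrix


def overall_accuracy(matrix):
    """Correctly classified pixels divided by all pixels in the matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def users_accuracy(matrix, i):
    """Diagonal element of row i divided by the row total."""
    return matrix[i][i] / sum(matrix[i])


def producers_accuracy(matrix, j):
    """Diagonal element of column j divided by the column total."""
    return matrix[j][j] / sum(row[j] for row in matrix)


# Hypothetical labels for ten pixels; the last pixel has no ground truth.
classified = [0, 0, 1, 1, 1, 2, 2, 0, 1, 2]
reference = [0, 0, 1, 1, 2, 2, 2, 1, 1, None]
m = error_matrix(classified, reference, 3)
print(m, overall_accuracy(m))
```

The sketch does not guard against classes with an empty row or column, which would cause a division by zero.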

Based on the error matrix, another accuracy measure is defined, called the Kappa coefficient. This accuracy criterion is calculated by (Richards 1993):
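In a commonly used notation (assumed here), with x_ii the diagonal elements of the error matrix, x_i+ and x_+i the totals of row i and column i, r the number of classes, and N the total number of pixels in the matrix, the Kappa coefficient can be written as:

```latex
\kappa = \frac{N \sum_{i=1}^{r} x_{ii} - \sum_{i=1}^{r} x_{i+}\, x_{+i}}
              {N^{2} - \sum_{i=1}^{r} x_{i+}\, x_{+i}}
```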


Commission error is defined as the ratio of the sum of the off-diagonal elements in each row to the number of pixels in that row. Omission error is the analogous measure for columns. Hence for each row we can calculate the commission error and user's accuracy, and for each column the omission error and producer's accuracy. These factors require ground truth data, since the error matrix results from comparing the thematic map with the ground truth.
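A minimal Python sketch of these measures follows, again assuming that rows hold the classified labels and columns the ground truth labels. The Kappa function uses the usual form of the coefficient (observed agreement corrected for chance agreement estimated from the row and column totals); the function names and the sample matrix are hypothetical.

```python
def commission_error(matrix, i):
    """Share of the pixels labeled as class i that belong to other classes."""
    row_total = sum(matrix[i])
    return (row_total - matrix[i][i]) / row_total


def omission_error(matrix, j):
    """Share of the ground truth pixels of class j labeled as other classes."""
    col_total = sum(row[j] for row in matrix)
    return (col_total - matrix[j][j]) / col_total


def kappa(matrix):
    """Kappa coefficient computed from the error matrix."""
    size = len(matrix)
    n = sum(sum(row) for row in matrix)
    diagonal = sum(matrix[i][i] for i in range(size))
    # Sum over classes of (row total * column total).
    chance = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(size)
    )
    return (n * diagonal - chance) / (n * n - chance)


# Hypothetical 3-class error matrix (rows: classified, columns: ground truth).
matrix = [[2, 1, 0], [0, 3, 1], [0, 0, 2]]
print(commission_error(matrix, 0), omission_error(matrix, 2), kappa(matrix))
```

By construction, the commission error of a row is one minus its user's accuracy, and the omission error of a column is one minus its producer's accuracy.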

To avoid the dependency on ground truth data, entropy has been defined. Entropy measures the uncertainty in a single value of a statistical variable and is defined as the information content of a piece of information that would reveal this value with perfect accuracy. This quantity is weighted by the probability that the value occurs and summed over all values, which gives (Gorte 1998):
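Written out, this is the standard Shannon form. The base-2 logarithm, which expresses the result in bits, is an assumption:

```latex
H(x_p) = -\sum_{i=1}^{N} P(C_i \mid x_p)\, \log_2 P(C_i \mid x_p)
```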


Here N is the number of classes and P(Ci|xp) is the posterior probability of class Ci in pixel xp. Entropy is thus calculated per pixel.
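As an illustration, the per-pixel entropy can be computed with a short Python function. The base-2 logarithm and the function name are assumptions; the input is the vector of posterior probabilities (or fractions) of one pixel.

```python
import math


def pixel_entropy(probabilities):
    """Shannon entropy (in bits) of the class probabilities of one pixel."""
    # Terms with zero probability contribute nothing (p * log p -> 0 as p -> 0).
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


pure = pixel_entropy([1.0, 0.0, 0.0, 0.0])        # one class only
mixed = pixel_entropy([0.25, 0.25, 0.25, 0.25])   # four equal fractions
print(pure, mixed)
```

A pixel assigned entirely to one class has zero entropy, while a pixel split equally among N classes reaches the maximum value log2(N).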

Sub pixel classification tries to determine the fraction of each class per pixel; therefore these techniques make no absolute decision on the pixel label. Generating a thematic map from such results is not straightforward, and some postprocessing (e.g. thresholding) must be applied. For this reason, accuracy assessment of sub pixel classification results differs from the common accuracy assessment methods. If we want to use a traditional accuracy assessment (e.g. the confusion matrix), we have to generate a thematic map and then compare it with a ground truth map. The next section deals with the sub pixel accuracy assessment techniques that have been proposed.

4. Sub pixel accuracy assessment methods
Depending on the type of result (thematic map or fraction map), an adequate strategy for assessing the accuracy of the classification results must be chosen. Finally, some parameters, tables and maps are calculated and generated to show the accuracy of the results. Accuracy assessment methods for traditional pixel-based classification (previous section) are not fully suitable for sub pixel classification, because training data and ground truth are pixel-based and no pixel-based method can be used directly to assess the accuracy of sub pixel classification results. In some cases, however, the only way to compute the accuracy of a sub pixel classifier is to harden its results (Foody 1996).

Based on the sub pixel classification results, some methods have been proposed to estimate the accuracy of such classifications. Foody (1996) gives an excellent review of the available sub pixel accuracy assessment approaches. Most of the methods he mentions require a fuzzy ground truth map, which is not readily available.

Entropy, defined in the previous section, has several limitations. One limitation is that it cannot show how reasonable the classification accuracy is (Maselli et al. 1996). When pixels are mixed, entropy is not a good index of classification accuracy, because mixed pixels have a high entropy and ratio entropy; in such cases entropy therefore gives a misleading picture of the classification accuracy. Another limitation is that entropy cannot show how good or poor the classification accuracy is, so it cannot be used to compare the accuracy of two classification procedures. Foody (1996) therefore proposes using cross entropy for sub pixel classification accuracy if a sub pixel or fuzzy ground truth map exists. Cross entropy is given by the following equation:



The two probabilities in this equation are the posterior probability of the classification result in pixel xp and the posterior probability of pixel xp in the ground truth map. Cross entropy is thus calculated for each pixel and is defined as the expected information content of a piece of information that would reveal the true class of the pixel. The major problem of this method is that it needs a fuzzy ground truth map, which is often hard to obtain.
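A minimal Python sketch of a per-pixel cross entropy is given below. It assumes the common definition, the negative sum over classes of the ground truth probability times the base-2 logarithm of the classified probability, which may differ in detail from the form used by Foody (1996). The function name, the eps guard and the sample vectors are hypothetical.

```python
import math


def pixel_cross_entropy(reference, classified, eps=1e-12):
    """Cross entropy (in bits) between the ground truth and classified
    membership vectors of one pixel."""
    # eps guards against log2(0) when the classifier assigns zero
    # probability to a class that is present in the ground truth.
    return -sum(
        r * math.log2(max(c, eps))
        for r, c in zip(reference, classified)
        if r > 0
    )


truth = [0.5, 0.5]  # fuzzy ground truth of one pixel
matching = pixel_cross_entropy(truth, [0.5, 0.5])
deviating = pixel_cross_entropy(truth, [0.9, 0.1])
print(matching, deviating)
```

Under this definition, the value is smallest when the classified fractions equal the ground truth fractions and grows as the two distributions diverge.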

Another accuracy assessment method for linear unmixing model (LUM) results is area estimation and comparison. In this method the area of each class is calculated from the corresponding fraction map (Zhu et al. 2001). These values are compared with the areas obtained from other reliable sources (e.g. old maps, databases or other classifications): the more similar the values, the more accurate the classification. Because this approach simply sums the fractions of each class, it ignores the spatial distribution of the errors. The non-site-specific nature of this approach is, however, a major limitation, as a map could easily display the classes in the correct proportions but in the incorrect locations (Foody 2002).
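The limitation can be made concrete with a small Python sketch. The function name, the two fraction maps and the pixel size are hypothetical; both maps describe the same class on a 2 x 2 image.

```python
def class_area(fraction_map, pixel_area):
    """Area covered by a class: sum of its per-pixel fractions times the pixel area."""
    return sum(sum(row) for row in fraction_map) * pixel_area


# Two hypothetical fraction maps that place the class in different locations.
map_a = [[1.0, 0.5], [0.0, 0.0]]
map_b = [[0.0, 0.0], [0.5, 1.0]]

area_a = class_area(map_a, pixel_area=900.0)  # e.g. 30 m x 30 m pixels
area_b = class_area(map_b, pixel_area=900.0)
print(area_a, area_b)
```

Both maps yield the same class area although they disagree in every pixel, so a comparison of areas alone cannot distinguish them.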

In addition, this method does not give any accuracy parameter that can be used to compare two or more classifications. Logically, the closeness of the estimated area to the true area is the basic criterion for the accuracy of the classification. Thus we can only say, in relative terms, that "this classification is more accurate than the other one".

5. Correctness coefficient
As mentioned in the previous section, traditional accuracy assessment procedures cannot be used for sub pixel accuracy assessment. To assess the accuracy of this kind of classification result, we have to use the fraction maps, which are the main results of the linear unmixing classification.

As a first step, we need a parameter that expresses the matching rate between the results (fraction maps) and the ground truth data. For this purpose, we introduce the correctness coefficient (CC). To calculate the correctness coefficient, a binary map is generated from the ground truth for each class and multiplied by the corresponding fraction map:
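A possible formalization of this step is given below. The symbols are assumptions for illustration: g(p) is the ground truth label of pixel p (undefined for unknown pixels), B_i is the binary map of class i, F_i is the fraction map of class i produced by the classifier, and R_i is the masked fraction map.

```latex
B_i(p) =
\begin{cases}
1, & g(p) = i \\
0, & \text{otherwise}
\end{cases}
\qquad
R_i(p) = B_i(p)\, F_i(p)
```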




Through this multiplication, the calculated fraction is retained for each pixel whose binary value is 1, while the zero elements of the binary map discard the fractions that have no corresponding ground truth data. In this way, the relevant fraction value is retained for each ground truth pixel of a particular class, and the correspondence between the resulting fractions and the ground truth data can be calculated. After this step, the correctness coefficient can be estimated using the following formula:


Ng is the number of known pixels in the ground truth map. CC can also be computed for each class individually:
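Read from the definitions in the text, the overall and per-class coefficients could be written as follows. This is a reconstruction under assumptions, not necessarily the exact original form: B_i(p) is the binary ground truth map of class i, F_i(p) the fraction map of class i, c the number of classes, Ng and NPi the pixel counts defined in the surrounding text, and the factor 100 expresses the result as a percentage.

```latex
CC = \frac{100}{N_g} \sum_{i=1}^{c} \sum_{p} B_i(p)\, F_i(p),
\qquad
CC_i = \frac{100}{NP_i} \sum_{p} B_i(p)\, F_i(p)
```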


NPi is the number of known pixels of the ith class in the ground truth map. The correctness coefficient is expressed as a percentage and resembles the overall accuracy of traditional error matrices; it can be used as an overall accuracy measure for sub pixel results.
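A minimal Python sketch of the computation follows, under one reading of the definitions above: the per-class coefficient is the mean fraction of class i over its ground truth pixels, and the overall coefficient is the sum of all masked fractions divided by the number of known pixels, both in percent. The function names and the sample data are hypothetical, unknown pixels are marked with None, and every known label is assumed to have a fraction map.

```python
def binary_map(ground_truth, class_id):
    """1 where the ground truth label equals class_id, 0 elsewhere."""
    return [[1 if label == class_id else 0 for label in row] for row in ground_truth]


def masked_sum(mask, fraction_map):
    """Sum of the fractions that survive multiplication by the binary map."""
    return sum(
        b * f
        for mask_row, frac_row in zip(mask, fraction_map)
        for b, f in zip(mask_row, frac_row)
    )


def class_correctness(fraction_map, ground_truth, class_id):
    """Per-class correctness coefficient in percent."""
    mask = binary_map(ground_truth, class_id)
    n_class = sum(sum(row) for row in mask)
    return 100.0 * masked_sum(mask, fraction_map) / n_class


def overall_correctness(fraction_maps, ground_truth):
    """Overall correctness coefficient in percent.

    fraction_maps maps each class id to its fraction map."""
    n_known = sum(1 for row in ground_truth for label in row if label is not None)
    matched = sum(
        masked_sum(binary_map(ground_truth, class_id), fmap)
        for class_id, fmap in fraction_maps.items()
    )
    return 100.0 * matched / n_known


# Hypothetical 2 x 2 scene with two classes; one pixel has no ground truth.
ground_truth = [[0, 0], [1, None]]
fractions = {
    0: [[0.9, 0.6], [0.2, 0.5]],
    1: [[0.1, 0.4], [0.8, 0.5]],
}
cc_0 = class_correctness(fractions[0], ground_truth, 0)
cc_1 = class_correctness(fractions[1], ground_truth, 1)
cc = overall_correctness(fractions, ground_truth)
print(round(cc_0, 2), round(cc_1, 2), round(cc, 2))
```

The pixel without ground truth does not contribute to any of the coefficients, because every binary map is zero there.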

In addition to the accuracy parameters, some error measures can be derived to express the errors contained in the results. Just as commission and omission errors are defined in traditional accuracy assessment, we introduce these parameters on the basis of the ground truth binary maps and the fraction maps resulting from classification. In a traditional error matrix, the commission error is the percentage of pixels that have been labeled as a particular class but belong to a different category in the ground truth. Analogously, the omission error is the percentage of pixels of a particular class that have been labeled as other classes. Using this concept, we can calculate omission and commission errors for each class from the sub pixel classification results. First, we subtract each fraction map from the binary map of the same class:
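In an assumed notation, with B_i the binary ground truth map of class i, F_i the fraction map of class i, and D_i the resulting difference map, the subtraction reads:

```latex
D_i(p) = B_i(p) - F_i(p)
```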


The resulting map has both positive and negative values. Positive values occur at pixels that have the value 1 in the corresponding binary map; negative values result from subtracting the fraction values from the zero values of the binary map. The positive values are in fact the fractions that have been allocated to other classes, which resembles the omission error in traditional error matrices. Following the same concept, we can define the commission error using the negative values:
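A minimal Python sketch of both error measures is given below, under an assumed normalization that mirrors the traditional error matrix: the omission error divides the sum of the positive differences by the number of ground truth pixels of the class, and the commission error divides the sum of the absolute negative differences by the total fraction allocated to the class. Only pixels with known ground truth are used. The function name, the normalization and the sample data are assumptions, not necessarily the exact definitions of this paper.

```python
def fraction_errors(fraction_map, ground_truth, class_id):
    """Return (omission %, commission %) of one class from its fraction map."""
    positive = 0.0   # fractions of class pixels allocated to other classes
    negative = 0.0   # fractions of this class found on other classes' pixels
    allocated = 0.0  # total fraction allocated to this class
    n_class = 0      # ground truth pixels of this class
    for truth_row, frac_row in zip(ground_truth, fraction_map):
        for label, fraction in zip(truth_row, frac_row):
            if label is None:  # assumption: unknown pixels are ignored
                continue
            binary = 1 if label == class_id else 0
            difference = binary - fraction
            n_class += binary
            allocated += fraction
            if difference > 0:
                positive += difference
            elif difference < 0:
                negative -= difference
    return 100.0 * positive / n_class, 100.0 * negative / allocated


# Hypothetical 2 x 2 scene; one pixel has no ground truth.
ground_truth = [[0, 0], [1, None]]
fraction_class_0 = [[0.9, 0.6], [0.2, 0.5]]
omission, commission = fraction_errors(fraction_class_0, ground_truth, 0)
print(round(omission, 2), round(commission, 2))
```

With this normalization and fractions between 0 and 1, the omission error of a class is the complement of the mean fraction of that class over its own ground truth pixels.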



These error measures are defined per class and describe the error rate of the resulting fraction map. The next section applies these concepts in a case study using real data.
