An unbiased sampling framework for ground data collection in digital image classification

Recognition of Spectral Signatures for Supervised Classification
The aim of providing training data to the classifier is to obtain spectral information that can be used to determine the decision boundaries for classifying the whole mosaicked image. A total of 159 training sites were chosen, with the number for each class proportional to the area of that class in the total ground survey data. The training pixels were randomly selected from the quilted image, although an effort was made to choose training sites from almost all the ground survey segments. The mean DN values of each band for forest and non-forest land uses are presented in Figure 2(a) and (b).


Figure 2(a) Distribution of DN in Different Bands for Forest Cover Types in UMCA

Figure 2(b) Distribution of DN in Different Bands for Non-forest Land Uses in UMCA
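
The proportional allocation of training sites described above can be illustrated with a minimal sketch. The class names and surveyed areas below are hypothetical, and the largest-remainder rounding rule is an assumption made so that the counts add up to the total; the source does not state how fractional quotas were rounded.

```python
def allocate_training_sites(class_areas, total_sites):
    """Split total_sites among classes in proportion to their surveyed area.

    Uses largest-remainder rounding (an assumed rule) so the counts sum
    exactly to total_sites.
    """
    total_area = sum(class_areas.values())
    # Exact (fractional) share of sites for each class.
    quotas = {c: total_sites * a / total_area for c, a in class_areas.items()}
    # Start from the integer part of each quota.
    counts = {c: int(q) for c, q in quotas.items()}
    leftover = total_sites - sum(counts.values())
    # Give the remaining sites to the classes with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - counts[c], reverse=True)
    for c in by_remainder[:leftover]:
        counts[c] += 1
    return counts


# Hypothetical ground survey areas (km2), for illustration only.
areas = {"forest": 50.0, "tea": 30.0, "paddy": 20.0}
sites = allocate_training_sites(areas, 159)
print(sites)
```

Random selection of the pixels within each class would then draw the allocated number of sites from the surveyed segments.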

Spectral Homogeneity and Class Separability
Spectral uniformity was verified for all the signatures by overlaying them upon the QUILT of the unsupervised classification. This step greatly reduced spectral confusion by preventing the selection of mixed pixels from different classes. The individual signatures were then combined to produce class signatures (10 classes) for the classification.

The signature files were evaluated as follows. Scatterplots were viewed in two-dimensional space for each band combination. The Jeffries-Matusita distance was calculated for all the signatures. Histograms and statistics were computed for each signature. Spectral confusion was evident between paddy and urban, tea and open forest, and plantation and open forest. Polygons for these signatures were separated and clusters were formed using statistical clustering. In this way the predominant cases of spectral confusion were eliminated and the signature files were purified.

Signatures for the paddy and urban classes were less divergent. Further, open forest and tea areas were spectrally confused. The frequency distributions of these classes were bimodal. The polygons containing the training data of these classes were identified and adjusted by running sequential clustering to derive spectrally divergent, unimodal class signatures.
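
As an illustration of the separability measure mentioned above, the following is a minimal sketch of the Jeffries-Matusita (JM) distance between two class signatures. It assumes each signature is summarized by a mean vector and a covariance matrix under a Gaussian model, and it uses the common convention JM = 2(1 - exp(-B)), where B is the Bhattacharyya distance, so values range from 0 (identical) to 2 (fully separable). The function names and example signatures are hypothetical; the source does not say which software or convention was used.

```python
import numpy as np


def bhattacharyya(mean1, cov1, mean2, cov2):
    """Bhattacharyya distance between two Gaussian class signatures."""
    mean1, mean2 = np.asarray(mean1, float), np.asarray(mean2, float)
    cov1, cov2 = np.asarray(cov1, float), np.asarray(cov2, float)
    cov = (cov1 + cov2) / 2.0
    diff = mean1 - mean2
    # Term driven by the difference between class means.
    term_mean = diff @ np.linalg.solve(cov, diff) / 8.0
    # Term driven by the difference between class covariances.
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    term_cov = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))
    return term_mean + term_cov


def jeffries_matusita(mean1, cov1, mean2, cov2):
    """JM distance in the 0..2 convention."""
    return 2.0 * (1.0 - np.exp(-bhattacharyya(mean1, cov1, mean2, cov2)))


# Hypothetical two-band signatures, for illustration only.
identity = np.eye(2)
jm_same = jeffries_matusita([30, 40], identity, [30, 40], identity)
jm_far = jeffries_matusita([0, 0], identity, [100, 100], identity)
print(jm_same, jm_far)
```

Pairs with a JM value well below 2, such as the confused class pairs reported above, are candidates for re-clustering or signature editing.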

Digital Classification Algorithm and Results
In this study, a priori probabilities derived from direct expansion estimates were introduced to the Maximum Likelihood classifier in order to produce an area-weighted classification for UMC. Figure 04 shows the area-weighted classification of UMC compared with the four-class NDVI image. The classified image has a noisy appearance due to the presence of many isolated pixels, or small groups of pixels, whose classification differs from that of most of their neighbours. A comparison of the area estimates from digital classification with the direct expansion estimates is presented in Table 01.
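
The effect of a priori probabilities on a Maximum Likelihood decision can be sketched as follows. This is a minimal Gaussian formulation, not the software used in the study: each pixel is assigned to the class that maximizes ln(prior) - 0.5 ln|C| - 0.5 d², where C is the class covariance and d² is the squared Mahalanobis distance to the class mean. The function name, the single-band class statistics, and the prior values are hypothetical.

```python
import numpy as np


def ml_classify(pixels, means, covs, priors):
    """Assign each pixel (row) to the class with the highest discriminant score."""
    pixels = np.asarray(pixels, float)
    scores = []
    for mean, cov, prior in zip(means, covs, priors):
        cov = np.asarray(cov, float)
        diff = pixels - np.asarray(mean, float)
        _, logdet = np.linalg.slogdet(cov)
        # Squared Mahalanobis distance of every pixel to this class mean.
        mahal = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
        scores.append(np.log(prior) - 0.5 * logdet - 0.5 * mahal)
    return np.argmax(np.stack(scores, axis=1), axis=1)


# Hypothetical one-band statistics for two classes.
means = [[10.0], [20.0]]
covs = [[[4.0]], [[4.0]]]
pixel = [[14.5]]

equal = ml_classify(pixel, means, covs, [0.5, 0.5])
weighted = ml_classify(pixel, means, covs, [0.1, 0.9])
print(equal, weighted)
```

With equal priors the pixel falls in the first class; weighting the second class heavily moves the decision boundary and the same pixel is assigned to the second class. One plausible way to obtain priors from direct expansion estimates, assumed here rather than stated in the source, is to divide each class area by the total area.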

Table 01. Comparison of Digital Classification with Direct Expansion Estimates

Land use | Digital classification estimates (km²), a | Direct expansion estimates (km²), b | Percentage difference between estimates (%)*
Dense Forest | 201.18 | 293.16 | 45.72
Grass | 679.82 | 509.81 | 25.00
Tea | 355.86 | 473.62 | 33.09
Water | 24.74 | 150.79 | 509.49
Plant. Forest | 123.62 | 79.67 | 35.55
Open Forest | 48.97 | 163.49 | 233.85
Urban | 252.66 | 231.34 | 8.43
Paddy | 98.16 | 141.00 | 43.64
Other | 588.1 | 472.89 | 19.59
KFG | 750.17 | 608.3 | 18.91
Total | 3124.08 | 3124.08 |

* Percentage difference between the two estimates = |(a - b) / a| × 100, where a is the digital classification estimate and b is the direct expansion estimate.
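
The percentage difference defined in the table footnote can be reproduced with a short sketch. The function name is hypothetical; the three input rows are taken from Table 01.

```python
def percentage_difference(a, b):
    """|(a - b) / a| * 100, with a the digital and b the direct expansion estimate."""
    return abs((a - b) / a) * 100.0


# (digital classification, direct expansion) areas in km2, from Table 01.
rows = {
    "Dense Forest": (201.18, 293.16),
    "Water": (24.74, 150.79),
    "Urban": (252.66, 231.34),
}
for land_use, (a, b) in rows.items():
    print(land_use, percentage_difference(a, b))
```

The results agree with the tabulated values to within rounding in the second decimal place. Because the digital estimate a is the denominator, a class with a small digital estimate, such as water, yields a very large percentage difference.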

Assessment of Classification Accuracy
The Kappa statistic calculated from the results of the 10-class land use classification was 0.4534, with a variance of 0.00056, which denotes good agreement. The confusion matrix shows that the overall map accuracy is 60.73 percent, while the highest producer's and user's accuracies are 98 percent and 85 percent, respectively. The land use classes of dense natural forest and water contributed positively to the overall mapping accuracy. However, spectral confusion among the urban, grassland, paddy and other crops classes reduced the accuracy.
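
For readers unfamiliar with these measures, the sketch below shows how overall accuracy, the Kappa statistic, and producer's and user's accuracies are derived from a confusion matrix. The matrix is hypothetical and is not the study's data. The layout, with rows as classified (map) classes and columns as reference (ground) classes, is an assumed convention.

```python
import numpy as np


def accuracy_report(confusion):
    """Return overall accuracy, kappa, producer's and user's accuracies.

    Assumes rows are classified (map) classes and columns are reference classes.
    """
    m = np.asarray(confusion, float)
    n = m.sum()
    diag = np.diag(m)
    overall = diag.sum() / n
    # Agreement expected by chance, from the row and column totals.
    expected = (m.sum(axis=0) * m.sum(axis=1)).sum() / n**2
    kappa = (overall - expected) / (1.0 - expected)
    producers = diag / m.sum(axis=0)  # correct pixels / reference total per class
    users = diag / m.sum(axis=1)      # correct pixels / classified total per class
    return overall, kappa, producers, users


# Hypothetical two-class confusion matrix, for illustration only.
overall, kappa, producers, users = accuracy_report([[40, 10], [10, 40]])
print(overall, kappa, producers, users)
```

Kappa is lower than overall accuracy because it discounts the agreement that would be expected by chance from the class totals alone.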

Discussion
There were obvious discrepancies between the direct expansion estimates and the supervised classification results, suggesting possible spectral confusion even though the class signatures were tested using the JM distance and scatterplots. These deviations were most prominent in the less frequent land use categories, such as water and open forest. This indicates the need for a higher sampling fraction in ground data collection.

For the supervised classification, a selected sample of radiometric values identified as being representative of each spectral class of land use was used as training data to extrapolate the classification to the entire UMC. The supervised classification technique operates on the assumption that images are formed by spectrally uniform and separable classes. In reality, the radiometric classes recorded by the sensor are not homogeneous, nor are they always unambiguously separable. Information classes are typically not discrete, and the recorded digital value of a spectral class is an average of the reflectance of multiple objects contained within the class. An attempt was therefore made to keep the variability within each class smaller than the variability between classes.

In summary, a properly organized field data collection program, supported by a complete set of field documentation, is a prerequisite for successful supervised image classification. In this study, the unbiased sampling technique provided the training data for the classification. It also determined the a priori probabilities of the classification. In addition, it provided direct expansion estimates for comparison with the maximum likelihood classification results. Finally, it supplied the decision rules for the accuracy assessment. Hence, it can be concluded that a proper sampling methodology for ground data collection contributes decisively to the success of the classification.

