Keywords: Hyperspectral, Classification, Clustering, Data Representation, Histograms.
Abstract
This paper presents a procedure for establishing an adaptable class data representation instead of adopting the Gaussian distribution assumption. By combining supervised and unsupervised classification methodologies, the data are represented in a one-dimensional cluster space, and a set of subclass signatures is generated to fit the class data better. This yields better performance than the conventional minimum distance classification technique. Since only first-order statistics (mean vectors) are used in the proposed method, far fewer training samples are required than for maximum likelihood classification. Experiments have been carried out using an AVIRIS data set, and the advantages of nonparametric multi-signature classification are demonstrated through improved classification accuracy.
1. Introduction
Hyperspectral remote sensing data, such as those recorded by AVIRIS, cover the full solar-reflected portion of the spectrum with a high spectral resolution of 10 nm. They provide rich information on ground cover types and make detailed study and monitoring of the Earth possible. However, data representation and interpretation become difficult because of the high dimensionality that results from using hundreds of spectral bands.
Statistical maximum likelihood classification assumes that the probability distribution of each class is a multivariate normal model whose dimensionality equals the number of spectral bands. This parametric method has been widely used for broad-band remote sensing data, for example, Landsat MSS and TM data. However, it requires a large number of training samples per class to obtain reliable estimates of the class statistics. The rule of thumb is that the number of training pixels per class should be at least 10 times the number of bands used, and desirably 100 times (Swain and Davis, 1978). In practice, however, the number of training pixels is always limited in remote sensing image classification, since identifying training pixels is time-consuming and can be very costly. When the number of training samples is finite, maximum likelihood classification accuracy does not always increase with the number of features used; it starts to decrease when the ratio of the number of training samples to the dimensionality is low. This is referred to as the Hughes phenomenon (Hughes, 1968). The reliability of maximum likelihood classification is therefore reduced for hyperspectral data.
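To make the rule of thumb concrete, the short Python sketch below counts the free parameters of one Gaussian class model ($N$ means plus $N(N+1)/2$ distinct covariance entries) and the per-class training-set sizes implied by the 10N and 100N guideline. The function names and the band counts (6 and 200) are illustrative assumptions, not values taken from the experiments.

```python
def gaussian_parameter_count(n_bands: int) -> int:
    """Free parameters of one multivariate normal class model:
    n_bands means plus n_bands*(n_bands+1)/2 distinct covariance entries."""
    return n_bands + n_bands * (n_bands + 1) // 2


def training_pixels_rule_of_thumb(n_bands: int) -> tuple[int, int]:
    """Minimum (10N) and desirable (100N) training pixels per class,
    following the rule of thumb cited from Swain and Davis (1978)."""
    return 10 * n_bands, 100 * n_bands


if __name__ == "__main__":
    # Illustrative band counts only: a broad-band case and a
    # hyperspectral case with a few hundred bands.
    for n in (6, 200):
        lo, hi = training_pixels_rule_of_thumb(n)
        print(f"N={n:3d}: {gaussian_parameter_count(n):6d} parameters, "
              f"{lo}-{hi} training pixels per class")
```

For 200 bands the sketch reports 20,300 parameters per class and 2,000 to 20,000 training pixels per class, which illustrates why the guideline is hard to meet for hyperspectral data.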
Neural network methods have been used in hyperspectral remote sensing data classification in recent years because of their nonparametric properties (Benediktsson et al., 1993). In particular, multiple Kohonen self-organizing maps have been proposed that represent the high-order statistics of the data using several model-free elastic ‘maps’ (Wan and Fraser, 1999). Among the parametric methods, the minimum distance classifier is more effective than the maximum likelihood classifier when training data are limited, since it needs to estimate only the class mean position.
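The minimum distance classifier can be sketched in a few lines of Python with NumPy. This is an illustrative implementation under simple assumptions (pixels stored as rows of an array, integer class labels, Euclidean distance); the function names are hypothetical. Only one mean vector per class is estimated, which is why the method tolerates small training sets.

```python
import numpy as np


def class_means(X: np.ndarray, y: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Estimate one mean vector per class; this is the only statistic
    the minimum distance classifier needs."""
    classes = np.unique(y)
    means = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, means


def minimum_distance_classify(X: np.ndarray, classes: np.ndarray,
                              means: np.ndarray) -> np.ndarray:
    """Label each pixel with the class whose mean is nearest (Euclidean)."""
    # Squared distances, shape (n_pixels, n_classes), via broadcasting.
    d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(d2, axis=1)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic two-class data in N = 50 "bands" (illustrative only).
    X = np.vstack([rng.normal(0.0, 1.0, size=(20, 50)),
                   rng.normal(1.0, 1.0, size=(20, 50))])
    y = np.array([0] * 20 + [1] * 20)
    classes, means = class_means(X, y)
    print(minimum_distance_classify(X, classes, means))
```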
However, an information class is often not sufficiently represented by its mean position and is expected to consist of a set of spectral classes, i.e., clusters. To allocate the associated clusters to an information class, a hybrid supervised and unsupervised methodology has been developed. Unsupervised clustering is performed first; the clusters are then identified with information classes by inspecting bispectral plots of the spectral class centres and the training data (Jia and Richards, 1999). This procedure can reasonably be implemented manually for conventional low-dimensional data. With hyperspectral data, however, identifying subclasses manually is difficult, since visualizing the reference data and cluster centres is impossible.
An automatic supervised nonparametric classifier was proposed by Skidmore and Turner (1988) and tested with SPOT data. For each class, the training data are used to construct a feature-space histogram, that is, a plot of the number of training pixels in each feature-space cell. Each cell is assigned to the class with the highest normalized histogram count, and a look-up table (LUT) is then created for labeling unknown data. Because this classifier treats each cell as a separate decision rule, it uses the actual shape of the class data distribution, which can improve classification accuracy. For hyperspectral data, however, the number of cells becomes far too large and the histograms become very sparse and flat.
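A minimal Python sketch of the feature-space histogram idea is given below, assuming each pixel has already been quantized to a tuple of integer levels that identifies its cell. It is a simplified illustration rather than Skidmore and Turner's implementation: ties go to the first class encountered, and cells with no training pixels are simply left out of the LUT. The helper `log10_cell_count` (a hypothetical name) shows why the approach breaks down for hyperspectral data.

```python
import math
from collections import Counter


def build_lut(training: dict[str, list[tuple[int, ...]]]) -> dict[tuple[int, ...], str]:
    """Assign each occupied feature-space cell to the class with the highest
    normalized histogram count (cell count / class training-set size)."""
    best: dict[tuple[int, ...], tuple[float, str]] = {}
    for label, pixels in training.items():
        counts = Counter(pixels)  # feature-space histogram of this class
        for cell, n in counts.items():
            score = n / len(pixels)  # normalized count
            if cell not in best or score > best[cell][0]:
                best[cell] = (score, label)
    # In this sketch, cells with no training pixels get no LUT entry.
    return {cell: label for cell, (_, label) in best.items()}


def log10_cell_count(bits_per_band: int, n_bands: int) -> float:
    """log10 of the number of feature-space cells, (2**bits) ** n_bands."""
    return n_bands * bits_per_band * math.log10(2)


if __name__ == "__main__":
    # Hypothetical 2-band example with already-quantized training pixels.
    lut = build_lut({
        "water": [(1, 2), (1, 2), (3, 3)],
        "soil": [(3, 3), (5, 5)],
    })
    print(lut)
    print(round(log10_cell_count(12, 200), 1))
```

With 12 bits per band and an assumed 200 bands, `log10_cell_count(12, 200)` is about 722.5, i.e. roughly $10^{722}$ cells, so almost every cell would be empty for any realistic training set.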
In this paper, the hybrid classification method is extended and further developed into a quantitative procedure. A cluster-space data representation is proposed so that a set of cluster mean signatures is generated automatically for each class. The method can be implemented easily regardless of the number of spectral bands used. Experimental results show that the new method can improve classification accuracy.
2. Methods
2.1 Cluster-Space Data Representation
Since an information class is often not sufficiently represented by its mean position alone, it is expected to be represented by a set of spectral classes, i.e., clusters. These clusters, however, must be separable from the clusters of other classes. Therefore, the clusters are first generated from the training data of all classes together. Clustering algorithms such as ISODATA (K-means) (Schowengerdt, 1997) can be employed in this step. To ensure adequate data representation, a large number of clusters is generated. The cluster vectors then replace the original radiometric ranges (say, 12 bits) as the bins of the histogram. Each original training pixel is labeled as one of the generated clusters using the Euclidean distance measure. For each class, the number of pixels labeled as each cluster is counted and plotted as a histogram. This graph provides a cluster-space representation of the training data. A summary is given below.
Let $\mathbf{x}$ be a pixel vector of length $N$, where $N$ is the number of spectral bands used. There are $L_i$ training pixels for class $\omega_i$:
$$\mathbf{x}_{i,1}, \mathbf{x}_{i,2}, \ldots, \mathbf{x}_{i,L_i}.$$
The new representation of these data is
$$h_i(1), h_i(2), \ldots, h_i(K),$$
where $K$ is the number of clusters generated from all the training data and $h_i(k)$ is the number of samples from class $\omega_i$ that are labeled as cluster $k$. Obviously,
$$h_i(1) + h_i(2) + \cdots + h_i(K) = L_i.$$
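The steps above (clustering all training data, labeling each training pixel by its nearest cluster, and counting per class) can be sketched in Python with NumPy. Plain Lloyd's K-means is used here as a simplified stand-in for ISODATA, which additionally splits and merges clusters; the function names, the random initialization, and the fixed iteration count are assumptions for illustration. Row $i$ of the returned count array holds $h_i(1), \ldots, h_i(K)$.

```python
import numpy as np


def nearest_cluster(X: np.ndarray, centres: np.ndarray) -> np.ndarray:
    """Index of the nearest centre (Euclidean distance) for each pixel row."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(d2, axis=1)


def kmeans(X: np.ndarray, K: int, n_iter: int = 50, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's K-means, a simplified stand-in for ISODATA.

    Returns a K x N array of cluster centres computed from all pixels in X.
    """
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = nearest_cluster(X, centres)
        for k in range(K):
            members = X[labels == k]
            if len(members) > 0:  # keep the old centre if a cluster empties
                centres[k] = members.mean(axis=0)
    return centres


def cluster_space_histograms(X: np.ndarray, y: np.ndarray,
                             centres: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return (classes, h) with h[i, k] = number of training pixels of
    classes[i] labeled as cluster k."""
    classes = np.unique(y)
    labels = nearest_cluster(X, centres)
    h = np.zeros((len(classes), len(centres)), dtype=int)
    for i, c in enumerate(classes):
        h[i] = np.bincount(labels[y == c], minlength=len(centres))
    return classes, h


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(300, 20))     # hypothetical: 300 pixels, N = 20 bands
    y = rng.integers(0, 3, size=300)   # hypothetical labels for 3 classes
    centres = kmeans(X, K=10)          # clusters from ALL training data
    classes, h = cluster_space_histograms(X, y, centres)
    print(h)
    # Each row sums to that class's training-set size L_i.
    assert (h.sum(axis=1) == np.array([(y == c).sum() for c in classes])).all()
```

Because every training pixel is assigned to exactly one cluster, each row of `h` sums to $L_i$, matching the identity stated for the histogram counts.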
The advantage of the new representation is that high-dimensional data can be displayed in a one-dimensional cluster space. Examining the class data becomes easier with the resulting histogram plots, which also show how the data of each class are distributed among the clusters. To account for the different training set sizes of the classes, normalized histograms are used, which are given by
$$H_i(k) = \frac{h_i(k)}{L_i} \times 100\%, \quad k = 1, 2, \ldots, K.$$
$H_i(k)$ indicates the chance that a pixel of class $\omega_i$ falls into cluster $k$. In other words, $H_i(1), \ldots, H_i(K)$ form the estimated probability distribution of class $\omega_i$ over the clusters.
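The normalization can be written as a single NumPy operation on a count array `h` with one row per class and one column per cluster. The sketch below is illustrative: the function name and the example counts are hypothetical, and every class is assumed to have at least one training pixel so that $L_i > 0$.

```python
import numpy as np


def normalized_histograms(h: np.ndarray) -> np.ndarray:
    """Return H with H[i, k] = h[i, k] / L_i * 100, where L_i is the row sum.

    Each row is the estimated distribution of class i over the K clusters,
    in percent, so every row sums to 100. Assumes every L_i > 0.
    """
    L = h.sum(axis=1, keepdims=True)  # training-set size L_i of each class
    return 100.0 * h / L


if __name__ == "__main__":
    # Hypothetical counts: 2 classes, K = 4 clusters (L_1 = 100, L_2 = 20).
    h = np.array([[30, 10, 0, 60],
                  [0, 5, 15, 0]])
    print(normalized_histograms(h))
```

Here the second class has only 20 training pixels, yet after normalization its largest entry (75%) exceeds that of the first class (60%), so comparisons between classes are no longer biased by training-set size.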
Instead of adopting the common Gaussian distribution assumption, the cluster-space data representation makes it possible to use the true shape of the class distribution in the classification process. This is discussed in the following section.