3 Feature Extraction
The goal of employing feature extraction is to reduce the number of features substantially without sacrificing significant information, i.e. feature extraction is a processing of projecting the data from original feature space to a lower-dimensional subspace having more effective and frequency feature space.
3.1 Feature Extraction Based On Spectral Feature Space
3.1.1 Principal Components Transform: In Principle Components Transformation, an orthogonal subspace projection is performed on the hyper spectral images and produces a new sequence of uncorrelated images. Usually the first few components contain the most variances, and the later components which would be expected to show little variance could be ignored. Therefore, the essential dimensionality of the classification space will be reduced and thus the classification speed will be improved.
Although this method can effectively provide good classification accuracy when the number of dimension is reduced, it is sensitive to noise and has to be performed with the whole data set (Schowengerdt 1997). However, significant band differences may also appear only in the higher-order components.
3.1.2 Discriminant Analysis Feature Extraction: In Principle Components Transformation is based upon the global covariance matrix of the full set of image data and not suitable for the case of multiple classes. Another method for generating a transformed set of feature axes, in which class separation is optimized, is called Canonical Analysis (Richard, 1986) or Discriminant Analysis Feature Extraction (DAFE). This approach uses the ratio of a between-class covariance matrix to within-class covariance matrix as a criteria function. Thus a transformation matrix is determined to maximize the ratio, that is, the separability of classes will be maximum after transformation.
Although the Discriminant analysis performs well for most cases, there are several drawbacks for this method(Tadjudin and landgrebe, 1998). Frist ,the approach delivers features only up to the number of class minus one. Second , if the mean values or the same, the extracted feature vectors are not reliable. Furthermore, if a class has a mean vector very different from the other classes, the between-class covariance matrix will result in ineffective features.
3.1.3 Decision Boundary Feature Extraction: Lee and land grebe (1993) showed that discriminately information features and redundant information features can be extracted from the decision boundary itself. The approach is called decision boundary feature extraction(DBFE). It was shown that all the features needed for classification are normal to the effective decision boundary. A decision boundary features matrix(DBFM) was defined to the predict the intrinsic Discriminant dimension and to extract discriminately information features from the decision boundary.
In order to determine the effective decision boundary, the majority of training samples are first selected. The number of training samples required could be much more for high dimensional data. For hyper spectral images, the number of training samples is usually not enough to prevent singularity or yield a good covariance estimate. In addition, DBFE for more than two classes is suboptimal(Tajdudin and landgrebe,1998).
3.2 Feature Extraction Based On Frequency feature Space
3.2.1 Fourier Feature Extraction: In Fourier feature Extraction (FEE), the Fourier Transform is performed on the spectral data for each pixel and a series of Fourier spectrum is produced. The frequency power spectrum can localize information about global patterns of the spectral curve. The first component which has zero frequency represents the mean of the spectral curve for each pixel. The lowest nonzero frequency component is known as the "fundamental", and the highest frequency component called the third harmonic is three times the frequency of the fundamental, and so on .Like PCT, the first few components formed by FEE contain the most important information but FEE is more meaningful than PCT.
3.2.2 Wavelet Feature Extraction: the development of wavelet decomposition and discrete wavelet transform has thus far been in terms of multi resolution where the lower scale wavelet components are considered as the optimal approximation of a higher scale single or image (Mallat, 1989). This is indeed a powerful point of view and a accurate model for hyper spectral data analysis. Specially, we decompose the hyper spectral signature using a wavelet transform and the select the fewest wavelet coefficients require to achieve the dimensionality reduction. The scale of wavelet transform can be of a very fine scale to very coarse scale. This approach is then called Wavelet Feature Extraction (WFE). According to the wavelet theory and multireolution concept, the decomposed features wavelet transformation is the optimal approximation of original hyper spectral data.
4. Experimental Result and Analysis
The test data sets (see Figure 2.a) which were delivered by the Airborne Visual/Infrared Imaging Spectrometer (AVIRIS) I this paper are derived from Purdue University. The AVIRIS data is hyper spectral data which has 224 spectral bands from 400nm to 2450nm with 10nm spectral resolution. The test fields with the size of 68*85 take over an agricultural portion on NW Indiana in 1992. the ground truth data shown in Figure 2.b include Corn, Grass, Soybean-1 and Soybean-2. All the feature extraction methods mentioned above were performed on the test data to reduce the dimensionality. In feature selection, we used the Sequential Forward method with Bhattacharya distance to select the optimal feature combinations. The approach used for classification is the maximum Likelihood Classifier after feature extraction. The number of training samples used in the classification is 350 for each class. Figure 3. shows the result Because the number of features delivered by DAFE method is only up to the number of classes minus
one, the classification accuracy is only calculated in two and three bands.

Figure 2. The test data delivered by AVIRIS on NW Indiana in 1992.

Figure 3. Classification accuracy using different feature extraction methods.
The objective of this experiment is to compare the proposed feature extraction methods with the two new methods in classification of hyper spectral data. Several results shown in Figure 3 are described in order. First, As the number of features reduced, the accuracy of classification start to reduced. Second, when the number of bands is reduced from 220 to 30, all the methods except for the DBFE and DAFE have the similar classification accuracy of 95%. It can be found that the result of DBFE is not desirable, the reason mentioned above is that the DBFE depends on the number and distribution of the training samples. Third, when the number of extracted feature is 5, the result of FFE is the best. It preserves the classification accuracy of 90%. When the number of extracted feature is 3, the classification accuracy of 90% using DAFE is the best.
4. Conclusion
The goal of employing feature extraction is to reduce the number of features substantially without sacrificing significant information. Thus the accuracy of classification could be preserved and the speed of computation could be reduced. This paper compared some methods of feature extraction which have been proposed in some articles. In addition, two new feature extraction methods are developed by Fourier transform and wavelet transform and compare with other proposed methods. The result shows that the two new methods can exactly reduce the dimensionality of hyper spectral data and preserve the accuracy of classification.
All methods mentioned in this paper are compared. The PCT is based upon the global covariance matrix of the full set of image data and not suitable for the case of multiple classes. The result of DBFE and DAFE depends on the number and distribution of training samples. The new features spaces formed by the FEE and WFE are more meaningful and more stable than other methods.
Reference
- Hughes, G.F., 1968. "On the Mean Accuracy of Statistical Pattern Recognizers." IEEE Trans. Inform. Theory IT-14. pp. 55-63.
- Hsien P.F. and D. Landgrebe, 1998. "Classification of High Dimensional Data." Ph. D. dissertation, School of Electrical and Computer Engineering, Purdue University, West Lafayette, Indiana.
- Jimenez, L.O. and D. Landgrebe, 1995. " High Dimensional Feature reduction Via Projection Pursuit" , PhD Thesis and School of Electrical & Computer Engineering Technical Report TR-ECE 96-5, Purdue University.
- Lee C. and D. Landgrebe 1993. "Feature Extraction and Classification Algorithms for High dimensional Data", TR-ENVIRONMENT 93-1, Purdue University.
- Mallat, S.G., 1989a. " A Theory for Multiresolution Signal Decomposition: The Wavelet Representation", IEEE Trans. On Pattern Analysis And Machine Intelligence, Vol. 11, No. 7, pp. 674-693.
- Richards, J.A., 1993. "Remote Sensing Digital Image Analysis: An introduction", Springer-Verlag Berlin Heidelberg, Second Edition.
- Schowengerdt R.A. 1997. " Remote Sensing" Models and Methods for Image Processing", Academic Press.
- Swain, P.H. and Shirley M.D., 1978. "Remote Sensing: the Quantitative Approach", McGRAW W-HILL.
- Tadjudin, S. and D. Landgrebe, 1998. "Classification of High Dimensional Data with Limited Training Samples", PhD Thesis and School of Electrical & Computer Engineering Technical Report TR-ECE 98-8.
- Young, T.Y. and K.S. Fu, 1986. "Handbook of Pattern Recognitions and Image Processing", College of Engineering, University of Miami,. Coral Gables, Florida.