Evaluation of Endmember Selection in Linear Spectral Unmixing
Each column of the $n \times c$ matrix $E$ contains the spectrum of a so-called endmember, i.e. the reflectance typical of a resolution cell containing nothing but the cover type of interest. The $i$-th endmember spectrum is equal to the mean vector $\mathbf{e}_i$ of class $i$, although other views exist as well. The endmembers can be selected either from spectral libraries or from the image itself. Selecting them from the image has the advantages of being easy and simple and of yielding spectra on the same measurement scale as the data [7]. Alternatively, they can be obtained from field measurements and calibrated for each component involved, or selected from a spectral library [4].
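To make the role of $E$ concrete, the following minimal sketch assumes the usual linear mixing model, $\mathbf{x} = E\mathbf{f}$ plus noise, with the class fractions $\mathbf{f}$ summing to one. This form of the model is our assumption and is not quoted from the source. The sketch estimates $\mathbf{f}$ for a single pixel by least squares, imposing the sum-to-one constraint through a Lagrange multiplier. The function name `unmix_sum_to_one` and the four-band spectra are hypothetical, and non-negativity of the fractions is not enforced.

```python
import numpy as np

def unmix_sum_to_one(E, x):
    """Least-squares fractions f minimising ||x - E f||^2 subject to sum(f) == 1.

    E: (n, c) matrix whose columns are endmember spectra (n bands, c classes).
    x: (n,) spectrum of one pixel.
    Non-negativity of f is NOT enforced in this sketch.
    """
    c = E.shape[1]
    # Solve the KKT system of the equality-constrained problem:
    # [E^T E  1] [f  ]   [E^T x]
    # [1^T    0] [mu ] = [  1  ]
    K = np.zeros((c + 1, c + 1))
    K[:c, :c] = E.T @ E
    K[:c, c] = 1.0
    K[c, :c] = 1.0
    rhs = np.append(E.T @ x, 1.0)
    return np.linalg.solve(K, rhs)[:c]

# Hypothetical 4-band spectra (rows: bands, columns: three cover types).
E = np.array([[0.05, 0.20, 0.10],
              [0.08, 0.25, 0.12],
              [0.04, 0.30, 0.15],
              [0.45, 0.35, 0.20]])
f_true = np.array([0.6, 0.3, 0.1])
x = E @ f_true                    # noise-free mixed pixel
f_est = unmix_sum_to_one(E, x)
print(np.round(f_est, 3))
```

The toy pixel is noise-free and the columns of $E$ are linearly independent, so the recovered fractions equal the ones used to build it. With real data, the residual absorbs noise and model error.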
When the endmembers are selected from the image itself, they can be identified as the extreme points of a scatterplot of two bands. This holds provided that the components are spectrally distinct enough and the plotted bands are sufficiently uncorrelated. The latter condition can be especially difficult to meet, since an examination of the ASTER bands shows a high degree of correlation, particularly between the three visible and near-infrared (VNIR) bands.
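The two-band approach and the correlation issue can be illustrated on synthetic data. The sketch below uses invented green, red and NIR reflectances for three cover types and a hypothetical helper `hull_indices`, which implements Andrew's monotone-chain convex hull. It prints the band-to-band correlation matrix and returns the pixels at the vertices of the red versus NIR scatterplot. When pure pixels are present and the two bands separate the classes, those vertices are exactly the pure pixels.

```python
import numpy as np

def hull_indices(points):
    """Indices of the convex-hull vertices of 2-D points (Andrew's monotone chain)."""
    order = np.lexsort((points[:, 1], points[:, 0]))   # sort by x, then by y

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn
        return ((points[a, 0] - points[o, 0]) * (points[b, 1] - points[o, 1])
                - (points[a, 1] - points[o, 1]) * (points[b, 0] - points[o, 0]))

    lower, upper = [], []
    for i in order:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], i) <= 0:
            lower.pop()
        lower.append(i)
    for i in order[::-1]:
        while len(upper) >= 2 and cross(upper[-2], upper[-1], i) <= 0:
            upper.pop()
        upper.append(i)
    return lower[:-1] + upper[:-1]

rng = np.random.default_rng(0)
# Hypothetical reflectances (rows: green, red, NIR; columns: soil, vegetation, water).
E = np.array([[0.15, 0.08, 0.05],
              [0.20, 0.05, 0.03],
              [0.30, 0.45, 0.02]])
F = rng.dirichlet(np.ones(3), size=500)        # (500, 3) fractions of mixed pixels
F = np.vstack([F, np.eye(3)])                  # append one pure pixel per class
X = F @ E.T                                    # (503 pixels, 3 bands)

print(np.round(np.corrcoef(X, rowvar=False), 2))   # band-to-band correlation matrix
red_nir = X[:, [1, 2]]                              # two-band scatterplot: red vs NIR
extremes = sorted(int(i) for i in hull_indices(red_nir))
print(extremes)                                     # indices of the extreme pixels
```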
3. Ways to Determine Image Endmembers
The endmember spectra, which make up the columns of the matrix $E$ used throughout this thesis, must be known before the decomposition process can take place. As mentioned above, the endmembers can be selected either from spectral libraries or from the image itself. Several other methods exist for extracting endmembers from the image itself, but the best known and most commonly chosen one is Principal Component Analysis (PCA). Both approaches are explained in detail below, after which the final choice of method is discussed.
3.1. Spectral Libraries
One possibility is to extract the image endmembers from a spectral library. Such a library, whose use is usually associated with geological applications [4], contains endmember spectra that have been measured either in the laboratory or in the field. For agricultural applications, however, these libraries are less suitable, because they would need to account for all the processes and factors influencing the observed spectra. For example, several spectra of the same green vegetation type over different backgrounds may have to be included, since multiple scattering between the leaves and a bright soil background increases the near-infrared reflectance of the leaves [7]. Additional spectra may also be needed to cover all growth stages of a crop or to compensate for the effects of other biological processes, such as fluorescence in response to stress. Thus, unless extensive field work is done at the time of image acquisition, access to a very large library is required. Once the calibration coefficients have been determined by field work, the image is usually converted to match the library; conversely, the reference endmember spectra in the library could equally be transformed into the endmember spectra of the current image.
In recent years, spectral libraries have been used primarily to identify the composition of image endmembers [2]. First, the image endmembers themselves are determined using a method such as principal component analysis (see Section 3.2). Since the spectra found need not correspond to pure materials in the scene, their identity must subsequently be inferred from reference spectra of known materials. Smith et al. [8] express each image endmember as a linear combination of reference endmember spectra, just as mixed pixels are expressed as linear combinations of image endmembers. In their two-step method, a function incorporating the calibration described earlier and the alignment (unmixing) just mentioned is evaluated repeatedly for different candidate groups of reference spectra, until a suitable representation of the image endmembers determined in the first step is found.
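The following is a much-simplified sketch of the second step. It is not the actual formulation of Smith et al. [8]. Each candidate group of $k$ reference spectra is fitted to an image endmember by least squares, and the group with the smallest RMS residual is kept. A constant offset column stands in for the calibration; this is an assumption. The library spectra and the name `best_reference_group` are hypothetical, and the exhaustive search over all groups is our simplification.

```python
import itertools
import numpy as np

def best_reference_group(e_img, library, k):
    """Fit e_img with every group of k library spectra and keep the best group.

    e_img:   (n,) image endmember spectrum.
    library: dict mapping a material name to its (n,) reference spectrum.
    A constant offset column is a crude, assumed stand-in for calibration.
    Returns (group_names, coefficients, rms_residual) of the best fit.
    """
    best = None
    for group in itertools.combinations(library, k):
        A = np.column_stack([library[g] for g in group] + [np.ones_like(e_img)])
        coef, *_ = np.linalg.lstsq(A, e_img, rcond=None)
        rms = float(np.sqrt(np.mean((A @ coef - e_img) ** 2)))
        if best is None or rms < best[2]:
            best = (group, coef, rms)
    return best

# Hypothetical 6-band reference library.
library = {
    "soil":      np.array([0.10, 0.14, 0.18, 0.22, 0.27, 0.30]),
    "green_veg": np.array([0.04, 0.08, 0.05, 0.40, 0.42, 0.25]),
    "dry_grass": np.array([0.08, 0.12, 0.20, 0.28, 0.35, 0.38]),
    "water":     np.array([0.06, 0.05, 0.03, 0.01, 0.01, 0.00]),
}
# An image endmember built as 70 % soil + 30 % dry grass plus a constant offset.
e_img = 0.7 * library["soil"] + 0.3 * library["dry_grass"] + 0.01
names, coef, rms = best_reference_group(e_img, library, k=2)
print(names, np.round(coef, 3), rms)
```

An exhaustive search grows combinatorially with the library size and $k$, so it is only practical for small libraries.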
The direct spectral-library approach was therefore not used, because, as noted above, no appropriate spectral library is available for the agricultural imagery studied in this paper. Instead, the technique of Smith et al. [8] was put into practice, and the Results section is based on this implementation.
3.2. Principal Component Analysis
Traditionally, the objective of Principal Component Analysis (PCA) is to reduce the dimensionality of a data set while retaining as much of the relevant information as possible. This is achieved by rotating the coordinate system such that most of the variation in the data lies along a limited number of axes, the so-called principal components. The axes along which the data show little or no variation are disregarded, which corresponds to restricting the original feature space to a smaller linear subspace. As described earlier in Equation (1), linear mixtures of $c$ classes ($c < n+1$) likewise define a subspace, of dimension $c-1$. Therefore, provided that mixing accounts for most of the variation in the data, calculating the first $c-1$ principal components is a major step towards determining the $c$ endmember spectra.
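The dimensionality argument can be checked numerically. In this sketch, noise-free mixtures of $c = 3$ random endmember spectra in $n = 6$ bands are generated with fractions summing to one; all sizes and values are hypothetical. After mean subtraction, the covariance matrix has only $c-1$ eigenvalues that are not zero up to rounding.

```python
import numpy as np

rng = np.random.default_rng(1)
n, c, m = 6, 3, 1000                        # bands, endmembers, pixels (hypothetical)
E = rng.uniform(0.0, 0.6, size=(n, c))      # random endmember spectra as columns
F = rng.dirichlet(np.ones(c), size=m)       # (m, c) fractions; each row sums to one
X = F @ E.T                                 # (m, n) noise-free mixed pixels

N = np.cov(X, rowvar=False)                 # variance-covariance matrix (mean removed)
eigvals = np.linalg.eigvalsh(N)[::-1]       # eigenvalues, largest first
n_significant = int(np.sum(eigvals > 1e-10 * eigvals[0]))
print(np.round(eigvals, 6))
print(n_significant)
```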
Smith et al. [8] developed a PCA-based method to determine endmembers from remote sensing data. In summary, each pixel $\mathbf{x}$ is represented in a new coordinate system,
$$\mathbf{x} = \sum_{i=1}^{n} z_i \mathbf{u}_i,$$
where the $\mathbf{u}_i$ form a set of orthonormal basis vectors spanning the original feature space. Next, the dimensionality of the data is reduced by replacing several of the $z_i$ by values that are identical for every pixel (the corresponding $\mathbf{u}_i$ are effectively disregarded), and the sum of squared errors between the original pixels and their simplified counterparts is calculated. It can be shown that this error is minimal if the basis vectors satisfy
$$N\mathbf{u}_i = \lambda_i \mathbf{u}_i,$$
where $N$ denotes the variance-covariance matrix of the data after subtraction of their mean vector. In other words, the eigenvectors of $N$ define the coordinate system in which most of the variation of the data lies along a minimum number of axes.
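The derivation can be verified with a short sketch. The hypothetical helpers `pca` and `reconstruct` diagonalise $N$, normalised by the number of pixels $m$; this normalisation is an assumption that makes the identity exact. They keep the first $k$ coordinates $z_i$ and replace the others by their mean, which is zero after centring. The mean squared reconstruction error then equals the sum of the discarded eigenvalues.

```python
import numpy as np

def pca(X):
    """Eigen-decomposition of the covariance matrix N of the rows of X (pixels x bands).

    Returns the mean vector, the eigenvalues (largest first) and the
    eigenvectors u_i as the columns of U.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    N = Xc.T @ Xc / X.shape[0]          # 1/m normalisation (assumed, see text)
    lam, U = np.linalg.eigh(N)          # solves N u_i = lambda_i u_i
    order = np.argsort(lam)[::-1]
    return mean, lam[order], U[:, order]

def reconstruct(X, mean, U, k):
    """Keep the first k coordinates z_i and set the others to their mean (zero)."""
    Z = (X - mean) @ U                  # z_i = u_i^T (x - mean) for every pixel
    Z[:, k:] = 0.0
    return mean + Z @ U.T

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))   # hypothetical correlated 5-band data
mean, lam, U = pca(X)
k = 2
err = np.mean(np.sum((X - reconstruct(X, mean, U, k)) ** 2, axis=1))
print(err, lam[k:].sum())
```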
Figure 1: The PCA approach to determine three endmember spectra. (a) Determination of the principal components $\mathbf{u}_1$ and $\mathbf{u}_2$ spanning the plane that captures most of the pixels. (b) Selection of the endmember spectra as the vertices of a triangle that coincides with the triangular shape of the pixel cloud.
The eigenvalue $\lambda_i$ represents the variance of the data along the axis $\mathbf{u}_i$ and is commonly classified as either a primary or a secondary eigenvalue. The primary eigenvalues are larger in magnitude and account for the variation due to spectral mixing, whereas the sum of the secondary eigenvalues should correspond to the variance arising from the instrumentation. In many cases the primary eigenvalues can be clearly separated from the secondary ones, which provides a criterion for testing the dimensionality of the data and, consequently, the number of endmembers. Once the information associated with the secondary eigenvalues is discarded, the linear subspace spanned by the endmembers is found.
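A minimal sketch of the primary/secondary split follows. It assumes white instrument noise of known standard deviation `sigma` and uses an arbitrary threshold of ten times the noise variance; both are assumptions, not values from this study. The toy endmembers are a flat 0.10 spectrum plus three spectra that each have one raised band, so the mixing directions are well separated.

```python
import numpy as np

rng = np.random.default_rng(3)
n, c, m, sigma = 8, 4, 2000, 0.005        # bands, endmembers, pixels, assumed noise std
E = np.full((n, c), 0.10)                  # toy endmember spectra as columns
E[0, 1] = E[2, 2] = E[4, 3] = 0.50         # one raised band for each of three classes
F = rng.dirichlet(np.ones(c), size=m)      # fractions; each row sums to one
X = F @ E.T + rng.normal(scale=sigma, size=(m, n))   # mixtures plus white noise

lam = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]   # largest first
primary = lam > 10 * sigma**2              # hypothetical threshold: 10x noise variance
n_endmembers = int(primary.sum()) + 1      # c classes span c - 1 dimensions
print(np.round(lam, 6))
print("estimated number of endmembers:", n_endmembers)
print("sum of secondary eigenvalues:", lam[~primary].sum(),
      "expected about", (n - primary.sum()) * sigma**2)
```

With these settings, three eigenvalues exceed the threshold, which gives four endmembers. The secondary eigenvalues sum to roughly five times the noise variance.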
According to convex geometry, the $\mathbf{e}_i$ must then be chosen within this subspace such that the simplex they define contains all elements of the data set. If mixtures of only two classes are considered, the minimum and maximum values of the first principal component can be taken to define the endmember spectra $\mathbf{e}_1$ and $\mathbf{e}_2$. For mixtures of three classes, a scatterplot of the first against the second principal component can be made. As shown in Figure 1, this scatterplot allows the smallest triangle (a two-dimensional simplex) containing all pixels in the image to be determined; the vertices of this triangle define the three endmember spectra [2].
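The two- and three-class cases can be sketched as follows. The two-class function takes the pixels at the minimum and maximum of the first principal component. For three classes, the sketch does not compute the smallest enclosing triangle of [2]. As a simpler substitute, it picks the largest-area triangle whose vertices are pixels in the PC1-PC2 scatterplot, searching only pixels that are extreme along at least one of `n_dirs` sampled directions. For noise-free mixtures that include pure pixels, the convex hull of the cloud is itself a triangle, so both choices give the same vertices. The function names, spectra and parameters are hypothetical.

```python
import itertools
import numpy as np

def pc_scores(X, k):
    """Scores of the mean-centred pixels (rows of X) on the first k principal components."""
    Xc = X - X.mean(axis=0)
    lam, U = np.linalg.eigh(np.cov(Xc, rowvar=False))
    return Xc @ U[:, np.argsort(lam)[::-1][:k]]

def two_endmembers(X):
    """Two classes: the pixels at the minimum and maximum of the first PC."""
    z1 = pc_scores(X, 1)[:, 0]
    return X[[np.argmin(z1), np.argmax(z1)]]

def three_endmembers(X, n_dirs=360):
    """Three classes (substitute method): the largest-area triangle whose vertices
    are pixels in the PC1-PC2 scatterplot. Candidates are restricted to pixels that
    are extreme along at least one of n_dirs sampled directions."""
    Z = pc_scores(X, 2)
    angles = np.linspace(0.0, np.pi, n_dirs, endpoint=False)
    P = Z @ np.stack([np.cos(angles), np.sin(angles)])   # (pixels, n_dirs) projections
    cand = np.unique(np.concatenate([P.argmin(axis=0), P.argmax(axis=0)]))

    def area(i, j, k):
        (a, b), (p, q) = Z[j] - Z[i], Z[k] - Z[i]
        return abs(a * q - b * p) / 2.0

    best = max(itertools.combinations(cand, 3), key=lambda t: area(*t))
    return X[list(best)], best

rng = np.random.default_rng(4)
# Hypothetical 4-band spectra of three classes (columns).
E = np.array([[0.15, 0.08, 0.05],
              [0.20, 0.05, 0.03],
              [0.25, 0.40, 0.02],
              [0.30, 0.30, 0.01]])
F3 = np.vstack([rng.dirichlet(np.ones(3), size=300), np.eye(3)])   # last 3 rows are pure
spectra3, idx3 = three_endmembers(F3 @ E.T)
print(sorted(int(i) for i in idx3))

t = rng.uniform(size=(200, 1))
F2 = np.vstack([np.hstack([t, 1 - t]), np.eye(2)])                  # two-class mixtures
spectra2 = two_endmembers(F2 @ E[:, :2].T)
print(np.round(spectra2, 3))
```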
When dealing with mixtures of more than three classes, several approaches are possible. Bryant used a scatterplot of the first two principal components and found that the data had a pentagonal shape; by selecting one pixel near each of the five vertices, five endmember spectra were defined. This approach may fail, however, if, for example, one of the endmembers is hidden by the other four because its distinguishing features are found along the other principal components. Bateson proposed a visualization technique based on parallel coordinates that allows the set of endmembers to be extended one spectrum at a time. Although this method solves the problem of Bryant's approach, it requires considerable human effort to judge the acceptability of the endmembers. Furthermore, different users may well arrive at different endmember spectra [2].
4. Selection of Endmembers from the Component Space
Two different methods for selecting the endmembers from the edges of the component space described above were employed in this study. The first method sought to include as many pixels as possible from the edges of the two-component distribution; put simply, it selected more pixels and averaged the reflectance of more of the pixels occurring at the edge of the distribution. The second method, termed minimally inclusive, included only the extreme values at the edge of the distribution and therefore averages fewer pixels [9]. An examination of the reflectance plots of the two selection methods showed that the vegetation endmember is particularly sensitive to the way it is selected. In Band 5, the reflectance of the maximally inclusive vegetation endmember is 30%, whereas that of the minimally inclusive selection in the same band is 39%. The other endmembers do not show any particular sensitivity in any of the bands, and the approximate upward shift in the minimally inclusive selection is of the order of 0.1 to 0.2. As both plots show, the vegetation endmember is spectrally distinct from the other endmembers, which is reassuring, since the main concern of this study is to estimate vegetation fractions.
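The difference between the two selection methods can be sketched as averaging a larger or smaller share of the pixels that lie furthest towards a chosen corner of the component-space cloud. The shares used below (5% and 0.5%), the way the vegetation corner is located (via the highest-NIR pixel) and the name `edge_endmember` are all hypothetical, since the text does not specify the exact selection criteria of [9].

```python
import numpy as np

def edge_endmember(X, Z, vertex, share):
    """Average the spectra of the pixels lying furthest towards `vertex`.

    X:      (m, n) pixel spectra.
    Z:      (m, 2) component scores of the same pixels.
    vertex: (2,) location of the chosen corner of the component-space cloud.
    share:  fraction of pixels to average (larger = more inclusive).
    """
    direction = vertex - Z.mean(axis=0)
    direction = direction / np.linalg.norm(direction)
    score = Z @ direction                       # position along the centroid-to-corner axis
    k = max(1, int(round(share * len(score))))
    idx = np.argsort(score)[-k:]                # the k pixels furthest towards the corner
    return X[idx].mean(axis=0)

rng = np.random.default_rng(5)
# Hypothetical green/red/NIR spectra of soil, vegetation and water (columns).
E = np.array([[0.15, 0.08, 0.05],
              [0.20, 0.05, 0.03],
              [0.30, 0.45, 0.02]])
F = np.vstack([rng.dirichlet(np.ones(3), size=2000), np.eye(3)])
X = F @ E.T                                     # (2003, 3) pixel spectra
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:2].T                               # scores on the first two components
veg_corner = Z[np.argmax(X[:, 2])]              # corner holding the highest-NIR pixel

veg_max = edge_endmember(X, Z, veg_corner, 0.05)    # maximally inclusive: 5 % of pixels
veg_min = edge_endmember(X, Z, veg_corner, 0.005)   # minimally inclusive: 0.5 % of pixels
print(np.round(veg_max, 3), np.round(veg_min, 3))
```

In this toy example, the minimally inclusive average lies closer to the pure vegetation spectrum, so its NIR reflectance is higher. This is qualitatively similar to the direction of the shift reported above.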