A Distributed Non-Parametric Unsupervised
Classification Algorithm for Remotely Sensed
Optical Imagery
2. DISTRIBUTED MEAN-SHIFT CLUSTERING ALGORITHM
2.1. Introduction of the Mean-Shift Algorithm
This part reviews the basic theory of the MS algorithm according to Comaniciu (1997). Assume
that the probability density function P(x) of the feature vector x is unimodal. Feature vec-tors
y are defined as all the feature vectors which are lying in a sphere Sx of radius r, centred at x
with ||y–x||<r. Then, the expected value of the vector z=y–x is
since the first term vanishes. Hence:
The left side of Equation (5) is called the mean-shift vector. It has three important properties:
Firstly, the mean-shift vector is small for regions which have a large p(x) and a small Ñp(x) and
large for regions which have a low p(x). Secondly, the direction of the mean-shift vector corre-
sponds to the gradient direction of the probability density function. Thirdly, at the mode, which
corresponds to the maximal of the probability density function, the mean-shift vector is zero.
The first two properties can be used to develop a gradient based search methods for the modes
during non-parametric clustering because the mean-shift vector can be viewed as an estimation
of the gradient of the probability density function. The last property guarantees the convergence
of the iterative algorithm, i.e. the locating of modes (the centres of the regions of high
concentration) of multi-dimensional data sets. At the first step given a radiush for the search
window, the algorithm computes the mean-shift vector according to Equation (5) within the
search window located at position m. Afterwards, the algorithm deletes all the items (data
points) in the search window every time a new mode is found. At last the whole process stops if
the number of the remaining items is less than a preset threshold.

Figure 1: Iterative finding of modes by the MS algorithm
Although the nature of the utilised multispectral data in this investigation is multi-dimensional, a
one-dimensional data set described by the corresponding histogram is used for demonstration
purposes. Starting with the initial histogram shown in Figure 1(a), the initial location m is the
origin. Hence, the search window is defined for the interval [0…h] and the algorithm begins to
search for modes in this window. If a mode is found, the corresponding histogram bins are de-leted
and the location m moved by h units to the right. This iteration is repeated until conver-gence
is achieved. It is worthwhile mentioning that the results for reasonable choices of h are
fairly stable. This investigation used an empirically estimated radius with h=40. For a com-pletely
unsupervised approach it is assumed to test various radiuses and select one result with
respect to the number of expected and detected ground cover types.
2.2. Majority Voting Filter
The original version of the MS algorithm solely considers the contained spectral information
and neglects any available spatial knowledge of the underlying data. For example two Gaussian
distributed clusters in the two-dimensional feature space intersect each other like shown in
Figure 2 but are separated in the actual scene.

Figure 2: Test data set shown from different view points (65536 data points)
To measure the overlap rate of two clusters the spheres with radius 2s (representing more than
95% of the data points) are considered. Then, a point is regarded to be part of the overlap if its
distances to both cluster means are less than 2s. Hence, the overlap rate is defined as
(No/Na)´100% with No being the number of the points in the overlapped area and Na being the
total number of points in the data set. Applying the MS algorithm with h=40 directly to the data
set, the algorithm finds the two clusters correctly. However, 2437 points (3.72%) are misclassi-
fies which can be seen conceptually from Figure 3. Note that this misclassification was achieved
in a noise free simulation. Thus, the misclassification rate for actual data is higher.

Figure 3: Result of MS algorithm
To handle the overlapping area of clusters in the feature space and therefore to improve the
classification accuracy, the majority voting filter (MVF) is introduced which utilises the
neighbourhood information in the classification map to handle the overlapping problem. In the
classification map, a cluster index is assigned to every pixel. Then the MVF performs a local
consistency check within the proximity of each pixel and changes its cluster association if the
original assignment diverges too significantly. The basic motivation is that ground covers are
spatially separated to a large extend. Figure 4 shows the classification result for the data in
Figure 2 (extended by spatial information) in the two-dimensional space using a local environ-ment
of 3´3 pixels. As a result only 24 out of 65536 pixels are misclassified. Obviously, for
actual data the approach results in the elimination of smaller regions. However, this is tolerable
with respect to the highlighted objectives in Section 1.

Figure 4: Classification result with MVF applied
2.3. Distributed Mean-Shift Algorithm
From the modified MS algorithm described in the previous sub-section, it can be concluded that
a significant part of computational power is required in the post-processing step, i.e. the linear
time complexity is lost. To improve the processing speed the raw satellite scene is partitioned
into image strips according to the number of available processing nodes (PN) in the X-Sat com-
pute payload PPU (Bretschneider, 2004). During the distribution of the image data to the
individual nodes the histogram is created in the central field programmable gate arrays (FPGA)
of the PPU and broadcasted to all participating PNs. The modes are clearly defined through the
histogram and the creation of the cluster indices for each image pixel can be performed based on
the minimum distances. The final step of applying the MVF is executed independently in each
node.