A Massively Parallel Implementation of an Image Classifier
Abstract
This paper describes the parallel implementation of a classifier based on maximum likelihood classification with pixel neighborhood modifications. This is used for remotely sensed data which are large in comparison to other forms of 2D spatial data. The computational complexity of the algorithm restricts its use and has lead to an implementation on a SIMD processor namely the DEC mpp. Issues addressed in the paper include: the ease of porting of the serial algorithm to a parallel machine and the resulting improvement in throughput due to the increased speed of computation.
1. Introduction
Maximum likelihood classification (MLC)has a long history as a tool for the interpretation of remotely sensed data. The technique is used in isolation [12] or in conjunction with other methods [9]. The algorithm is computationally expensive which is exacerbated by the large datasets that are typical of remote sensing e.g. 2048X2048 pixels in size with 6 bands of 8-bit data. For this reason implementing this form of classifier on a massively parallel machine is desirable. The important issues are the ease with which it can be done and the increase in computational performance over typically available serial machines.
The MLC technique takes the raw data and transforms them into an image composed of labels defined by the user. Labels are often called Classes and the resulting image a classified image. In essence, MLC involves allocating a pixel to a class for which it shows greatest affinity. Classes are modeled by multivariate distributions whose parameters are estimated from representative samples of the class. The multivariate distributions are then used to assesss which class an individual pixel is most likely to belong.
This paper considers the problem of processing the remotely sensed data once the class models have been obtained (that is the classifier has already been (rained). In the following processing based on multivariant distributions will be referred to as spectral processing.
Spectral processing tends to produce images that have a certain amount of noise. In the current context, noise is when a pixel's data vector has affinity for a class other than its true class i.e. an error in classification. Noise can be the result of a pixel sample containing more than one cover class, in which case the pixel is referred to as a mired pixel, or the result of errors of omission and commission present in the classification process A typical result of noise is that a pixel is classified as belonging to a class not in agreement with its general surroundings e.g. a single pixel of lupins in a paddock of wheat. Neighborhood techniques are incorporated into the classification procedure to reduce the effects of noise by taking account of neighboring pixel classifications to aid the classification of the current pixel. Processing based upon neighboring information will be referred to as spatial processing. The overall algorithm iteratively performs spectral and spatial processing to produce a classified image, and for large images becomes computationally expensive.
This paper outlines issues that were addressed during the porting an existing serial based version of the algorithm to a DECmpp massively parallel processor, and discusses the computational advantages in doing so. the original serial code was written in FORTRAN 77 and based on a SUN platform. The target machine was a DECmpp 12000 comprising 2048 comprising 2048 processors. Paralleled code was created using DEC MPFORTRAN version 2.1.46 as opposed to the more efficient parallel C language (MPL) because the code was already in FORTRAN.
The remainder of this paper is divided as follows. Sectin2 describes the architecture of the processor which governs the porting process. The classification algorithm is described in Section 3. Section 4 describes how parallelism is exploited in the algorithm. Section 5 discusses issues that arose with respect to the port to the DECmpp. Finally Section 6 discusses the computational performance and improvement in the execution time.
2 The DECmpp Massively Parallel Processor
The DECmpp 12000 is a parallel processor based on a single instruction multiple data (SIMD) architecture [11]. It is a modern implementation of ideas originally developed on the CLIP series of machines [5,6], and the original MPP [1]. These were characterized by having many simple processors with a small amount of local memory and 4 or 8 way connectivity with their neighboring processors. They are highly suited to operations that operate on single data elements or small neighborhoods e.g. 3x3. These operations are very common in image processing and analysis algorithms such as distance transforms [4] and skeletonisation [8]. It consists of a front end workstation (FE) and a backend parallel processing unit called the |Data Parallel Unit (DPU). The DPU has two components the processor array consisting of many Processing Elements (PEs) and the Array control Unit (ACU) which controls the PE array and performs operations on singular data.
Each PE in the array consists of a simple 4-bit ALU, a floating point co-processor and memory that is local to the PE. The PEs are arranged in a grid which incorporates specialized hardware for inter-PE communications. The hardware allows efficient communications between a PE and its immediate 8-connected neighboring PEs, that is if a PE is located at positions (i.j) on the grid, then efficient communications exists with the PEs located at (i-1,), (i+1,j), (i-1,j-1), (i-1,j+1), (i,j+1), (i,j-1), (i+1,j+1), and (i+1,j-1), This form of communication is implemented in hardware and is referred to as the X-net. The connectivity is arranged as a toroid in which the top row, and the left column of the array is connected to the right column. This allows sufficient raster shift communication to be used.
Two programming languages exist for the DECmpp. (1) MPFORTRAN which is based on FORTRAN 77 with extensions from VAX FORTRAN and the FORTRAN 90 ANSI standard, and (2) MPL which is based on ANSI C, and like MPFORTRAN has parallel constructs for handling arrays and I/O code written in MPL as MPFORTRAN 2.1.46 did not support reads on byte data and the rest written in MPFORTRAN.
MPFORTRAN provides high level constructs for handling array operations, where ideally each PE performs the processing for one element of the array. When an array has dimensions larger than the dimensions of the PE grid, then the array is mapped to the grid such that a PE may process more than one element. The process of mapping an aray which is larger that the PE array is Called virtualization. Performing virtualization efficiently for arrays of different sizes is a complex problem. for MPL virtualization has be addressed by the programmer whereas MPFORTRAN automatically handles virtualization.
The DECcpp used was configured with 2048 PEs arranged on a 32X64 grid. Each PE had 64k of local RAM nad the DPU hardware is scalable to 64k PEs.
3 The Classification Algorithm
This section provides an overview of the classification algorithm, and is based on previous work (10,3,2,). In the following. it is assumed that an input image is available and the class means and covariance matrices for each class have been estimated during training.
3.1 Spectral Processing and Typicality Probabilities
Assuming multivariate Guassian densitis for k classes and an image composed of m spectral bands, this corresponds to calculating, for each class k, the density function

defined as:
where x
k is the vector of sample means, V
k is the sample covariance matrix, x
m is the pixel data vector and
D
2k, m is the squared Mahalanobis distance for vector x
m and class k.
The density Function

is also used to calculate typicality probabilities. These probabilities consider each class in isolation, and are used to judge whether a pixel conforms to the class statistics. If it does not, it is labeled as being atypical. for more details see [3]:
3.2 Spatial Processing
Neighboring pixel class labels produced from the previous stage are used to influence the local influence the local prior probability of the pixel in the current iteration. For a pixel located at image position (i.j), the neighborhoods considered are
N
1 = {(i-1, j), (i+1, j), (i, j-1), i, j + 1)} (3)
and
N2 = N1 U {(i-1, j-1), (i+1, j-1), (i-1, j+1), (i+1, j+1)} (4)
where N
1 is referred to as a first-order neighborhood and N
2 a second-order neighborhood. The distinction is necessary because the distance between first-order neighbors is 1.0 and the remaining neighbors is , This difference is taken into account in the algorithm.
A count n
k of neighbor class labels is made (that is, there are nk neighbors having class label Ck) and the probability of the pixel belonging to class k given neighborhood N is calculated by:
P(Ck|LN) = A-1e0.5(ak+bknk) (5)
where L
n denotes the labelling of pixels in N,
ak and
bk are user controlled parameters, and A is a normalising constant.
Hence the probability of a class label is increased as a function of n
k. If n
k is large then the probability will increase and visa versa for a low value of nk The constant is used to keep the sum of the probabilities for all classes equal to 1 . the parameters
ak and
bk are used to tune the system.
3.3 Posteriour Probabilities
Results from sections 3.1 and 3.2 are combined to form the posterior probabilities of class membership. The values are calculated as:
If super-classes ae specified (a class that is made up of two of more classes), then the posterior probabilities for a pixel belonging to each super-class is evaluated by simply adding the probabilities for all its member classes.