


  • ACRS 1996


    Land Use

    Page 1 of 2

    On the Architecture of Layered Neural Network for Land Use Classification of Satellite Remote Sensing Image

    Shimizu, Eihan and Le, Van Trung
    Department of Civil Engineering, University of Tokyo
    7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan
    Tel : 03-3812-2111 ext. 6126 FAX : 03-3812-4977
    E-mail : trung@planner.t.u.tokyo.ac.Jp


    Abstract
    Layered Neural Networks (LNNs) have recently been proposed as a non-parametric classification method suitable for the efficient analysis of satellite remote sensing images. Most studies in this field, however, have been empirical, and LNNs have been applied as "black box" estimation machines. When a non-parametric function is trained with {0,1} binary targets by least squares, the output of the calibrated function is considered an estimate of a posterior probability. The accuracy of this estimate depends mainly on the network structure, the form of the activation function, the learning paradigm and the number of training data used in learning. This paper discusses the application of LNNs to the classification of remotely sensed data. We provide a theoretical interpretation of the LNN classifier in comparison with conventional classification methods. The most important part is the derivation of a generalized form of the LNN classifier based on the maximum entropy principle. Using this generalized form, we discuss the relationship between the familiar type of LNN classifier, which employs the sigmoidal activation function, and other types of discriminant models such as the Multinomial Logit Model.

    1. Introduction
    Layered Neural Networks (LNNs) have been broadly applied to classification, prediction and other modeling problems. Hill et al. (1994) gave a full review of studies comparing LNNs with conventional statistical models. With the exception of comparisons with regression analysis, however, few studies have so far provided a theoretical interpretation of the application of LNNs.

    This paper shows how LNNs approximate the Bayes optimal discriminant function when used for the classification of satellite remote sensing images, and discusses the relationship between the familiar type of LNN classifier, which employs the sigmoidal activation function, and other types of discriminant models such as the Multinomial Logit Model.

    2. Basic Formulation of the LNN Classifier
    Let $x$ represent a feature vector which is to be classified. Let the possible classes be denoted by $\omega_j$ ($j = 1, 2, \ldots, J$). If we consider the discriminant function $d_j(x)$, then the decision rule is

    $x \in \omega_j \quad \text{if } d_j(x) \ge d_{j'}(x) \text{ for all } j' \ne j. \qquad (1)$
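
    Decision rule (1) amounts to choosing the class with the largest discriminant value. The following is a minimal sketch of that rule; the function name `classify`, the zero-based class index and the tie-break are illustrative assumptions, not part of the paper.

```python
def classify(discriminants):
    """Return the index j of the largest discriminant value d_j(x).

    Ties go to the lowest index. This is only one possible convention;
    rule (1) itself does not specify a tie-break.
    """
    best_j = 0
    for j, d in enumerate(discriminants):
        if d > discriminants[best_j]:
            best_j = j
    return best_j


# Hypothetical discriminant values for J = 3 classes (indices 0, 1, 2).
print(classify([0.10, 0.75, 0.15]))  # prints 1
```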

    An LNN is expected to act as the input/output system that realizes the discriminant function.

    Let us consider the multi-layered neural network in Figure 1, which has been applied to a variety of classification problems.


    Figure 1

    A feature vector is presented to the input layer; that is, the number of neurons in the input layer corresponds to the dimension of the feature vectors. The number of neurons in the hidden layers can be adjusted by the user. The output layer has as many neurons as there are classes.

    The output signal from the jth neuron in the output layer is regarded as the discriminant value. Let the state of the jth output neuron be represented by

    $u_j = g(x, w) \qquad (2)$

    where $w$ is the parameter vector, which consists mainly of the connection weights between neurons. We are not concerned here with the formulation of $g(x, w)$. The output of the LNN, $p_j(x, w)$, under presentation of $x$ is

    $p_j(x, w) = f(u_j), \qquad (3)$

    where $f(u_j)$ is the activation function. The following sigmoid function is frequently used:

    $f(u_j) = \dfrac{1}{1 + \exp(-u_j)} \qquad (4)$
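
    To make the input/output view concrete, here is a minimal sketch of the forward computation for a three-layered network (input, one hidden layer, output). It assumes the logistic sigmoid f(u) = 1 / (1 + exp(-u)) and a weighted-sum form for g(x, w); the paper leaves g unspecified, so the layer structure, the bias terms and all numeric values below are illustrative assumptions.

```python
import math


def sigmoid(u):
    """Logistic sigmoid activation f(u) = 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + math.exp(-u))


def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum followed by the sigmoid."""
    return [
        sigmoid(sum(w * v for w, v in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def lnn_output(x, params):
    """Outputs p_j(x, w) of a network with a single hidden layer."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden = layer(x, w_hidden, b_hidden)
    return layer(hidden, w_out, b_out)


# Illustrative sizes: 2 input features, 3 hidden neurons, 2 classes.
params = (
    [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]],  # hidden weights (3 x 2)
    [0.0, 0.1, -0.1],                        # hidden biases
    [[0.7, -0.5, 0.2], [-0.6, 0.3, 0.9]],    # output weights (2 x 3)
    [0.05, -0.05],                           # output biases
)
p = lnn_output([1.0, 2.0], params)
print(p)  # one value in (0, 1) per class
```

    The input layer size equals the feature dimension and the output layer size equals the number of classes, as described above; only the hidden layer size is a free choice.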


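    Feature vectors $x_k$ ($k = 1, 2, \ldots, K$) are prepared for training the LNN.

    Training data (target data) are given as follows:

    $d_j(x_k) = \begin{cases} 1 & \text{if } x_k \in \omega_j \\ 0 & \text{otherwise} \end{cases} \qquad (5)$

    The LNN is trained (Fig. 2) by minimizing the mean squared error, that is,

    $\min_{w} \; \dfrac{1}{K} \sum_{k=1}^{K} \sum_{j=1}^{J} \left[ p_j(x_k, w) - d_j(x_k) \right]^2 \qquad (6)$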



    Figure 2

    Training of the LNN is performed through the adjustment of connection weights. The most common method is so-called "back-propagation", which is essentially gradient descent. After the completion of training, the LNN plays the role of the discriminant function. Let the output of the trained LNN be denoted by $p_j(x, w)$.
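
    The training procedure described above can be sketched as per-sample gradient descent on the squared error between the outputs and {0,1} targets. This is a toy illustration, not the authors' implementation: the hidden layer size, learning rate, number of epochs, weight initialization and the four training samples are all assumptions chosen for the example.

```python
import math
import random


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def forward(x, W1, b1, W2, b2):
    """Return hidden activations h and outputs p_j(x, w)."""
    h = [sigmoid(sum(w * v for w, v in zip(row, x)) + b) for row, b in zip(W1, b1)]
    p = [sigmoid(sum(w * v for w, v in zip(row, h)) + b) for row, b in zip(W2, b2)]
    return h, p


def mse(data, W1, b1, W2, b2):
    """Mean over samples of the summed squared error between p_j and d_j."""
    total = 0.0
    for x, d in data:
        _, p = forward(x, W1, b1, W2, b2)
        total += sum((pj - dj) ** 2 for pj, dj in zip(p, d))
    return total / len(data)


def train(data, n_hidden=4, rate=0.5, epochs=1000, seed=0):
    """Back-propagation: gradient descent on the squared error, sample by sample."""
    rng = random.Random(seed)
    n_in, n_out = len(data[0][0]), len(data[0][1])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
    b2 = [0.0] * n_out
    for _ in range(epochs):
        for x, d in data:
            h, p = forward(x, W1, b1, W2, b2)
            # Output deltas: error times sigmoid derivative (constant factor 2
            # of the squared-error gradient is absorbed into the learning rate).
            delta_out = [(pj - dj) * pj * (1.0 - pj) for pj, dj in zip(p, d)]
            # Hidden deltas: output errors propagated back through W2.
            delta_hid = [
                hi * (1.0 - hi) * sum(delta_out[j] * W2[j][i] for j in range(n_out))
                for i, hi in enumerate(h)
            ]
            for j in range(n_out):
                for i in range(n_hidden):
                    W2[j][i] -= rate * delta_out[j] * h[i]
                b2[j] -= rate * delta_out[j]
            for i in range(n_hidden):
                for m in range(n_in):
                    W1[i][m] -= rate * delta_hid[i] * x[m]
                b1[i] -= rate * delta_hid[i]
    return W1, b1, W2, b2


# Hypothetical two-band feature vectors with {0,1} targets for two classes.
data = [
    ([0.1, 0.2], [1, 0]),
    ([0.2, 0.1], [1, 0]),
    ([0.8, 0.9], [0, 1]),
    ([0.9, 0.7], [0, 1]),
]
trained = train(data)
print("error before:", mse(data, *train(data, epochs=0)))
print("error after: ", mse(data, *trained))
```

    Calling `train` with `epochs=0` returns the untrained initial weights for the same seed, which gives a baseline against which the decrease in error can be checked.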

    3. Interpretation of the LNN Classifier

    3.1 Relationship between LNN and Bayesian classifier

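    The Bayesian optimal decision rule minimizes the probability of classification error by choosing the class which maximizes the posterior probability:

    $p(\omega_j \mid x) = \dfrac{p(x \mid \omega_j)\, p(\omega_j)}{\sum_{j'=1}^{J} p(x \mid \omega_{j'})\, p(\omega_{j'})}$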


    If the prior probabilities $p(\omega_j)$ are equal, then the conditional probability density function $p(x \mid \omega_j)$ corresponds to the optimal discriminant function. The maximum likelihood classifier, which assumes a multivariate normal distribution, is frequently applied.
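
    As an illustration of the Bayesian rule, the sketch below computes posterior probabilities from priors and normal class-conditional densities. For brevity it assumes independent bands (a diagonal covariance matrix), which is a simplification of the full multivariate normal model; the class statistics and the pixel value are hypothetical.

```python
import math


def gaussian_density(x, mean, var):
    """Density of independent normal features (diagonal covariance)."""
    density = 1.0
    for xi, mi, vi in zip(x, mean, var):
        density *= math.exp(-((xi - mi) ** 2) / (2.0 * vi)) / math.sqrt(2.0 * math.pi * vi)
    return density


def posteriors(x, priors, means, variances):
    """Bayes' rule: posterior = prior * likelihood / sum over all classes."""
    joint = [
        prior * gaussian_density(x, mean, var)
        for prior, mean, var in zip(priors, means, variances)
    ]
    evidence = sum(joint)
    return [value / evidence for value in joint]


# Hypothetical two-band statistics for two land use classes.
priors = [0.5, 0.5]
means = [[40.0, 60.0], [70.0, 30.0]]
variances = [[25.0, 36.0], [49.0, 16.0]]

post = posteriors([45.0, 55.0], priors, means, variances)
print(post, post.index(max(post)))
```

    With equal priors, the class with the largest posterior is also the class with the largest likelihood, which is the maximum likelihood classifier mentioned above.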

    The question of how the LNN classifier is related to the Bayesian optimal classifier has already been discussed by Wan (1990) and Ruck et al. (1990). The conclusion is that the output of the LNN, $p_j(x, w)$, when trained by criterion (6), approximates the Bayesian posterior probability. Following Wan (1990), we give a short proof. Consider the training data given in the form of (5). Suppose that the training data are random variables sampled from the joint probability density function $p(x, d_j(x))$, where


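    Since $p_j(x, w)$ is the least squares estimate of $d_j(x)$, it approximates the conditional expectation of $d_j(x)$ given $x$. Therefore,

    $p_j(x, w) \approx E[d_j(x) \mid x] = 1 \cdot P(d_j(x) = 1 \mid x) + 0 \cdot P(d_j(x) = 0 \mid x) = p(\omega_j \mid x)$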


    This means that $p_j(x, w)$, in the sense of minimizing the mean squared error, approximates the posterior probability $p(\omega_j \mid x)$. The output of the LNN is therefore considered an estimate of the posterior probability. The accuracy of this estimate depends mainly on the network structure, the form of the activation function, the learning paradigm and the number of training data used in learning. This is the theoretical background for applying LNNs to classification problems. It has also been proved that a three-layered neural network can approximate any continuous mapping when an appropriate number of neurons is set in the hidden layer and sigmoidal activation functions are used in the hidden layer (e.g. Gallant et al., 1988; Funahashi, 1989; Cybenko, 1989; Hornik et al., 1989). An LNN is thus expected to approximate the posterior probability fairly accurately if the user chooses the right network size, an appropriate number of training data, a suitable stopping criterion for learning and an appropriate form of the I/O function.
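
    The step from least squares to the conditional expectation can be spelled out with the standard decomposition of the expected squared error. The notation follows the text; the decomposition is a textbook identity, given here as a sketch rather than as the authors' own derivation.

```latex
\begin{align*}
E\big[(p_j(x,w) - d_j(x))^2\big]
  &= E\big[(p_j(x,w) - E[d_j(x)\mid x])^2\big]
   + E\big[(E[d_j(x)\mid x] - d_j(x))^2\big], \\
E[d_j(x)\mid x]
  &= 1\cdot P(d_j(x)=1\mid x) + 0\cdot P(d_j(x)=0\mid x)
   = p(\omega_j\mid x).
\end{align*}
```

    The cross term vanishes because $E[\,E[d_j(x)\mid x] - d_j(x) \mid x\,] = 0$. The second term on the right of the first line does not depend on $w$, so minimizing the squared error over $w$ is equivalent to bringing $p_j(x, w)$ as close as possible to $E[d_j(x)\mid x]$, which equals the posterior probability.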

    Up to this point, however, the derivations have held for an arbitrary mapping trained with targets $d_j(x_k) \in \{0, 1\}$. The result is well known in the field of statistics (Wan, 1990). The above proof provides a theoretical justification for any non-parametric discriminant function trained by the least squares criterion. The following section will discuss the interpretation of the activation functions used in LNNs.
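
    Because the result holds for any non-parametric function fitted by least squares, it can be checked numerically without a neural network. The sketch below fits a piecewise-constant function to {0,1} targets and compares it with the true posterior. The synthetic data (two unit-variance normal classes with equal priors), the bin edges and the sample size are assumptions made for this example.

```python
import math
import random


def true_posterior(x, mean0=-1.0, mean1=1.0):
    """p(class 1 | x) for two unit-variance normal classes with equal priors."""
    like0 = math.exp(-0.5 * (x - mean0) ** 2)
    like1 = math.exp(-0.5 * (x - mean1) ** 2)
    return like1 / (like0 + like1)


def fit_bins(samples, edges):
    """Least-squares fit of a piecewise-constant function to 0/1 targets.

    Within each bin the constant minimizing the squared error is the mean
    of the targets, i.e. the fraction of samples belonging to class 1.
    """
    sums = [0.0] * (len(edges) - 1)
    counts = [0] * (len(edges) - 1)
    for x, d in samples:
        for b in range(len(edges) - 1):
            if edges[b] <= x < edges[b + 1]:
                sums[b] += d
                counts[b] += 1
                break
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]


rng = random.Random(0)
samples = []
for _ in range(20000):
    d = rng.randint(0, 1)                    # class label, equal priors
    x = rng.gauss(1.0 if d else -1.0, 1.0)   # feature drawn from that class
    samples.append((x, d))

edges = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
estimates = fit_bins(samples, edges)
for b, est in enumerate(estimates):
    centre = 0.5 * (edges[b] + edges[b + 1])
    print(f"x = {centre:+.2f}  fitted = {est:.3f}  posterior = {true_posterior(centre):.3f}")
```

    The fitted values track the posterior at the bin centres up to sampling noise and the coarseness of the bins, which mirrors the dependence on model structure and training set size noted above.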
