Logo GISdevelopment.net

GISdevelopment > Proceedings > ACRS > 1992


1989 | 1990 | 1991 | 1992 | 1994 | 1995 | 1996 | 1997 | 1998 | 1999 | 2000 | 2002
Sessions

  • Plenary Session
  • Agriculture & Soil
  • Water Resources
  • Agriculture & Forestry
  • Education
  • Forestry
  • Mapping from Space
  • Oceanography
  • Land Use
  • Digital Image Processing
  • Geology
  • Digital Images Processing
  • Earth Environment


  • Poster Sessions
  • Poster Paper 1
  • Poster Paper 2
  • Poster Paper 3



  • ACRS 1992


    Digital Image Processing
    Printer Friendly Format

    Page 1 of 2
    | Next |


    Neural network structure generation for the classification of remotely sensed data using simulated annealing

    Hosomura, P.K.M.M. Pallewatta, H.N.D. Karunasekara
    Division of Computer Science Asian Institute of Technology
    GPO Box 2754, Bangkok 10501, Thailand


    Abstract
    Artificial feed forward neural networks are increasingly being used for the classification of remotely sensed data. however the back propagation algorithm, which is so widely used does not provide any information about the structure of the network for a given classification problem. Minimum sizes networks are important for efficient classification and good generalization. This paper describes a method to simultaneously learn the structure and the weights of the networks from training data using Simulated Annealing optimization technique.

    Introduction
    The use of artificial neural networks (ANNs) as classifiers as an alternative to statistical classification has become popular since the development of the back propagation algorithm. Feed forward ANNs, trained by back propagation have been used by several researchers for the classification of remotely sensed data. Please refer to (4) and (5) for details about back propagation. One of the most common difficulties encountered by all the researchers is that the structure of the networks, that is optimal for a given classification task is unknown. Often researchers select the structure by starting from a minimal seized network and increase the nodes in the hidden layers until the network can be successfully trained. Alternatively they may get a rough idea about the structure by observing the data (training data in a Remote Sensing context) in feature space. Back propagation does not provide us with any idea about the structure of the network for a given classification problem and it does not guarantee a solution. A network too small for a given problem will cause back propagation not to coverage to a solution while larger ones lead to poor generalization. Large networks also consume more resources while smaller ones can be efficient in software implementations. The technique proposed here simultaneously learns the weights and the minimal structure for a given classification problem by minimizing a cost function that is proportional to the total error over the pattern set, to the number of nodes and to the number of connections in the network. Simulated annealing is being employed as a technique to perform the optimization due to it's ability to optimize very complex problems. Our approach is slow due to the use of simulated annealing, which can be considered as a disadvantages.

    Simulated Annealing
    This technique was introduced in 1983 by Kirkpatrick et al as a tool to solve complex optimization problems.

    In physical systems the state of minimum energy is called the ground state. Simulated annealing treats the system to be optimized as a physical system and function to be minimized as an energy function. Then the problem of combinatorial optimization transforms to a problem of finding the ground states of the system.

    With simulated annealing, one starts at a very high temperature and simulates the slow cooling process of a physical system, reducing the temperature slowly to reach the ground state. The cooling must be slow enough so that the system does not get stuck in metastable states that are local minima of the energy. For a given temperature states of a physical system is described by the distribution,


    E - energy of the state
    kB - Boltzmann's constant
    T - absolute temperature

    It should be noted that the concept temperature in physical systems have no Obvious equivalent in optimization problems. In simulated annealing temperature will be a parameter that controls the randomness of the states accepted.

    The algorithm for simulated annealing is as follows. Let X be a vector of variables and E(X) be the function to be minimized. The temperature is kept at a high value initially. One starts with a random value of X and a small random perturbation DX attempted. The energy change in the system caused by this DE caused by this perturbation is computed.

    If the perturbation has reduced the energy function (i.e DE < 0), it is accepted and the new state X + D E > 0 the perturbation is accepted with a probability of e-DE/T. The sequence of states produced by this algorithm is distributed according to (1) [2]. At high temperatures the e-DE/T is close to 1 regardless of DE. So almost all perturbations are accepted and the resulting configurations (States) area very random. But at low temperatures this expression for the probability of acceptance favors perturbations that tend to keep the energy low. The initial heating up of the system is necessary to make the final state independent of the initial state (3). In cooling the system, one should slowly reduce the temperature only after thermodynamic equilibrium has been reached at the current temperature and the procedures should continue until T = 0. In practice there are many cases, where there are many minima very close to the absolute minimum. In such case it will be sufficient to find one of them [2].

    For problems of constrained minimum, the method can be adapted by excluding the perturbations that do not satisfy the constraints [1] [2]. Simulated annealing can also be applied to discrete optimization [1]. For a more detailed description of simulated annealing please refer to [1].

    Neural Network Structure Generation
    Supposing we are given a set of training data samples along with their respective classes. If the input data are n dimensional real vectors and the classes are m dimensional real vectors, the mapping that exists between them is a function f : R > Rm . Our task is to find a network N', which computers this function. If we are given a function f : R - > Rm, the error produced by this network for pattern pair (training data + class vectors) can be expressed as in [4],


    tpj - Target value of the jth position of pattern p.
    Opj- output value of N' the jth position for pattern p.

    and the total error accumulated over the training set can be expressed as,


    np - num of patterns.

    Page 1 of 2
    | Next |

    Applications | Technology | Policy | History | News | Tenders | Events | Interviews | Career | Companies | Country Pages | Books | Publications | Education | Glossary | Tutorials | Downloads | Site Map | Subscribe | GIS@development Magazine | Updates | Guest Book