b) Nature of data:
From the point of view of how the data affect the results, three important properties of the gathered data can be emphasized.
- Different surface materials may be distinguished by very subtle differences in their spectral patterns.
- Adjacent pixels influence one another, which affects the brightness values of the pixels.
- Land cover types do not fit into multiples of rectangular spatial units. This occurs because some objects do not match the pixel size, so a single pixel may contain several land cover types. The same happens at the boundaries between cover types. In these cases the brightness values do not indicate any single cover type, and what are conceptually called mixels (mixed pixels) are produced.
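The mixed-pixel effect can be illustrated with a minimal sketch. It assumes a simple linear mixing model, in which the brightness of a pixel is the area-weighted average of the pure cover-type signatures; this model, the function name `mixed_pixel`, and all brightness values are illustrative assumptions and are not taken from the study.

```python
import numpy as np

# Hypothetical mean brightness values (one per band) for two cover types.
# The numbers are invented for illustration only.
pure_signatures = {
    "wheat": np.array([40.0, 90.0, 60.0]),
    "bare_soil": np.array([80.0, 70.0, 100.0]),
}

def mixed_pixel(fractions):
    """Area-weighted average of pure signatures (assumed linear mixing)."""
    total = sum(fractions.values())
    if not np.isclose(total, 1.0):
        raise ValueError("cover fractions must sum to 1")
    return sum(frac * pure_signatures[name] for name, frac in fractions.items())

# A boundary pixel covered half by wheat and half by bare soil.
boundary = mixed_pixel({"wheat": 0.5, "bare_soil": 0.5})

# Distance of the mixed pixel to each pure signature.
distances = {
    name: float(np.linalg.norm(boundary - signature))
    for name, signature in pure_signatures.items()
}
print(boundary, distances)
```

Under these assumed values the mixed pixel lies equally far from both pure signatures, so its brightness alone does not point to either cover type.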
c) During the classification process:
To produce the desired results, classifiers use these data together with their characteristics. Many classification algorithms ignore the inherent data errors and assume that the input data are perfect. Standard classifiers perform the classification using only the spectral properties of the image data and have not been designed to incorporate other types of data into the process. Clearly, using the image data (the spectral properties of the scene) alone, while ignoring the errors present in the captured data, leads to unreliable and imperfect results.
Study Area and Data
The study area, called Moghan, is located in north-western Iran (Figure 1). Moghan Agro Industry and Livestock Co. has undertaken various activities in crop production, horticulture, animal husbandry and related industries. About 300,000 tons of crops such as wheat, barley, maize (seed, grain and forage), sugar beet and alfalfa are produced annually on 18,000 ha of irrigated farms (Figure 2). In this research, three parts of the irrigated farms are used. These three parts are flat, with a 2% slope.

Figure 1. DEM of Moghan derived from 10-meter contours.
Field boundaries and their crop types are available from 1997 to 2001 (Figure 2).

Figure 2. Field boundaries of the Moghan agricultural area.
An ETM+ image of the study area, acquired on 2001-05-23, and the 1:50,000 map of the area were used in this study.
Theory and concepts:
Review of Maximum Likelihood Classification
To understand the application of prior probabilities to a classification problem, the mathematics of the maximum likelihood decision rule must first be understood. For the multivariate case, we assume that each observation $X_i$ (a pixel) consists of a set of measurements on p variables (channels). Through some external procedure, we identify a set of observations that correspond to a class, that is, a set of similar objects characterized by a vector of means on the measurement variables and a variance-covariance matrix describing the interrelationships among the measurement variables that are characteristic of the class (Abkar, 1999). Multivariate normal statistical theory describes the probability that an observation $X_i$ will occur, given that it belongs to a class k, as the following function:
$$\Phi_k(X_i) = (2\pi)^{-p/2}\,|\Sigma_k|^{-1/2}\exp\left[-\tfrac{1}{2}(X_i-\mu_k)^{T}\Sigma_k^{-1}(X_i-\mu_k)\right] \qquad (1)$$
where $\mu_k$ is the mean vector and $\Sigma_k$ the variance-covariance matrix of class k. The quadratic product
$$(X_i-\mu_k)^{T}\Sigma_k^{-1}(X_i-\mu_k) \qquad (2)$$
can be thought of as a squared distance function which measures the distance between the observation and the class mean, scaled and corrected for the variance and covariance of the class. As applied in a maximum likelihood decision rule, Equation (1) allows the calculation of the probability that an observation is a member of each of k classes. The observation is then assigned to the class for which the probability value is greatest. In an operational context, the observed means, variances, and covariances are substituted and the log form of Equation (1) is used:
$$\ln \Phi_k(X_i) = -\tfrac{1}{2}\,p\ln(2\pi) - \tfrac{1}{2}\ln|D_k| - \tfrac{1}{2}(X_i-m_k)^{T}D_k^{-1}(X_i-m_k) \qquad (3)$$
where $m_k$ and $D_k$ are the observed mean vector and variance-covariance matrix of class k.
Since the log of the probability is a monotonically increasing function of the probability, the decision can be made by comparing the values for each class as calculated from the right-hand side of this equation. A simpler decision rule, R1, can be derived from Equation (3) by eliminating the constants. R1: select the k that minimizes
$$F_{1,k}(X_i) = \ln|D_k| + (X_i-m_k)^{T}D_k^{-1}(X_i-m_k) \qquad (4)$$
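As a rough illustration of rule R1, the sketch below evaluates, for each class, the log-determinant of the class variance-covariance matrix plus the squared distance between the observation and the class mean, and assigns the pixel to the class with the smallest value. The function names, the two-band class statistics, and the pixel values are hypothetical and serve only to make the example runnable; they are not the statistics of the Moghan data.

```python
import numpy as np

def r1_discriminant(x, mean, cov):
    """ln|D_k| + (x - m_k)' D_k^{-1} (x - m_k) for a single class."""
    diff = x - mean
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance matrix must have a positive determinant")
    # Solve D_k y = diff rather than forming the inverse explicitly.
    squared_distance = diff @ np.linalg.solve(cov, diff)
    return logdet + squared_distance

def classify(x, class_stats):
    """Return the class minimizing the R1 discriminant, plus all scores."""
    scores = {
        name: float(r1_discriminant(x, mean, cov))
        for name, (mean, cov) in class_stats.items()
    }
    return min(scores, key=scores.get), scores

# Hypothetical training statistics: (mean vector, variance-covariance matrix).
class_stats = {
    "wheat": (np.array([40.0, 90.0]), np.array([[25.0, 5.0], [5.0, 16.0]])),
    "bare_soil": (np.array([80.0, 70.0]), np.array([[36.0, -4.0], [-4.0, 49.0]])),
}

pixel = np.array([45.0, 85.0])  # hypothetical two-band observation
label, scores = classify(pixel, class_stats)
print(label, scores)
```

Because the constant term and the factor of one half are the same for every class, minimizing this quantity selects the same class as maximizing the log probability.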
The use of prior probabilities in the decision rule
The maximum likelihood decision rule can easily be modified to take into account the prior probabilities of the classes in the population of observations as a whole. The prior probability itself is simply an estimate of the proportion of the objects that will fall into a particular class. These prior probabilities are sometimes termed "weights", since the modified classification rule will tend to weigh more heavily those classes with higher prior probabilities. Prior probabilities are incorporated into the classification through manipulation of the law of conditional probability (Alesheikh, 1998). To begin, two probabilities are defined: $P(w_k)$, the probability that an observation will be drawn from class $w_k$; and $P(X_i)$, the probability of occurrence of the measurement vector $X_i$. The law of conditional probability states that
$$P(w_k \mid X_i) = \frac{P(w_k, X_i)}{P(X_i)} \qquad (5)$$
where $P(w_k, X_i)$ is the joint probability that the observation belongs to class $w_k$ and has the measurement vector $X_i$. The probability on the left-hand side of this expression will form the basis of a modified decision rule, since the ith observation is assigned to that class $w_k$ which has the highest probability of occurrence given the p-dimensional vector $X_i$ which has been observed. Using the law of conditional probability, we find that