In this equation, the left-hand term describes the probability that the measurement vector will take on the values $X_i$ given that the object measured is a member of class $w_k$.
This probability could be determined by sampling a population of measurement vectors for observations known to be from class $w_k$. However, the distribution of such vectors is usually assumed to be Gaussian. Thus, we can assume that $P\{X_i \mid w_k\}$ is acceptably estimated by $F_k(X_i)$ and rewrite Equation (6) as
$$F_k(X_i) = \frac{P\{w_k, X_i\}}{P\{w_k\}}$$
Rearranging the equation gives
$$P\{w_k, X_i\} = F_k(X_i)\,P\{w_k\}$$
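Thus, the numerator of Equation (5) can be evaluated as the product of the multivariate density function $F_k(X_i)$ and the prior probability of occurrence of class $w_k$. To evaluate the denominator of Equation (5), note that over all $k$ classes the conditional probabilities must sum to 1:
$$\sum_k P\{w_k \mid X_i\} = \sum_k \frac{F_k(X_i)\,P\{w_k\}}{P\{X_i\}} = 1$$
so that
$$P\{X_i\} = \sum_k F_k(X_i)\,P\{w_k\}, \qquad P\{w_k \mid X_i\} = \frac{F_k(X_i)\,P\{w_k\}}{\sum_j F_j(X_i)\,P\{w_j\}}$$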
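This equation provides the basis for the decision rule which includes prior probabilities. Since the denominator remains constant for all classes, the observation is simply assigned to the class for which $F_k^*(X_i)$, the product of $F_k(X_i)$ and $P\{w_k\}$, is a maximum. Because the density is Gaussian, taking $-2\ln$ of this product and dropping the constant term turns the maximization into a minimization. In its simplest form, this decision rule can be stated as: R2: Choose $k$ which minimizes
$$F_{2,k}(X_i) = \ln|D_k| + (X_i - m_k)'\,D_k^{-1}\,(X_i - m_k) - 2\ln P\{w_k\}$$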
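The equivalence between maximizing the product of density and prior and minimizing the discriminant can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes the discriminant takes the usual Gaussian form $\ln|D_k| + (X_i - m_k)' D_k^{-1} (X_i - m_k) - 2\ln P\{w_k\}$, restricts itself to two spectral bands so that the covariance inverse has a closed form, and uses made-up class statistics. All function names are hypothetical.

```python
import math

def quadratic_form(x, mean, cov):
    """Return ((x - m)' D^-1 (x - m), ln|D|) for a 2x2 covariance matrix D."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Closed-form 2x2 inverse applied to the difference vector.
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return q, math.log(det)

def gaussian_density(x, mean, cov):
    """Bivariate normal density F_k(X_i)."""
    q, log_det = quadratic_form(x, mean, cov)
    return math.exp(-0.5 * (q + log_det)) / (2.0 * math.pi)

def f2(x, mean, cov, prior):
    """Discriminant with prior: ln|D| + distance - 2 ln P{w_k}."""
    q, log_det = quadratic_form(x, mean, cov)
    return log_det + q - 2.0 * math.log(prior)

def posteriors(x, classes):
    """P{w_k | X_i}: density times prior, normalized over all classes."""
    joint = [gaussian_density(x, m, c) * p for m, c, p in classes]
    total = sum(joint)
    return [j / total for j in joint]

def classify(x, classes):
    """Rule R2: index of the class with the smallest discriminant."""
    scores = [f2(x, m, c, p) for m, c, p in classes]
    return min(range(len(classes)), key=scores.__getitem__)

# Illustrative (mean, covariance, prior) triples; not taken from the study.
classes = [
    ((10.0, 20.0), ((4.0, 1.0), (1.0, 3.0)), 0.7),
    ((14.0, 23.0), ((5.0, -1.0), (-1.0, 6.0)), 0.3),
]
x = (12.0, 21.5)
post = posteriors(x, classes)
best = classify(x, classes)
```

Here `best` picks the same class as the largest entry of `post`, since the discriminant differs from $-2\ln$ of the numerator only by a constant.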
It is important to understand how this decision rule behaves with different prior probabilities. If the prior probability $P\{w_k\}$ is very small, then its natural logarithm will be a large negative number; when multiplied by $-2$, it will become a large positive number, and thus $F_{2,k}$ for such a class will never be minimal. Therefore, setting a very small prior probability will effectively remove a class from the output classification. Note that this effect will occur even if the observation vector $X_i$ is coincident with the class mean vector $m_k$. In such a case, the quadratic product distance function $(X_i - m_k)'\,D_k^{-1}\,(X_i - m_k)$ goes to zero, but the prior probability term $-2\ln P\{w_k\}$ can still be large. Thus, it is entirely possible that the observation will be classified into a different class, one for which the distance function is quite large.
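A small numerical sketch makes this concrete. It assumes the discriminant is the sum of a log-determinant term, the quadratic distance, and $-2\ln P\{w_k\}$; the determinant, distances, and priors below are illustrative values chosen for the example, and `f2_from_terms` is a hypothetical helper.

```python
import math

def f2_from_terms(log_det, distance, prior):
    """Discriminant assembled from its three terms."""
    return log_det + distance - 2.0 * math.log(prior)

# Assumption: both classes share the same covariance determinant.
log_det = math.log(16.0)

# The observation sits exactly on the mean of class A (distance 0)
# and far from the mean of class B (distance 18).
dist_a, dist_b = 0.0, 18.0

# Equal priors: the class whose mean coincides with the observation wins.
equal_a = f2_from_terms(log_det, dist_a, 0.5)
equal_b = f2_from_terms(log_det, dist_b, 0.5)

# Class A given a very small prior: its penalty term outweighs the
# distance advantage, so class B now has the smaller discriminant.
skewed_a = f2_from_terms(log_det, dist_a, 1e-6)
skewed_b = f2_from_terms(log_det, dist_b, 1.0 - 1e-6)

print("equal priors  -> A:", round(equal_a, 2), "B:", round(equal_b, 2))
print("skewed priors -> A:", round(skewed_a, 2), "B:", round(skewed_b, 2))
```

With the skewed priors the penalty $-2\ln(10^{-6})$ exceeds the distance of 18 to class B, which is why the assignment flips.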
As the prior probability $P\{w_k\}$ becomes large and approaches 1, its logarithm will go to zero and $F_{2,k}$ will approach $F_{1,k}$ for that class. Since this probability and all others must sum to one, however, the prior probabilities of the remaining classes will be small numbers and their values of $F_{2,k}$ will be greatly augmented. The effect will be to force classification into the class with high probability. Therefore, the more extreme the values of the prior probabilities, the less important is the actual observation vector $X_i$.
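The trade-off described here can be tabulated directly. The sketch below assumes a two-class case in which the remaining class receives whatever prior mass the dominant class leaves over; the function names and the prior values are illustrative only.

```python
import math

def prior_penalty(prior):
    """Term added to the discriminant by the prior: -2 ln P{w_k}."""
    return -2.0 * math.log(prior)

def penalties_two_class(p_dominant):
    """Penalties for the dominant class and for the one remaining class."""
    return prior_penalty(p_dominant), prior_penalty(1.0 - p_dominant)

# As the dominant prior approaches 1, its penalty shrinks toward 0
# while the penalty on the other class grows.
rows = [(p, *penalties_two_class(p)) for p in (0.5, 0.9, 0.99, 0.999)]
for p, dominant, other in rows:
    print(f"P={p}: dominant penalty {dominant:.3f}, other-class penalty {other:.3f}")
```

The first column of penalties tends to zero, which is the sense in which the discriminant with priors approaches the one without, and the second column shows how strongly the competing class is handicapped.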
Experimental Work
Training data were collected for each class, and the image was then classified using the maximum likelihood approach. The a priori probabilities of all classes were assumed to be equal. Figure 3 shows the classified image.

Figure 3. Image classified by the maximum likelihood approach with equal a priori probabilities.
The overall accuracy of this approach is 52%. At this stage, rule maps of the 8 crops can be calculated; these are the basis for decision making by the software. For example, Table 1 shows the rule matrix (the probability of each pixel belonging to class W).
Table 1. Rule matrix for class W.
Since the rule matrices for all classes must sum to one, Table 1 is modified to give Table 2:
Table 2. Sum of rule matrices for 8 classes.
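One way to read this normalization step is as a per-pixel rescaling: each class's rule value is divided by the sum of the rule values of all classes at that pixel. The sketch below shows that reading only; it is an assumption about how the rule matrices are adjusted, the function name is hypothetical, and the small 2 x 2 arrays are made-up values rather than entries from Tables 1 and 2.

```python
def normalize_rule_maps(rule_maps):
    """Rescale rule maps so the classes sum to one at every pixel.

    rule_maps[k][r][c] is the rule value of class k at pixel (r, c).
    """
    n_rows, n_cols = len(rule_maps[0]), len(rule_maps[0][0])
    out = [[[0.0] * n_cols for _ in range(n_rows)] for _ in rule_maps]
    for r in range(n_rows):
        for c in range(n_cols):
            total = sum(m[r][c] for m in rule_maps)
            for k, m in enumerate(rule_maps):
                # Pixels with no support from any class stay at zero.
                out[k][r][c] = m[r][c] / total if total > 0 else 0.0
    return out

# Illustrative rule maps for three classes on a 2 x 2 image.
rule_maps = [
    [[0.20, 0.05], [0.00, 0.10]],
    [[0.10, 0.15], [0.00, 0.30]],
    [[0.10, 0.05], [0.00, 0.20]],
]
normalized = normalize_rule_maps(rule_maps)
pixel_sums = [
    [sum(m[r][c] for m in normalized) for c in range(2)] for r in range(2)
]
```

After the call, every pixel with at least one non-zero rule value has class values summing to one, and the all-zero pixel is left untouched.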