A Study on Spatial Distribution of Biomass Resources Using RSD


G. S. Sheshagiri
CGPL
IISc, Bangalore
sheshagiri@cgpl.iisc.ernet.in

M. Dhinasekar
CGPL
IISc, Bangalore
dhinasekar@cgpl.iisc.ernet.in

D. R. Anantha Deshpande
CGPL
IISc, Bangalore
anantha@cgpl.iisc.ernet.in

N. K. S. Rajan
CGPL
IISc, Bangalore
nksr@cgpl.iisc.ernet.in


Abstract.
Renewable energy is gaining momentum the world over in the wake of depleting fossil fuels. The technology to produce electrical power using biomass as a renewable source is at an advanced stage. Estimating biomass geographically and representing the estimates in a GIS environment is significant for this approach.

Biomass availability for power production can be assessed to an acceptable accuracy using the land-use and NDVI patterns derived from RSD. The agricultural polygons, NDVI, isohyets and statistical agro-crop parameters are used together as inputs, and the polygons are spatially classified into seasonal crop polygons. The ANN-BP technique is used to enhance the crop-wise classification. The code developed provides controls that allow the study to be generalized. The output displays the results graphically and in a user-interactive mode. As verification, the analysis has produced crop production figures of acceptable precision. Additional ground truth parameters are used to derive the biomass estimate from these verified results.

The method has been successfully adopted, and a case study for Karnataka has been completed with the verification sets matching within 10% of the reported figures, which is found to be satisfactory for this application. The study is shown to have a significant role in realizing biomass-based power plants.

1 Introduction
Biomass energy is one of the renewable sources of energy that can be used to generate either thermal or electrical power using gasification technology. The biomass comes basically from crop plants and trees. RSD [Remote Sensed Data] processed into agricultural areas represented by vector polygons is available at a ground resolution of 188mtr. Similarly, layers containing vector polygons for NDVI and rainfall are also available. Though the relation between NDVI, rainfall and type of crop is random, the purpose of this study is to find out how separable the set of related data is, so that a crop can be identified for each distinct agricultural area. Once the type of crop is known, its available biomass can be assessed from the quantitative relations, CRR [Crop Residue Ratio] and residue yield, after deducting the current societal usages. Digressing briefly, plants absorb more visible light energy and less IR [Infra Red] during photosynthesis, and vice versa after complete growth. Using this property of crops, a differential index to gauge vegetation is generated spatially by remote sensors aboard satellites. The equation to compute the Normalized Difference Vegetation Index [NDVI] is given in (1).

NDVI = (NIR – VIS) / (NIR + VIS) (1)
Where NIR = Near infrared reflectance from crop plants
VIS = Visible light reflectance from crop plants
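Equation (1) can be evaluated per pixel or per polygon as in the minimal Python sketch below. The function name, the handling of a zero denominator and the reflectance values are assumptions made for illustration; they do not come from the study's VB.NET code.

```python
def ndvi(nir: float, vis: float) -> float:
    """Normalized Difference Vegetation Index as defined in equation (1)."""
    total = nir + vis
    if total == 0:
        # Assumption: a pixel with no reflectance in either band
        # is reported as 0 rather than raising a division error.
        return 0.0
    return (nir - vis) / total


# Hypothetical reflectance values, chosen only to show the contrast
# between a vegetated surface and a sparsely vegetated one.
vegetated = ndvi(nir=0.50, vis=0.08)
sparse = ndvi(nir=0.30, vis=0.25)
print(f"vegetated: {vegetated:.2f}, sparse: {sparse:.2f}")
```

The index always lies between -1 and 1, which is what makes it convenient to normalize as a classifier input later on.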

Vegetation due to the growth of plants can be spatially related to the type of crop, because each crop exhibits a characteristic range of vegetation. It is also known that the growing habits of crops depend on the seasonal rainfall levels. Attributing vegetation and rainfall levels can therefore provide an acceptable crop classification of spatial agricultural polygons.

The cropping patterns are not the same among villages, taluks [a smaller region than a district], districts or states in each season. They are driven by many factors: societal, rainfall, market, economics, weather, soil, water resources, agricultural rules and habits. Yet, considering TFL [Tobler’s First Law of Geography], that near things resemble each other more than far ones, and given that only a few such attributes are available at taluk level, it is sometimes necessary to provide a spatial data mining capability to assess the biomass surplus over the geographical area of interest. Such an arrangement of data to enable a geographical query is made possible by GIS [Geographical Information System] using RSD-based vector maps. The GIS is designed to hold agricultural areas [polygons], vegetation areas, rainfall areas, demography and roads in different layers, very similar to 2.5D CAD platforms, using a suitable projection system for spatial referencing.

As a precursor to geographically mining the biomass data, the remote sensed vector polygons representing agricultural areas must be classified into the crops likely to be grown on them. This is done by pipelining the validation, formalization and application of available surveyed crop attributes and crop statistics onto the agricultural polygon vectors. This study explains a methodology planned in two stages: a fast method to make an initial biomass assessment, followed by a more accurate feed forward neural system. The simple rule-based fuzzy distributor, which distributes crops spatially at the district level over agricultural polygons, has been completed for the whole country with a taluk level spatial accuracy of 50-90%. The feed forward neural network based GIS processor, which takes into account the non-linear set of crop attributes involving NDVI and rainfall, has been developed and is under study to resolve the crops more accurately.

2 Prior researches
Exploitation of remote sensing for commercial applications on a large scale dates back only about a decade. The reason is that computing power has shifted in the recent past from colossal computer centers to small affordable desktop PCs, with comparatively easier and cheaper access to remote sensed data. The processing and analysis of remote sensed data involves intense numerical computation and graphics. In the wake of this, biomass assessment needs analysis of geographical areas across the country with an emphasis on application. Such an analysis across the country [India] to assess the seasonal availability of biomass at taluk level is not known to have been done till now.

3 Data inputs
To analyze and study the biomass surplus and compute the power generation potential, crops must first be distributed over the digital map using either rule-based fuzzy logic or a neural network. The following data are needed as inputs for this purpose.
  • The statistical crop data at district level
  • Agricultural polygon vectors in digital form extracted from RSD raster carrying Taluk, District and State names.
  • Taluk level sample statistical crop data and look up tables for crop parameters
4 Software tools
VB.NET [Visual Basic .NET] is used to develop the two types of vector classifiers for crops, along with the compatible GIS software GeoConcept. VB.NET has an advantage over other programming languages as a RAD [Rapid Application Development] tool. GeoConcept provides freely redistributable software offering the necessary geographical functions wrapped in a VB kit. This helps bring uniformity to Atlas application development. The OS [Operating System] used is Windows XP. SQL Server 2005 is used for the back end data processing.

5 Spatial assessment of Biomass
Two methods of agricultural vector classification are employed. One is a fast vector classifier using a linguistic fuzzy rule, and the other is a neural network classifier trained by back propagation [BP]. The feed forward [FF] network classifies based on the winner output, i.e. the output node with the maximum value. The two types of crop classifiers are described below in 5.1.1 and 5.1.2. Section 5.2 outlines the crop distribution in GIS after classification of agricultural polygons into crop areas. In 5.3 the geographical assessment of biomass is briefly explained under ‘Biomass assessment, Results & Discussions’.

5.1.1 Fast Vector classifier
The crop distribution analysis can be basically categorized into spatial correlation, crop attribute explication and statistical district level crop data aggregation [Table.3]. Geographical information is a complex set involving polygon, line and point objects. Combining these features with the statistical data needs empirical formalization and an orderly description of the randomness of crops grown over the geographical area.

Because of the huge amount of graphical data to be checked and processed, a simple fast-fuzzy-logic [FFL] was used initially to study the crop distribution over agricultural land use, using district level crop statistics along with other crop attributes. This choice rests on three assumptions. First, satellite images are processed into agricultural land use vector polygons on the basis of the physical similarity of pixels in each polygonal geo-area. Second, the larger polygons account for the major crops. Third, the societal growing habit is applied to the polygons in the region using simple ‘if then else’ logic with a pre-defined probable weightage for the selected crop. A spatial spread error may occur at the taluk level if some minor or insignificant crop is grown there. This was checked by cross-verifying the crop distribution data extracted from the map against the sample taluk survey data. The net percentage error in spatial spread was found to be less than 15% in most cases. At the crop level the error varies from 15% to 40%. As the net error is low, the biomass assessment to forecast power potential in the area will be within the acceptable error of about 25%. This has been verified in many cases using the local ground realities. A sample is shown in Table.1 for the taluk ‘Saundatti’ of Belgaum district in Karnataka state, India. In this particular case the scalar error after distribution is within about 6%. Such a crop distribution is generated for the whole country. To improve the accuracy while keeping the cost in mind, the NDVI and agricultural vector maps processed from imagery of 184 x 184mtr per pixel resolution are spatially correlated non-linearly using the ANN-BP algorithm explained in the next section. A sample for Saundatti Taluk in Belgaum is shown in Fig.1.


Fig. 1. Crop distribution in Saundatti Taluk of Belgaum district in Karnataka state of India.


Table.1 Taluk level extracted crop data from map after crop distribution at district level.
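The idea behind the fast vector classifier of this section can be illustrated with the Python sketch below. It is not the rule set used in the study, which is not reproduced in the text. It assumes a greedy rule: polygons are visited from largest to smallest, and each is given the crop with the largest remaining district quota, scaled by an optional habit weight. All names, areas and weights are hypothetical.

```python
def distribute_crops(polygons, district_stats, habit_weight=None):
    """Assign one crop to each polygon (assumed greedy rule, see lead-in).

    polygons:       polygon id -> polygon area in hectares
    district_stats: crop name  -> reported district crop area in hectares
    habit_weight:   crop name  -> weight for the societal growing habit
    """
    habit_weight = habit_weight or {}
    scale = sum(polygons.values()) / sum(district_stats.values())
    # Scale the reported crop areas to the mapped agricultural area.
    remaining = {crop: area * scale for crop, area in district_stats.items()}

    assignment = {}
    # Larger polygons are visited first, so they receive the major crops.
    for pid, area in sorted(polygons.items(), key=lambda kv: kv[1], reverse=True):
        crop = max(remaining, key=lambda c: remaining[c] * habit_weight.get(c, 1.0))
        assignment[pid] = crop
        remaining[crop] -= area
    return assignment


def percent_error(extracted, surveyed):
    """Error of a map-extracted figure against a surveyed figure, in percent."""
    return 100.0 * abs(extracted - surveyed) / surveyed


# Hypothetical polygons and district statistics, for illustration only.
polygons = {"P1": 500.0, "P2": 300.0, "P3": 150.0, "P4": 50.0}
stats = {"CropA": 6500.0, "CropB": 2500.0, "CropC": 1000.0}
assignment = distribute_crops(polygons, stats)
print(assignment)
```

The `percent_error` helper mirrors the cross-verification step described above, where crop areas extracted from the map are compared with sample taluk survey data.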


5.1.2 ANN classifier
The ANN-BP algorithm is developed specifically for crop distribution on the VB.NET platform. The inputs are NDVI and rainfall as crop parameters. The hidden layers feed multiple outputs, whose number depends on the number of major crops. The tool is under test. Crop distribution with a simple network of 3 layers, with multiple nodes and bias, has shown considerable re-distribution compared with the simple Fast Vector classifier. The model uses max-min normalization to limit the input crop parameters by fixing the overall range of rainfall and NDVI for the crops in the region. The hidden layers and the outputs are squashed using the sigmoid function. The weights are randomized initially. Back propagation uses the first partial derivative of the continuous sigmoid squashing function. The computational equations used for the ANN classifier for both FF and BP are listed below from 2 to 9 and the model is shown in Fig.3.




Fig.3 ANN-BP model.
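The building blocks named in this section (max-min normalization, sigmoid squashing, its first derivative and the winner output) can be sketched in Python as follows. This is a standard textbook formulation, not the paper's equations 2 to 9. The parameter ranges, network size and input values are hypothetical.

```python
import math
import random


def min_max(value, lo, hi):
    """Scale a crop parameter into [0, 1] using its overall range [lo, hi]."""
    scaled = (value - lo) / (hi - lo)
    return min(1.0, max(0.0, scaled))  # clip values outside the fixed range


def sigmoid(x):
    """Continuous squashing function."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_prime(y):
    """First derivative of the sigmoid, written in terms of its output y."""
    return y * (1.0 - y)


def layer(inputs, weights, biases):
    """One fully connected layer with bias, squashed by the sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def feed_forward(inputs, w_hid, b_hid, w_out, b_out):
    hidden = layer(inputs, w_hid, b_hid)
    return hidden, layer(hidden, w_out, b_out)


def winner(outputs):
    """Index of the output node with the maximum value."""
    return max(range(len(outputs)), key=outputs.__getitem__)


# Hypothetical setup: 2 inputs (NDVI, rainfall), 3 major crops, 5 hidden nodes.
rng = random.Random(0)
n_in, n_hid, n_out = 2, 5, 3
w_hid = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hid)]
b_hid = [rng.uniform(-0.5, 0.5) for _ in range(n_hid)]
w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hid)] for _ in range(n_out)]
b_out = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]

# Assumed overall ranges: NDVI 0.1 to 0.8, rainfall 400 to 1200 mm.
x = [min_max(0.45, 0.1, 0.8), min_max(650.0, 400.0, 1200.0)]
hidden, outputs = feed_forward(x, w_hid, b_hid, w_out, b_out)
predicted = winner(outputs)
```

With untrained random weights the winner is arbitrary; it becomes meaningful only after the weights have been learned by back propagation.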


Regional training sets are required to train the neural network using back propagation. These are prepared carefully from known village level ground data on the crops grown, and are then linked to the respective polygon vectors of the region. The data is also visualized graphically with a special tool developed to overlay the NDVI layer, with transparency, on the agricultural layer. Similarly, the rainfall layer has been generated by intersecting isohyets [isopleths for rainfall] with taluk borders and land use. Table.2 shows a training set excerpt prepared from the ground truth and a graphical study with overlays of the NDVI and rainfall layers for the taluk ‘Saundatti’ in the district of Belgaum, Karnataka.

Table.2 Training set.


The map is queried to extract data for the agricultural polygons, containing region names, NDVI and rainfall attributes. Such a table of unknown patterns looks similar to the training set [Table.2], except that the crop classification is absent. It is used as the unknown input pattern during FF crop classification. The pipeline process for spatial crop distribution is as follows:
  1. Table.3 for the district ‘Belgaum’ in Karnataka is formed by listing all major crops, using a cut-off of 3kHa per crop in the district. Each polygon has to be classified as one of these crops by one of the classifier methods.
  2. The number of output nodes in the ANN is chosen according to the number of major crops [crop area in the district > 3kHa]. The number of hidden layers is chosen to be equal to the number of influencing parameters plus the number of output nodes. The effect of the number of hidden layers is also to be studied. Using more crop parameters enhances the accuracy further. Because spatial NDVI data and isohyets are available, these two parameters are used initially for the study. The ANN-BP is modeled to adapt easily to any number of inputs and, correspondingly, hidden and output layers.


Table.3 District level statistics of major crops [crop area > 3kHa].


  3. Initialize the weights randomly, then compute the weights iteratively using the ANN-BP. Observe the stability of the weights for each set. The network error is monitored by computing the mean squared error at the output. When the error falls below a pre-defined threshold, the BP is stopped, the weights are updated and the process moves on to the next known sample in the training set. Proceed with the updates once stable end weights are observed.
  4. The weights are written to a weight look-up table after each sample during learning, for each training set. The weights are then used to classify the polygons with unknown crops in the database. With the number of major crops in the region known, the network can classify unknown polygons at the output nodes assigned to each known crop during learning. The ANN is used in feed forward mode to classify the polygons into crops using a test set. As the unknown sample patterns are fed to the input of the network, the winner at the output is chosen.
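The training steps listed above can be sketched in Python as follows. This is a generic online back propagation loop, not the study's VB.NET code, and it makes several assumptions. The "number of hidden layers" is read as the number of nodes in a single hidden layer, equal to inputs plus outputs. The stopping test is applied once per pass over the whole training set, not per sample. The training samples and the error threshold are invented. Only the learning rate of 0.08 is a value quoted in the text.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_net(n_in, n_out, rng):
    """Random initial weights; hidden nodes = inputs + outputs (assumed reading)."""
    n_hid = n_in + n_out

    def rand():
        return rng.uniform(-0.5, 0.5)

    return {
        "wh": [[rand() for _ in range(n_in)] for _ in range(n_hid)],
        "bh": [rand() for _ in range(n_hid)],
        "wo": [[rand() for _ in range(n_hid)] for _ in range(n_out)],
        "bo": [rand() for _ in range(n_out)],
    }


def forward(net, x):
    hid = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
           for row, b in zip(net["wh"], net["bh"])]
    out = [sigmoid(sum(w * h for w, h in zip(row, hid)) + b)
           for row, b in zip(net["wo"], net["bo"])]
    return hid, out


def mean_squared_error(net, samples):
    total = 0.0
    for x, target in samples:
        _, out = forward(net, x)
        total += sum((t - o) ** 2 for t, o in zip(target, out)) / len(out)
    return total / len(samples)


def train(net, samples, rate=0.08, threshold=0.01, max_epochs=5000):
    """Online back propagation; stops once the MSE drops below the threshold."""
    for epoch in range(1, max_epochs + 1):
        for x, target in samples:
            hid, out = forward(net, x)
            # Deltas use the sigmoid derivative y * (1 - y).
            d_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, out)]
            d_hid = [h * (1.0 - h) * sum(d * net["wo"][k][j] for k, d in enumerate(d_out))
                     for j, h in enumerate(hid)]
            for k, d in enumerate(d_out):
                net["wo"][k] = [w + rate * d * h for w, h in zip(net["wo"][k], hid)]
                net["bo"][k] += rate * d
            for j, d in enumerate(d_hid):
                net["wh"][j] = [w + rate * d * xi for w, xi in zip(net["wh"][j], x)]
                net["bh"][j] += rate * d
        if mean_squared_error(net, samples) < threshold:
            break
    return epoch


# Hypothetical normalized (NDVI, rainfall) samples with one-hot crop targets.
samples = [
    ([0.20, 0.30], [1.0, 0.0]),
    ([0.25, 0.20], [1.0, 0.0]),
    ([0.80, 0.70], [0.0, 1.0]),
    ([0.75, 0.90], [0.0, 1.0]),
]
net = init_net(n_in=2, n_out=2, rng=random.Random(1))
initial_mse = mean_squared_error(net, samples)
epochs = train(net, samples)
final_mse = mean_squared_error(net, samples)
print(f"epochs: {epochs}, MSE before: {initial_mse:.4f}, after: {final_mse:.4f}")
```

After training, the `net` dictionary plays the role of the weight look-up table: it can be stored per region and reused in feed forward mode on polygons whose crop is unknown.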
5.2 Crop Distribution
Using one of the above methods, crops are spatially distributed on the GIS platform by linking the polygon identifier. For example, Table.4 shows one such classification using the Fast Vector classifier. Each polygon is then colored using a pre-defined crop-color table. The process is repeated for other regions. A sample clipping of the crop distributed and colored map for Saundatti Taluk of Belgaum district in Karnataka state is shown in Fig.1. Such a crop distribution is used through a special GIS application for dynamic graphical queries for biomass assessment in the region of interest.

5.3 Biomass assessment, Results & Discussions
Biomass assessment, after polygon vectors are classified by type of crop, uses the distributed crop area, the crop yield and the CRR [Crop Residue Ratio]. Where the crop is either not defined or has no bearing on the generation of residue, the residue yield is used to compute the biomass. CRR, crop yield and residue yield are arrived at by a survey taking field level samples of standing crops. CRR is the average ratio of residue yield to crop yield over the sample areas and is dimensionless. Crop yield is the crop production per unit area, usually in Tons per Hectare. Residue yield is the residue generation per unit area, usually in Tons per Hectare. The surplus biomass is then assessed by subtracting the societal usages. The surplus biomass multiplied by a power factor gives the biomass power generation. This is the basis on which the biomass assessment bears directly on the crop area, a spatial parameter. Table.4 shows such an assessment for a region of interest in the taluk ‘Saundatti’ of district Belgaum, Karnataka, India, queried on a country wide biomass atlas generated using the classifier explained in section 5.1.1.

Table.4 Taluk level biomass assessment on the Saundatti taluk.
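The arithmetic of the assessment described in this section can be laid out as in the Python sketch below. The function names and all numbers are hypothetical, including the power factor, whose value and units are not given in the text. The sketch only shows how crop area, crop yield, CRR, societal usage and the power factor combine.

```python
def biomass_generated(area_ha, crop_yield=None, crr=None, residue_yield=None):
    """Residue generated over a crop area, in tons.

    Uses area x crop yield x CRR when both are known. Otherwise it falls
    back to area x residue yield, as described for undefined crops.
    """
    if crop_yield is not None and crr is not None:
        return area_ha * crop_yield * crr
    if residue_yield is not None:
        return area_ha * residue_yield
    raise ValueError("need either crop_yield and crr, or residue_yield")


def surplus_biomass(generated_t, societal_use_t):
    """Biomass left after societal usages, never below zero (assumed floor)."""
    return max(0.0, generated_t - societal_use_t)


def power_potential(surplus_t, power_factor):
    """Surplus biomass multiplied by a factor for power."""
    return surplus_t * power_factor


# Hypothetical figures for one crop in one region of interest.
generated = biomass_generated(area_ha=1000.0, crop_yield=2.0, crr=1.5)
surplus = surplus_biomass(generated, societal_use_t=1800.0)
potential = power_potential(surplus, power_factor=0.5)
print(generated, surplus, potential)
```

Because every term scales with `area_ha`, any error in the spatial crop distribution carries through proportionally to the power estimate, which is why the classification accuracy matters for the assessment.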


Back propagation was run to compute weights from different randomized initial weights. This is presently done for a set of known crop polygons as given in Table.2. The learning rate in this sample case was 0.08, with 2072 iterations for the sample of the crop Paddy. After a few such sets are generated, they have to be used to run the network for weight updates and tried on unknown sample sets. The results will again be verified against ground truths at a few randomly selected places. Using a small set and the updated weights, the crop distribution was tried for the taluk ‘Saundatti’. The scalar distribution improved substantially, to an accuracy greater than 90%, with a considerable spatial re-distribution compared to the Fast Vector classifier. It now has to be checked against ground truths and repeated with more training sets. This will enable data to be extracted at taluk level to higher accuracies after district level crop distribution, resulting in a better assessment of biomass to compute the power potential. That in turn helps entrepreneurs plan project site selection, implementation and plant running efficiency. The effect is a lower cost of energy delivery to the users.

During classification of unknown agricultural geo-areas, outliers are also possible because of the random nature of the parametric data related to crop growth. It is also possible to reclassify them using fuzzy linguistic rules based on the available societal factors. For example, a crop may be known to be grown in a particular region with a higher probability, or may be a long-lived plant such as coconut or coffee. Such classifications will have to be studied again with more detailed ground truth verification. It is also possible that the network gets ‘biased’ because of aliasing of parameter ranges at the thresholds of learning. By varying the number of hidden layers along with the initial weights, learning rate, bias and thresholds, the net can be retrained to see whether the outliers can be resolved.

This has to be studied further with more samples and sets. As the volume of data to be processed for learning is very high, an optimized learning rate has to be arrived at before BP training is done for other regions of the country. As the relation between geographical crop correspondence and the parameters is non-linear, the BP algorithm with continuous squashing may behave randomly or become unstable. The output and the error for the single node chosen are plotted to follow the iterative progress. The outputs are recorded with different learning rates and initial weights for each training set. They are then plotted against the learning rate, which will help further study of chaos to optimize the learning rates. The bias and thresholds will also have to be adjusted to overcome saturation during the crop learning process.

6 Concluding Remarks
The objective of this study is to assess the biomass across the country [India] and forecast the power potential of specific geographical areas with an accuracy of 75% or more at the taluk level. The simple vector classifier performed well, providing the required accuracy of more than 75%, especially for crops grown over more than 3kHa in each district. Better survey assessments of crop influencing parameters such as agricultural habits, and better image classification into agricultural areas, can improve the accuracy with which the district level distribution is resolved at taluk level. The ANN classification using the other spatial crop influencing parameters, NDVI and rainfall, can be adopted after detailed studies on more regions that examine the biomass assessment outcomes in known geographical regions. These classification methods can also be extended to non-crop biomass growing regions by using the applicable influencing parameters in both classifier methods.

Bibliography
  1. Kishan Mehrotra, Chilukuri.K.Mohan (1997), “Elements of Artificial Neural Networks”, Penram International Publishing Ltd.
  2. Keith.C.Clarke, Bradley.O.Parks, Michael.P.Cranes (2002), “Geography Information Systems & Environmental Modeling”, PHI.
  3. Michael Otey (2005), “Microsoft SQL Server 2005 New Features”, McGraw-Hill/Osborne.
  4. Jon Flanders, Ian Griffiths, Chris Sells (2003). “Mastering Visual Studio .NET”, O’Reilly Publication.
  5. Team from CGPL, IISc, Bangalore (2003). “Biomass to Energy the Science and Technology of the IISc Bioenergy system”.
Web References
  1. http://www.geoconcept.com
  2. http://en.wikipedia.org
  3. http://www.gisdevelopment.net/policy/gii/gii0011.htm
  4. http://cgpl.iisc.ernet.in