Data Processing, Algorithm and Modelling
Collaborating Remote Sensing with historical limnological data to map primary productivity at a Eutrophic lake
Study Site and Limnological Data
The study site is Lake Kasumigaura (fig. 1), the second largest lake in Japan, with an estimated area of
220 km² and a total catchment area of 1969 km². Originally brackish, it became a freshwater lake
in 1976 after the construction of a regulatory dyke at its only outlet, which prevents seawater from coming in.
With an average depth of 4 m and a maximum depth of 7 m, it is a eutrophic lake throughout the year.
The lake is crucial to the environment and to the population of 950,000 residing around its
banks in about 45 cities. For this reason, and because of its degrading water quality, the lake has attracted
many national and international researchers who analyze its water quality problems and processes.
The Lake Kasumigaura Research Station of the National Institute for Environmental
Studies (NIES), Japan, along with several other government and private organizations, has been
collecting data to support research on the water quality of the lake and to propose remedial
measures for improving the situation. The Lake Kasumigaura Databook (2001), published by the Center for
Global Environmental Research (CGER), NIES compiles limnological data for several important
variables for 10 locations spread over the lake and taken at monthly intervals during the period 1977 ~
From the dataset, the mean chlorophyll concentration at Kasumigaura was 69 µg/l (max: 199 µg/l)
in 2000, with a typical yearly sediment load of 67-91 t/km² from the 56 rivers feeding the lake.
Aquatic Primary Production and Modeling Requirements
Aquatic primary production can be considered as the mass of carbon fixed as newly grown organic
material in the water column. It is thus the sum of all the photosynthetic rates within the aquatic
ecosystem and is often used as a synonym for aquatic biomass. Phytoplankton provide the material basis for the
pelagic ecosystem through primary production. However, excess primary production sometimes
deteriorates water quality through dense blooms of nuisance algae, causing serious damage to fish
production, recreation, flora and fauna, as well as to human health, through a variety of resultant
substances such as CO2, H2S, CH4, corrosive gases and toxins (Falconer, 1993). Lake Kasumigaura is
facing similar problems, and thus it is necessary to understand the chemical, biological and hydrological
bases of controls on primary productivity in order to accurately predict the effects of inputs (nutrients
from catchment) and to develop management strategies that can protect ecosystem functioning.
Empirical modeling of phytoplankton primary production (PP) has always been based on predictive
variables that are more easily available and cheaper to measure than primary production itself. These
mathematical models use aquatic optics and chlorophyll concentration to predict phytoplankton primary
production by analytical means (Cole et al., 1987). However, the analytical methods, with their many
assumptions, often cannot represent the underlying complexity typical of primary production. Modeling
primary production has been a challenge, and there are few publications on the subject. Neural networks
have been found to handle the inherent non-linearity better, with some degree of success
(Scardi, 1996, 2000; Scardi et al., 2001). However, the context of ecological modeling is quite different
from that of most neural network applications, as data sets and knowledge are often very limited with
respect to the complexity of the real-world processes. Relationships between variables are therefore
only partly known and understood, as they are usually studied by analyzing correlations rather than by
defining causal pathways in a strictly deterministic framework (Scardi et al., 2001). In this study, too,
we selected neural networks for developing the primary productivity model for Lake Kasumigaura.
Methodology
- Selection of variables
The major factor determining aquatic primary production on any given day is the biomass of
phytoplankton, which itself results from previous production. Photosynthetic variation occurs through
attenuation of photosynthetically active radiation (PAR, 400 ~ 700 nm). Therefore, incident solar radiation
and underwater light attenuation are the fundamental factors controlling the productivity of shallow lakes.
According to Takamura et al. (1991), primary production in Lake Kasumigaura is closely related to water
temperature, solar radiation and chlorophyll-a concentration, while nutrients were found not to be limiting
factors of productivity. Based on the above discussion and a preliminary regression analysis of
various limnological parameters against productivity, four limnological variables, namely chlorophyll-a (chl.a,
µg/l), suspended sediment (SS, mg/l), Secchi disk depth (SDD, cm) and water temperature (WT, °C), were
selected as input variables. Suspended sediment and Secchi disk depth are used to simulate the effects
of underwater light intensity. Sediments, especially the clay-size fractions, in fact play a significant role
in the process of lake eutrophication. The first three variables, which are optically active, and WT can be
estimated from broad-band sensors such as Landsat ETM+. Broad-band satellite sensors are preferred
to modern aquatic satellite sensors (e.g. MODIS) in inland waters due to their better spatial resolution.
Gross productivity (g C m⁻² day⁻¹) data for Lake Kasumigaura are available monthly from 1977 to 1996 for
10 stations spread over the lake. However, data prior to 1981 were not considered in this study, as
productivity was measured using the O2 method from 1977 ~ 1980. The ¹³C method has been used since then,
as the use of radioisotopes such as ¹⁴C is legally restricted in Japan.
- The Quickprop neural network
For modeling, a faster variant of the back-propagation neural network, namely Quickprop, is used.
Quickprop, developed by Fahlman (1988), is not an adaptive learning technique. Like any other
back-propagation neural network, however, it contains three kinds of layers, namely input, hidden
and output layers. All layers have processing units called nodes. The numbers of input and output
nodes are fixed and are based on the number of variables under consideration. The number of hidden layers
and of hidden layer nodes may vary depending on the complexity of the problem. All nodes are
interconnected in a feed-forward manner through weighted interconnections. Finding the combination of
weights in these interconnections that represents the system under investigation is the goal of
network training. For this purpose, inputs are first fed through the input layer; after passing through a
summation and a squashing function, the output of each layer becomes the input for the subsequent layer.
This process continues until the final output is available at the output layer. The difference between the
final output and the target output is then back-propagated to update the weights in the interconnections.
The weight update rule in Quickprop differs from the conventional back-propagation algorithm and is
dominated by a quadratic term,

$$\Delta w(n) = \frac{S(n)}{S(n-1) - S(n)} \, \Delta w(n-1)$$

where $S(n) = \partial E / \partial w(n)$. The numerator is the derivative of the error with respect to the
weight, and $(S(n-1) - S(n)) / \Delta w(n-1)$ is a finite difference approximation of the second derivative
(with its sign reversed). Together these approximate Newton's method for minimizing a one-dimensional
function $f(x)$: $\Delta x = -f'(x) / f''(x)$. To avoid taking an infinite step or a backward uphill step,
a maximum growth factor parameter $\mu$ is introduced. No weight change is allowed to be larger than
$\mu$ times the previous weight change. Quickprop has a fixed learning rate that needs to be chosen to suit
the problem. A detailed discussion of back-propagation neural networks and of Quickprop can be found in
Fausett (1994) and Reed et al. (1998), respectively.
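The Quickprop step described above can be sketched for a single weight as follows. This is a simplified illustration, not the implementation used in this study: the function names, the default learning rate of 0.1 and the way the capped step is chosen are assumptions made for the example, and Fahlman's original algorithm handles further cases.

```python
def quickprop_step(slope, prev_slope, prev_step, lr=0.1, mu=1.75):
    """Return the next change of one weight (simplified Quickprop sketch).

    slope      -- S(n), current derivative of the error w.r.t. the weight
    prev_slope -- S(n-1), derivative at the previous iteration
    prev_step  -- previous weight change, delta w(n-1)
    lr, mu     -- learning rate and maximum growth factor (assumed defaults)
    """
    if slope == 0.0:
        return 0.0  # already at a stationary point
    if prev_step == 0.0:
        # No history yet: fall back to a plain gradient-descent step.
        return -lr * slope
    denom = prev_slope - slope
    # Quadratic (secant) step; infinite if the two slopes are identical.
    step = slope / denom * prev_step if denom != 0.0 else float("inf")
    limit = mu * abs(prev_step)
    if abs(step) > limit or step * slope > 0.0:
        # Step too large or pointing uphill: take the capped step downhill.
        step = -limit if slope > 0.0 else limit
    return step


def minimise_1d(grad, w0, n_steps=10):
    """Apply quickprop_step repeatedly to a one-dimensional error function."""
    w, prev_slope, prev_step = w0, 0.0, 0.0
    for _ in range(n_steps):
        slope = grad(w)
        step = quickprop_step(slope, prev_slope, prev_step)
        w, prev_slope, prev_step = w + step, slope, step
    return w
```

For a purely quadratic error such as E(w) = (w - 3)², whose derivative is 2(w - 3), the secant step lands on the minimum as soon as it is no longer capped by the growth factor, which is why Quickprop can converge faster than fixed-step back-propagation on smooth error surfaces.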
- Model development
In our case, the inputs are fixed at the four selected variables, and the output is the gross productivity (GP)
of the lake. As a first step, some trial training runs were made with the entire data set. Using all the
samples from 1981 to 1996 could not produce a satisfactory model, as the input variables
were limited. Even with a network with two hidden layers, it was difficult to train on the data set
satisfactorily, and a coefficient of determination (R²) above 0.5 could not be obtained. After examining the
data and some trial-and-error training, it was found that focusing the modeling on a certain season
gave much better results than using the whole data set. This is because productivity in a lake, itself a
complex process, varies considerably with the seasons of a typical year, so that with few variables it
becomes difficult to approximate such a complex function. The situation is even more difficult when the
approximating model is made independent of both the time and the space component. Relaxing the model
with respect to the time component can thus provide better results. It was also found that, due to the
skewed data distribution of the input variables (example shown in fig. 2), there was bias in the
outcome of the network. Therefore, prior to training, the variables were transformed to lower ranges by
applying a logarithm or square root, which produced a better fit to both training and validation data.
Thus, a moving-month model is proposed, in which a separate productivity model is developed for each
month of the year. To demonstrate the strategy, a model for the month of January is shown in
detail with results. January was selected because water quality maps for the four input parameters were
available for 19 January 2001, generated from Landsat TM imagery acquired on that day (Baruah, 2002).
The strategy in the moving-month model is to use, in addition to the data for the month being modeled,
the data for its two immediately neighboring months. Thus, for the January model, we extracted data for
three months, namely December, January and February. A total of 80 samples were available for these
three months. As the training samples were limited, we used Quickprop with cross-validation, randomly
selecting 90% of the samples as training data and the rest as validation data, so as to avoid local minima
while training. A single hidden layer is used. The number of hidden layer nodes was varied to arrive at the
optimum number, which gave the least overfitting and the best weight configuration. A network trained with
Quickprop and 5 hidden layer nodes (i.e. a 4-5-1 network) was found to give the best weight configuration,
with a weight growth factor of 1.75 after 22,256 epochs (iterations).
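The data preparation for one monthly model can be sketched as below. This is a minimal illustration under stated assumptions: samples are taken to be dictionaries with a "month" key (1 to 12) plus the variable values, all function and key names are hypothetical, and the choice of which variable receives a logarithm and which a square root is left to the caller because the text does not specify it.

```python
import math
import random


def window_months(target_month):
    """Return the target month and its two neighbours, wrapping round the year."""
    prev_month = 12 if target_month == 1 else target_month - 1
    next_month = 1 if target_month == 12 else target_month + 1
    return (prev_month, target_month, next_month)


def select_window(samples, target_month):
    """Keep only the samples that fall inside the three-month window."""
    months = window_months(target_month)
    return [s for s in samples if s["month"] in months]


def compress_inputs(sample, transforms):
    """Apply a range-reducing transform (e.g. log or sqrt) to chosen variables.

    transforms maps a variable name to a function; the pairing of variables
    and functions is an assumption supplied by the caller.
    """
    out = dict(sample)
    for name, fn in transforms.items():
        out[name] = fn(sample[name])
    return out


def split_train_validation(samples, train_fraction=0.9, seed=0):
    """Randomly split samples into training and validation subsets."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]
```

With 80 samples in the December to February window, a 90% split of this kind leaves 72 samples for training and 8 for validation.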