Subpixel Estimation of Impervious Surface Using Regression Tree Model: Accuracy of The Estimation at Different Spatial Scales


Direct estimation of imperviousness at subdivision level using aerial photos required usage of GIS spatial data in addition to the 0.3m orthophotos described above. The GIS vector data layers include: 1) a subdivision map for identification and random selection of subdivision samples, where a total of 115 samples out of the available 3280 subdivisions were selected; 2) planimetric data, used in conjunction with the orthophotos, to expedite digitizing of impervious surface in subdivisions for which the planimetric data was available; and 3) a digital road network to expedite calculation of impervious surface originating from streets. All of the data used in this part are digital spatial data in ArcView shapefiles.


Figure 4: 0.3m orthophoto


Before they could be used, the remote sensing images had to go through several preprocessing steps. The original 28.5m ETM+ images were co-registered to the 0.3m orthophotos to within 0.5 pixel root mean squared error (RMSE) before being resampled to 30m pixels using the nearest neighbor resampling method. The 0.3m orthophotos themselves had previously been co-registered to the planimetric data. Co-registration is the process of superposing two or more images, guided by ground control points, so that equivalent geographic points coincide. Accurate co-registration between the images is important since even a slight mis-registration could result in large differences between actual and predicted values of imperviousness. All of these steps were done using the ERDAS Imagine 8.6 software.

Using the GIS software ArcView® 3.2 equipped with its spatial and image analysis extensions (Spatial Analyst and Image Analyst), the digital numbers (DN) of all six reflective bands of the Landsat ETM+ images (Bands 1-5 and 7) were then converted to at-satellite reflectance values as described by the Landsat Project Science Office (2002). The Normalized Difference Vegetation Index (NDVI) values for the images were then calculated, followed by the Tasseled-cap values of brightness, greenness and wetness, using the at-satellite reflectance-based coefficients described by Huang et al. (2002). The ratio of Band 5 to Band 1 was also added as a possible soil moisture indicator helpful in discriminating between concrete and exposed soil. To summarize, the final layers used as independent variable inputs were grid layers of at-satellite reflectance of the ETM+ reflective bands, NDVI values, Tasseled-cap values and the Band 5:1 ratio (Table 2).
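The band-derived inputs described above can be sketched in a few lines of Python. This is an illustration only, not the ArcView procedure used in the study: the function names are hypothetical, the reflectance conversion follows the standard radiance-to-reflectance form, and the calibration values (gain, bias, solar irradiance, sun elevation) must be taken from the image metadata and the Landsat 7 handbook rather than from this sketch. NDVI is computed from Band 4 (near infrared) and Band 3 (red), the usual ETM+ band pair.

```python
import math


def dn_to_radiance(dn, gain, bias):
    """Linear rescaling of a digital number to at-satellite spectral radiance.
    gain and bias are band-specific calibration values (assumed inputs)."""
    return gain * dn + bias


def radiance_to_reflectance(radiance, esun, sun_elevation_deg, earth_sun_dist_au=1.0):
    """At-satellite reflectance: pi * L * d^2 / (ESUN * cos(solar zenith)).
    esun is the band's mean solar exoatmospheric irradiance (assumed input)."""
    solar_zenith = math.radians(90.0 - sun_elevation_deg)
    return (math.pi * radiance * earth_sun_dist_au ** 2) / (esun * math.cos(solar_zenith))


def ndvi(nir, red):
    """NDVI from Band 4 (NIR) and Band 3 (red) reflectance; 0.0 if both are zero."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


def band_ratio(b5, b1):
    """Band 5:1 ratio; None where Band 1 reflectance is zero."""
    return None if b1 == 0 else b5 / b1


if __name__ == "__main__":
    # Hypothetical reflectance values for a single pixel.
    print("NDVI:", ndvi(nir=0.40, red=0.10))
    print("Band 5:1 ratio:", band_ratio(b5=0.30, b1=0.10))
```

In a raster workflow the same arithmetic would be applied cell by cell to each grid layer.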

Table 2: Independent variables for regression tree method


METHODS AND PROCEDURES

Training and Validation Data
Building the regression tree model to estimate imperviousness per Landsat ETM+ pixel required substantial training and validation data. These training and validation data of impervious surface were the dependent variable of the regression tree model. The main source of the data was the planimetric data that had been updated and verified using the 0.3m orthophotos. All coverages of impervious surfaces from the planimetric data (buildings, roads, parking, utilities, etc.) were merged into one vector dataset. Four 1800m x 1800m windows of the planimetric data were visually selected to cover the spectral variations of impervious surfaces and the degrees of urbanization that best represented the study area. Each of the four 1800m x 1800m windows was then divided into nine equal-sized blocks of 600m x 600m, of which six were randomly selected for use as training blocks and the remaining three as validation blocks. Campbell (1981) and Friedl et al. (2000) suggested that using randomly selected pixel blocks rather than individual pixels as test data should reduce possible bias in model accuracy assessment due to spatial autocorrelation between training and test data. The 1800m x 1800m vector windows were then rasterized into 0.3m pixels in ArcView and reclassified into binary categories of impervious and pervious. A zonal summary of the 0.3m pixels of the impervious category within each 30m pixel of the training and validation areas was then carried out using the Spatial Analyst function of ArcView 3.2 to give the percentage of impervious surface within those 30m pixels.
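The zonal summary and the block split described above can be illustrated with a minimal Python sketch. It is not the Spatial Analyst workflow used in the study; the function names are hypothetical, and the grid is assumed to be a list of rows of 0/1 values whose dimensions are exact multiples of the aggregation factor. In the study the factor would be 100, since one 30m pixel spans 100 x 100 pixels of 0.3m; the demonstration uses a factor of 2 to stay readable.

```python
import random


def percent_impervious(binary, factor):
    """Aggregate a binary fine-resolution grid (1 = impervious, 0 = pervious)
    into coarse cells of factor x factor fine pixels, returning percent cover."""
    rows, cols = len(binary), len(binary[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    out = []
    for r0 in range(0, rows, factor):
        out_row = []
        for c0 in range(0, cols, factor):
            count = sum(binary[r][c]
                        for r in range(r0, r0 + factor)
                        for c in range(c0, c0 + factor))
            out_row.append(100.0 * count / (factor * factor))
        out.append(out_row)
    return out


def split_blocks(n_blocks=9, n_train=6, seed=None):
    """Randomly assign block indices to training and validation sets.
    The seed argument is an assumption added for reproducibility."""
    rng = random.Random(seed)
    train = sorted(rng.sample(range(n_blocks), n_train))
    valid = [b for b in range(n_blocks) if b not in train]
    return train, valid


if __name__ == "__main__":
    fine = [[1, 1, 0, 0],
            [1, 0, 0, 0],
            [0, 0, 1, 1],
            [0, 0, 1, 1]]
    print(percent_impervious(fine, factor=2))  # one value per coarse cell
    print(split_blocks(seed=1))                # six training blocks, three validation blocks
```

Each 600m x 600m block holds 20 x 20 = 400 pixels of 30m, so six training blocks in each of four windows give 9600 pixels and three validation blocks give 4800, which matches the data counts reported in the Regression Tree Analysis subsection.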

Regression Tree Analysis
The values of the independent and dependent variables were collected before starting the regression tree analysis. The task was carried out in ArcView with the help of an Avenue extension called StatMod, developed by Garrard of Utah State University (Garrard, 2002). StatMod was used to collect the grid values of both the independent and dependent variables from their respective grid layers. The independent variables were those listed in Table 2 and the dependent variable was the percentage of impervious surface within each 30m pixel of the four 1800m x 1800m training and validation areas. A total of 9600 data points (grid values) per variable were collected for use as training data and 4800 data points per variable as validation data. The data were then exported from ArcView into S-PLUS® to build a maximal regression tree using the values of the dependent and independent variables from the training area (Figure 5). The maximal tree was then pruned to produce a series of simpler trees, each of which was a candidate for the optimal tree. To select an optimal tree, the quality of each candidate tree was judged by its mean absolute error of prediction on the validation data. In addition to the mean absolute error, the correlation coefficient was also calculated for each candidate tree. Since several trees were close in quality, the tree that used the fewest independent variables, i.e. a parsimonious model, was selected. A parsimonious tree model is desirable since it requires less data volume as well as less computing time.
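The selection criteria described above can be sketched as follows. This is a hedged illustration rather than the S-PLUS procedure: the function names are hypothetical, the candidate figures in the demonstration are invented, and the tolerance that defines when two trees count as "close" is an assumption, since the text does not give a numerical threshold.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pick_parsimonious(candidates, tolerance):
    """Among candidates whose validation MAE is within `tolerance` of the best
    MAE, return the one using the fewest variables (ties broken by lower MAE).
    `tolerance` is an assumed parameter, not a value from the study."""
    best = min(c["mae"] for c in candidates)
    close = [c for c in candidates if c["mae"] - best <= tolerance]
    return min(close, key=lambda c: (c["n_vars"], c["mae"]))


if __name__ == "__main__":
    # Hypothetical candidate trees, for illustration only.
    candidates = [
        {"name": "A", "mae": 7.8, "n_vars": 10},
        {"name": "B", "mae": 8.0, "n_vars": 5},
        {"name": "C", "mae": 9.5, "n_vars": 3},
    ]
    print(pick_parsimonious(candidates, tolerance=0.5)["name"])
```

With these invented figures, candidate C is rejected as too inaccurate and B is preferred over A because it needs half as many input layers for a nearly identical error.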


Figure 5: Maximal regression tree without the variables (a) and details of a portion of the tree generated using S-PLUS (b).


The selected regression tree model or the optimal model was then used to estimate the imperviousness of all Landsat ETM+ pixels within the study area. This was done in S-PLUS by providing the regression tree model with the pixel values of the relevant independent variables for all pixels within the study area. The resulting imperviousness of each pixel was then exported back into ArcView for visual display. The whole process consumed a lot of computing time and resources since it involved more than 4.3 million pixel values per variable (30m pixel) for a study area of this size, i.e. 860 square miles. This is one reason why a parsimonious model was preferred. In ArcView, the pixel imperviousness was also aggregated at several levels for further analysis. The levels were 2x2 pixel windows (60m x 60m grid), 3x3 pixel windows (90m x 90m grid) and, of course, at subdivision level for the selected subdivisions.
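The aggregation of pixel imperviousness to 2x2 and 3x3 windows can be sketched as a block average. This is a minimal illustration that assumes non-overlapping windows and a simple arithmetic mean; the text does not specify how the windows were formed in ArcView, and the function name is hypothetical.

```python
def aggregate_mean(grid, window):
    """Average per-pixel imperviousness (%) over non-overlapping
    window x window blocks. Trailing rows or columns that do not fill a
    complete window are dropped (an assumption of this sketch)."""
    n_rows = len(grid) // window
    n_cols = len(grid[0]) // window
    out = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            vals = [grid[i * window + di][j * window + dj]
                    for di in range(window)
                    for dj in range(window)]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


if __name__ == "__main__":
    # Hypothetical 6 x 6 grid of 30m pixel imperviousness values.
    pixels = [[10 * c for c in range(6)] for _ in range(6)]
    print(aggregate_mean(pixels, 2))  # 60m cells
    print(aggregate_mean(pixels, 3))  # 90m cells
```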

Digitizing Impervious Surface of Subdivisions
Quantification of impervious surface at the subdivision level, for later comparison with the values predicted by the regression tree model, was done by manual on-screen (or heads-up) digitizing of the 0.3m orthophotos integrated in GIS with the vector data of subdivision boundaries. This method has been successfully carried out and described by several authors, including Lee (1987), Harvey (1985) and Kienegger (1992). The procedure began with overlaying the digital subdivision map onto the geo-referenced 0.3m orthophotos in ArcView. From there, impervious surfaces, as shown schematically in Figure 6, were digitized from the orthophotos. The process entailed tracing the impervious footprint of each identifiable feature from the orthophotos and summing the footprint areas by subdivision. The imperviousness of each subdivision was then calculated as the percentage of the total subdivision area covered with impervious surface.
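The calculation at the end of this procedure amounts to summing footprint areas and dividing by the subdivision area. A minimal sketch, assuming each digitized footprint and the subdivision boundary are simple polygons in projected map units, that footprints do not overlap, and that all of them lie inside the boundary (function names are hypothetical):

```python
def polygon_area(vertices):
    """Planar area of a simple polygon via the shoelace formula.
    vertices: list of (x, y) pairs in map units, in either winding order."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def subdivision_imperviousness(footprints, boundary):
    """Percent of the subdivision area covered by digitized footprints.
    Assumes non-overlapping footprints that lie within the boundary."""
    impervious = sum(polygon_area(p) for p in footprints)
    return 100.0 * impervious / polygon_area(boundary)


if __name__ == "__main__":
    # Hypothetical subdivision of 100m x 100m with two digitized features.
    boundary = [(0, 0), (100, 0), (100, 100), (0, 100)]
    rooftop = [(10, 10), (30, 10), (30, 20), (10, 20)]   # 20m x 10m rectangle
    driveway = [(40, 40), (50, 40), (40, 50)]            # right triangle
    print(subdivision_imperviousness([rooftop, driveway], boundary))
```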

In digitizing impervious surface within a subdivision, all areas of pavement, sidewalks and nonresidential-lot impervious surface were digitized, whereas only samples of driveways and rooftops were digitized. Lots within a subdivision were sampled by stratified random sampling. Lots were first separated into two groups, those served by cul-de-sacs and those served by through streets, and random samples were then taken from each group in proportion to the group's share of the total lots. Once the lot samples were selected, impervious surfaces from rooftops, lot driveways and right-of-way (r.o.w.) driveways adjacent to the selected lots were digitized. Undeveloped lots were excluded from sample selection and assumed in this study to have the average amount of imperviousness of other lots with similar lot size and location. Altogether there were 13,828 residential lots in the 115 subdivisions, and a total of 3,107 lots were sampled for digitizing of impervious surfaces. The samples thus represent approximately 22 percent of the total lots. The percentage of lots sampled, however, differs from subdivision to subdivision depending on the homogeneity of lot size within the subdivision. It ranged from five percent for subdivisions with homogeneous lots to as high as thirty percent for subdivisions with variable lot sizes.
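Proportionate stratified sampling of lots, as described above, can be sketched in Python. The function and stratum names are hypothetical, and the rounding rule and the minimum of one lot per non-empty stratum are assumptions of this sketch; the text does not say how fractional allocations were handled.

```python
import random


def stratified_sample(strata, fraction, seed=None):
    """Draw roughly `fraction` of all lots, allocating the sample to each
    stratum in proportion to its share of the total lots.
    strata: dict mapping stratum name -> list of lot identifiers.
    Rounding means the total drawn can differ slightly from the target."""
    rng = random.Random(seed)
    total = sum(len(lots) for lots in strata.values())
    target = round(total * fraction)
    sample = {}
    for name, lots in strata.items():
        if not lots:
            sample[name] = []
            continue
        k = round(target * len(lots) / total)
        k = min(len(lots), max(1, k))  # at least one lot per stratum (assumption)
        sample[name] = rng.sample(lots, k)
    return sample


if __name__ == "__main__":
    # Hypothetical subdivision with 100 developed lots.
    strata = {
        "cul_de_sac": [f"C{i}" for i in range(20)],
        "through_street": [f"T{i}" for i in range(80)],
    }
    picked = stratified_sample(strata, fraction=0.2, seed=42)
    print({name: len(lots) for name, lots in picked.items()})
    # Overall sampling rate reported in the text: 3,107 of 13,828 lots.
    print(round(100 * 3107 / 13828, 1), "percent")
```

Because the sampling fraction in the study varied from five to thirty percent with lot-size homogeneity, `fraction` would be set per subdivision rather than globally.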


Figure 6: Components of impervious surface in residential subdivisions


RESULTS AND DISCUSSIONS

Selection of an optimal model
Table 3 lists accuracy estimates for some promising model options based on different combinations of independent variables. The mean absolute errors of the models range from 7.8 to 8.4%, and the correlation coefficients fall in a narrow range from 0.69 to 0.71. These results are comparable to those reported by Yang et al. (2003), who used the same modeling approach to estimate impervious surface and reported mean absolute errors of 9.2 to 11.4% and higher correlation coefficients of 0.82 to 0.89.

Table 3: Performance of selected models using different combinations of predictive variables


The small differences in accuracy estimates among the models encouraged adoption of a simpler, parsimonious model requiring the fewest independent variables. The relative importance of the independent variables was assessed based on the position of each variable within the rule-sets (the tree) of the model. Within the rule-sets, independent variables are ordered in decreasing relevance to the dependent variable, with the most important independent variable positioned at the top of the tree. Figure 5(b) shows a portion of the maximal regression tree generated in S-PLUS, illustrating the relative importance of each independent variable in the tree. Inspection of the rule-sets of the models revealed that the most important variables, in descending order, were NDVI, wetness, B1, B7 and B4. The insignificance of the variables excluded from the models was not surprising, since there were high correlations among the variables, as indicated in Table 4. The selected regression tree model was therefore the one developed using only NDVI, wetness, B1, B4 and B7 (Model 4 in Table 3).

Table 4: Correlations among independent variables


Model accuracy across spatial scales

a) Accuracy at pixel scale
Agreement between the imperviousness estimated by the regression tree model from the Landsat ETM+ images and the actual values was poor on a pixel-by-pixel basis because of geometric registration errors between the Landsat images and the orthophotos. Figure 7 shows the plot of predicted versus actual imperviousness on a pixel-by-pixel basis. In general, image-to-image registration error can rarely be reduced below half a pixel in both the horizontal and vertical directions. With an offset of that size, corresponding pixels from the two sources overlap by a quarter of a pixel or less when compared on a pixel-by-pixel basis. This small overlap is the reason why a small mismatch in registration can lead to large errors in accuracy assessment (Dai and Khorram, 1998).


Figure 7: Predicted versus actual imperviousness per pixel for the actual 30m pixel.


The impact of mis-registration on validation, however, can be reduced by working on an aggregated-window basis. Two window sizes were therefore chosen in this study: 2 pixels by 2 pixels, or a 2x2 window (60m pixel), and 3 pixels by 3 pixels, or a 3x3 window (90m pixel). Figure 8 shows the plots of predicted versus actual imperviousness after aggregation at the 2x2 and 3x3 window sizes. The impact of mis-registration decreases as window size increases, leading to better agreement between the modeled and the actual impervious surface fractions.
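A purely geometric sketch shows why larger windows are less sensitive to the same registration offset. It assumes square windows and an identical shift in both the horizontal and vertical directions, and it ignores the spatial autocorrelation of imperviousness, so it indicates the trend only; the function name is hypothetical.

```python
def overlap_fraction(window_pixels, offset_pixels):
    """Fraction of a square window's area still shared with its counterpart
    when one is shifted by offset_pixels in both x and y.
    window_pixels is the window side length in 30m pixels."""
    shared_side = max(0.0, window_pixels - offset_pixels)
    return (shared_side / window_pixels) ** 2


if __name__ == "__main__":
    for size in (1, 2, 3):
        share = overlap_fraction(size, offset_pixels=0.5)
        print(f"{size}x{size} window: {share:.2f} of the area overlaps")
```

For a half-pixel offset the shared area rises from 0.25 of a single pixel to about 0.56 of a 2x2 window and about 0.69 of a 3x3 window.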

b) Accuracy at local (subdivision) scale
The accuracy of model prediction at the pixel level is important from a scientific perspective and, as shown earlier, even a slight mis-registration between images could result in large errors. From the management perspective, however, the assessment of imperviousness is more meaningful if done on a landscape management unit such as a watershed or a subdivision. Therefore, the pixel-based imperviousness predicted by the selected regression tree was summarized at the subdivision level for the 115 selected subdivisions and compared to the digitized values obtained from visual interpretation of the 0.3m orthophotos. Summarization of the pixel-based predicted imperviousness was carried out in ArcView only after the water and farm masks had been applied. This eliminated the possibility of misinterpreting water bodies and fallow fields as impervious surface, but the potential for misinterpreting bare soils on non-farm land was still present. Figure 9 shows the plot of model-predicted imperviousness versus digitized imperviousness at the subdivision level. The results were encouraging, with the mean absolute error decreasing to only 4.8% and the correlation coefficient increasing to 0.9. There was, however, still a tendency for the model to overpredict imperviousness at low values. This can be attributed to confusion in the Landsat images between bare soils and impervious surface.
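The subdivision-level summary can be sketched as masking followed by a zonal mean. This is an illustration under stated assumptions, not the ArcView procedure: masked cells are set to zero imperviousness (the text states this for the farm mask; treating water the same way is an assumption), subdivisions are assumed to be available as a grid of zone identifiers aligned with the 30m pixels, and the function names are hypothetical.

```python
def apply_masks(imperv, masks):
    """Return a copy of the imperviousness grid with every cell flagged by
    any mask set to 0.0. Each mask is a grid of 0/1 values aligned with imperv."""
    return [[0.0 if any(m[r][c] for m in masks) else v
             for c, v in enumerate(row)]
            for r, row in enumerate(imperv)]


def zonal_mean(values, zones):
    """Mean of values for each zone id; cells with zone id None are skipped."""
    sums, counts = {}, {}
    for value_row, zone_row in zip(values, zones):
        for v, z in zip(value_row, zone_row):
            if z is None:
                continue
            sums[z] = sums.get(z, 0.0) + v
            counts[z] = counts.get(z, 0) + 1
    return {z: sums[z] / counts[z] for z in sums}


if __name__ == "__main__":
    # Hypothetical 2 x 3 grid of predicted imperviousness (%), two subdivisions.
    predicted = [[80, 60, 10],
                 [40, 20, 30]]
    water = [[0, 0, 1],
             [0, 0, 0]]
    farm = [[0, 0, 0],
            [0, 0, 1]]
    zones = [["A", "A", "B"],
             ["A", "A", "B"]]
    masked = apply_masks(predicted, [water, farm])
    print(zonal_mean(masked, zones))
```

The resulting per-subdivision means are the quantities that would be compared against the digitized imperviousness to obtain the mean absolute error and correlation coefficient.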


Figure 8: Predicted versus actual imperviousness per pixel for (a) 60m pixels (2x2 window) and (b) 90m pixels (3x3 window)



Figure 9: Model-predicted imperviousness versus digitized imperviousness at subdivision level


c) Accuracy at regional scale
Another way to assess the accuracy of the selected model is through visual inspection of predicted imperviousness over the entire study area. Application of the selected regression tree model over the entire study area produced a reasonable spatial pattern of impervious surface, with some weaknesses that could be overcome to a certain degree. The most obvious weakness was the misinterpretation of water bodies as impervious surface, but this was overcome in this study by applying a water mask to the study area. A water mask can be easily extracted from classification of the remote sensing images. The second and more difficult weakness was the spectral confusion between bare soils (especially from fallow fields) and man-made impervious surface, which might have caused the overprediction of low imperviousness. This is, however, more a weakness of the remote sensing images than of the model itself. In this study, this weakness was partly overcome by including a farm mask extracted from the parcel map and assigning zero as the imperviousness value of the masked area. For urban areas, however, no data are available for such a mask, and it may or may not be reasonable to assume that the extent of bare soils in urban areas is minimal.

Figure 10 shows the results of applying the final model over the entire study area with the water and farm masks discussed above incorporated. Visual inspection of the outputs indicates reasonable representation of the pattern of impervious surface within the study area. Major urban centers, the airport, commercial centers and even major transportation routes are well represented with very high imperviousness. High density residential areas are also well differentiated from areas of low residential density surrounding them. These results are good enough for analysis at this level, i.e. a regional level.


Figure 10: Imperviousness level of the whole county.


CONCLUSIONS
The study concerned the application of remote sensing technology in urban planning. The objective was to investigate the accuracy of using medium-resolution Landsat ETM+ images in estimating impervious surface aggregated at three spatial scales. Images from Landsat ETM+ were used together with GIS-ready planimetric data updated with high-resolution orthophotos to develop a regression tree model that predicts the imperviousness percentage of each Landsat pixel. A zonal summary of the imperviousness percentage of the relevant pixels gives the percentage of impervious surface within any spatial zone, such as a subdivision, a city or even a county. The model was found to have several limitations, some of which could be overcome as discussed earlier. However, certain weaknesses seemed to be inherent in the model or in the procedures involved in developing it. One such weakness was the difficulty in co-registering the images used in the model, which affected the accuracy of pixel-to-pixel model validation. Nevertheless, this difficulty was overcome by validating the results on an aggregated-window basis, and the resulting prediction error of about 8% was comparable to those reported in past studies.

More useful from the management perspective, however, was the aggregation of the predicted imperviousness at the subdivision level, which resulted in higher accuracy when compared to the digitized values. The mean absolute error was about 5%, but there was still a tendency for the model to overpredict imperviousness at low values due to confusion with bare soils. Although a mean absolute error of 5% is encouraging, the tendency to overestimate low imperviousness can generate biased results. Through visual inspection, the accuracy of the model was acceptable at the regional scale, where the model managed to separate areas of high imperviousness from those with low or no impervious surface. Overall, the model has potential for a quick and synoptic estimate of imperviousness over large areas, provided that the areas have little or no bare soil or that a procedure is available to eliminate bare soil interference in the model's prediction. The convenience of using remote sensing images for impervious surface estimation should therefore be taken advantage of. Caution, however, should be exercised when matching the objectives of a study to the resolution of the remote sensing images used, and the issue of spectral confusion between impervious surface and bare soils or other similar natural features still needs to be resolved or, at a minimum, duly noted.

REFERENCES

  • Arnold, C.L. and C. J. Gibbons. 1996. Impervious surface: The emergence of a key urban environmental indicator. Journal of the American Planning Association 62(2): 243-258.
  • Breiman, L., J. Friedman, R. Olshen and C. Stone. 1984. Classification and Regression Trees. Chapman and Hall, New York. 358pp.
  • Campbell, J. 1981. Spatial correlation effects upon accuracy of supervised classification of land cover. Photogrammetric Engineering & Remote Sensing 47(3):355-63.
  • Civco, D.L. and J.D. Hurd. 1997. Impervious surface mapping for the State of Connecticut. Proceedings of the 1997 ASPRS Annual Conference, Seattle, WA. pp124-135.
  • Dai, X. and S. Khorram. 1998. The effects of image misregistration on accuracy of remotely sensed change detection. IEEE Trans. Geoscience and Remote Sensing 36:1566–1577.
  • Deguchi, C., and S. Sugio. 1994. Estimations for impervious areas by the use of remote sensing imagery. Water Science and Technology 29(1–2):135–144.
  • Flanagan, M., and D.L. Civco. 2001. Subpixel impervious surface mapping. Proceedings of the 2001 ASPRS Annual Convention, 23–27 Apr. 2001, St. Louis, MO.
  • Forster, B.C. 1985. An examination of some problems and solutions in monitoring urban areas from satellite platforms. International Journal of Remote Sensing 6(1):139-151.
  • Friedl, M.A. and C.E. Brodley. 1997. Decision tree classification of land cover from remotely sensed data. Remote Sensing of Environment 61:399-409.
  • Friedl, M.A., C. Woodcock, S. Gopal, D. Muchoney, A.H. Strahler and C. Barker-Schaaf. 2000. A note on procedures used for accuracy assessment in land cover maps derived from AVHRR data. International Journal of Remote Sensing 21:1073–1077.
  • Garrard, C. 2002. StatMod. Available online at http://bioweb.usu.edu/gistools/statmod/. Accessed on 11/20/2004.
  • Hansen, M., R. Dubayah and R. DeFries. 1996. Classification trees: an alternative to traditional land cover classifiers. International Journal of Remote Sensing 17(5):1075-81.
  • Harvey, R.B. 1985. The use of orthophotography and GIS technology to conduct a storm drainage utility impervious surface analysis: A case study. Proceedings ASPRS/ACSM Annual Meeting, 10 - 15 Mar 1985. Washington DC. pp271-78.
  • Huang, C. and J.R.G. Townshend. 2003. A stepwise regression tree for nonlinear approximation: Applications to estimating subpixel land cover. International Journal of Remote Sensing 24(1):75–90.
  • Huang, C., B. Wylie, L. Yang, C. Homer and G. Zylstra. 2002. Derivation of a tasseled-cap transformation based on Landsat 7 at-satellite reflectance. International Journal of Remote Sensing 23(8):1741–1748.
  • Ji, M. and J.R. Jensen. 1999. Effectiveness of subpixel analysis in detecting and quantifying urban imperviousness from Landsat thematic mapper imagery. Geocarto International 14(4):33-41.
  • Kienegger, E.H. 1992. Assessment of Wastewater Service Charge by integrating Aerial photography and GIS. Photogrammetric Engineering and Remote Sensing 58(11):1601-1606.
  • Landsat Project Science Office. 2002. Landsat 7 science data user’s handbook. Goddard Space Flight Center. Available online at http://ltpwww.gsfc.nasa.gov/IAS/handbook/handbook_toc.html. Accessed on 10/28/2004.
  • Lee, K.H. 1987. Determining impervious area for stormwater assessment. Proceedings ASPRS/ACSM Annual Convention, 29 Mar - 3 Apr 1987. Baltimore, MD. pp17-23.
  • Monday, H.M., J.S. Urban, D. Mulawa and C.A. Benkelman. 1994. City of Irvine utilizes high resolution multispectral imagery for NPDES compliance. Photogrammetric Engineering & Remote Sensing 60(4): 411-16.
  • Quinlan, J.R. 1993. C4.5: Programs for machine learning. Morgan Kaufmann Publishers. San Mateo, CA. 302 pp.
  • Ragan, R.M. and T.J. Jackson. 1975. Use of satellite data in urban hydrologic models. Journal of the Hydraulics Division ASCE 101: 1469-75.
  • Rashed, T., J.R. Weeks, D. Roberts, J. Rogan, and R. Powell. 2003. Measuring the physical composition of urban morphology using multiple endmember spectral mixture models. Photogrammetric Engineering & Remote Sensing 69(9):1011-20.
  • Schueler, T.R. 1995. The peculiarities of perviousness. Watershed Protection Techniques 2(1):233-39.
  • Slonecker, E.T., D.B. Jennings and D. Garofalo. 2001. Remote sensing of impervious surfaces: A review. Remote Sensing Reviews 20:227-255.
  • Smith, A.J. 2000. Subpixel estimates of impervious surface cover using Landsat TM imagery. M.A. Scholarly paper. Unpublished. Department of Geography, University of Maryland. College Park, MD.
  • Ward, D., S.R. Phinn and A.T. Murray. 2000. Monitoring growth in rapidly urbanized areas using remotely sensed data. Professional Geographer 52(3):371–86.
  • Williams, D.J., and S.B. Norton. 2000. Determining impervious surfaces in satellite imagery using digital orthophotography. Proceedings of the 2000 ASPRS Annual Conference. 22–26 May 2000. Washington, D.C.
  • Wu, C. and A.T. Murray. 2002. Estimating impervious surface distribution by spectral mixture analysis. Remote Sensing of Environment 84:493-505.
  • Yang, L., C. Huang, C.G. Homer, B.K. Wylie and M.J. Coan. 2003. An approach for mapping large-area impervious surfaces: synergistic use of Landsat-7 ETM+ and high spatial resolution imagery. Canadian Journal of Remote Sensing 29(2):230–240.