Automated interpretation of the spatial distribution of socio-economic conditions
Focus
GIS investigators are offered many frontiers, each said to represent an important problem. Many of these problems have solutions that promise immediate and substantial financial reward. Solutions to the others, even those with a large theoretical component, will also attract substantial financial reward, though perhaps not immediately. Financial reward is a strong motivator; another is the satisfaction of solving a complex problem. It would be difficult to argue that the problems of developing and applying GIS are not complex.
GIS means many different things to many different investigators. To me it means all aspects of the computational manipulation of population and housing census data, and this is the focus of this paper. Given this focus, I will discuss what I consider to be a major theoretical research frontier of GIS and describe what I am doing to solve the problem it represents. I hope that this paper will stimulate others to contribute to the solution of this problem.
The frontier is the automated interpretation of the spatial distribution of socio-economic conditions through the manipulation of population and housing census data. I am solving the problem that this frontier represents by examining surfaces that characterise the spatial distribution of a socio-economic condition described by the census data.
Problem Solving Environment
General purpose vs specific purpose
There are two ways of approaching an investigation of spatial distributions of socio-economic conditions through the manipulation of population and housing census data.
First, an investigator may wish to find what is commonly referred to as the “general structure” in the data. This implies looking for broad patterns and, more likely than not, broad patterns of interrelationships among a relatively large number of variables. General purpose studies were popular in the 1960s and 1970s, the major tools being principal components analysis and cluster analysis. The search for general structure has attracted renewed interest with recent experiments in data mining and, in particular, the use of a variety of artificial neural network techniques. All this work can be justified on academic grounds, but it is harder to justify in the pragmatic problem solving environment of many governmental agencies and most private sector entities. In these environments specific-purpose investigations are most common.
The second approach, therefore, typically involves the examination of individual census data variables, each of which is considered relevant to the solution of a tightly defined problem. An “aged care provision” problem may be solved by examining the spatial distribution of persons 65 years old and over; “teaching English as a second language” by examining persons born in non-English speaking countries; “marketing first home loan products” by examining married couples with one dependent child; and “selling income protection insurance” by examining self-employed persons. In other words, given a tightly defined problem, the investigator chooses one or several census variables to study in detail. The investigator also chooses the geographical region, such as a metropolitan area, and the level of generalisation, such as the street block. The geographical choices, like the choices of census variables, reflect the tightly defined problem. It is in this problem solving environment that the research discussed in this paper is considered most relevant.
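A specific-purpose investigation of this kind can be sketched in a few lines of Python. The sketch below is illustrative only: the record layout, the variable names (`persons_65_plus`, `persons_total`), the block identifiers and all counts are assumptions made for the example, not census data or a real census file format. It takes one census variable chosen for one tightly defined problem (aged care provision), expresses it as a share of each street block's population, and ranks the blocks.

```python
from dataclasses import dataclass


@dataclass
class BlockCount:
    """One street block with two hypothetical census counts."""
    block_id: str
    persons_total: int
    persons_65_plus: int


def share_by_block(blocks):
    """Return {block_id: share of persons aged 65+}, skipping empty blocks."""
    shares = {}
    for b in blocks:
        if b.persons_total > 0:  # avoid division by zero for unpopulated blocks
            shares[b.block_id] = b.persons_65_plus / b.persons_total
    return shares


def top_blocks(shares, n):
    """Return the n block ids with the highest share (ties broken by id)."""
    return sorted(shares, key=lambda k: (-shares[k], k))[:n]


# Made-up counts for four street blocks in one region.
blocks = [
    BlockCount("A", 200, 50),
    BlockCount("B", 120, 12),
    BlockCount("C", 0, 0),
    BlockCount("D", 80, 40),
]

shares = share_by_block(blocks)
ranked = top_blocks(shares, 2)
print(ranked)
```

Swapping the variable (for example, to a count of self-employed persons) or the level of generalisation changes only the input records, which mirrors the point that both choices follow from the problem definition.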
Analysis and mapping
One might think that, given a tightly defined problem, selecting the census variables and making the choices related to geography would constitute the most difficult aspects of problem solving. After all, there is a dazzling array of software products that allow the investigator to carry out all manner of data and statistical analyses, and many of these products have mapping capabilities. There are also many other products that offer mapping as their primary feature, together with a wide variety of ancillary data and statistical analysis capabilities. But having software is not the same as knowing how to carry out high quality analysis and mapping. High quality analysis implies years of professional training at the postgraduate level and years of on-the-job experience. I argue that obtaining training in the use of mapping procedures is even more difficult. There are relatively few professional training programs, and many of the existing programs are strongly oriented to particular proprietary mapping software products. The number of professionals with extensive on-the-job experience in mapping census data is very small. In summary, of the hundreds of thousands of users of population and housing census data, few have professional training in the use of the basic tools of interpretation and even fewer have much on-the-job experience. So what do most of the users, the professional investigators, do?
A cursory examination of a small number of reports purporting to interpret the spatial distribution of a socio-economic condition reveals a common treatment. A typical report consists of a brief description linking the spatial distribution to the purpose of the study. This prose is accompanied by several tables of census data and a few colour choropleth maps. One is unlikely to find an explanation of the processes that account for the spatial distribution. One is also unlikely to find the results of anything but the most rudimentary data and statistical analyses, and these are often of questionable relevance to the purpose of the study. Maps appear to be included for their cosmetic value and barely support the arguments about the spatial distribution of the socio-economic condition being investigated.
This is what I call “the substantiation approach”. The user intuits the solution, then finds tables of census data and makes choropleth maps from these data in order to support, i.e. substantiate, the intuition. The reports are used for making decisions, so if the intuition is good then there is the possibility of a good decision. If the intuition is bad, well …?
The bridge between data and decision is intuitive, or qualitative, and therefore weak. The research reported in this paper constitutes the beginning of an attempt to change this. It is an attempt to build a quantitative, objective method for the automated interpretation of the spatial distribution of socio-economic conditions through the manipulation of population and housing census data. I assert that this is a major theoretical research frontier of GIS. One motivation for solving the complex problem it represents is satisfaction; the other is the substantial financial reward that the solution is likely to attract.