
Sociodemographic Spatial Change in UK: Data and Computational Issues and Solutions


In terms of continuity between censuses “there is an inherent tension in the decisions over introducing new topics and dropping old ones, reflecting changing needs for information whilst retaining comparability with previous censuses” (Marsh 1993: 7). Thus, the first check to make is whether the topic and questions of interest have been asked in successive censuses. Fortunately, much painstaking work has already been carried out and the researcher should spend time absorbing the wealth of information contained in Norris and Mounsey (1983), Dale (1993) and Champion (1995).

Where the topic exists across time but the detail varies, a researcher might be faced with the following choices: to aggregate differently detailed information to broader groupings that are in common; or to estimate a disaggregation of grouped information to the detail required. The disadvantage of the former is that detail which may have been of interest is lost; the disadvantage of the latter is that the estimation may be unreliable. Where a topic was not included, or the information was disseminated differently, an estimation using a surrogate variable may be possible. Some examples are outlined below.

It would be reasonable to assume that population counts are consistent from one time point to the next. This will not necessarily be the case since the census population definitional ‘base’ may vary. Dale (1993) identifies two basic methods of enumeration: first, to count everyone present in the household on census night, irrespective of where they usually live; and second, to count everyone who is usually resident in the household, irrespective of whether they are present or absent on census night. Before using population counts as denominators in rates or to calculate change in population size, check that the bases are consistent. Check also whether tables cover populations in households or in communal establishments, and whether students are enumerated at their term-time or parental addresses.
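The base check described above can be made explicit in code so that a change calculation fails loudly when the bases differ. The following is a minimal sketch only: the class name, the field names, the base labels and the counts are all hypothetical and do not correspond to any particular census table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PopulationCount:
    """A population count tagged with its enumeration base (labels are hypothetical)."""
    year: int
    value: int
    base: str  # e.g. "present" or "usually_resident"


def percent_change(earlier: PopulationCount, later: PopulationCount) -> float:
    """Percentage change between two counts, refusing to mix population bases."""
    if earlier.base != later.base:
        raise ValueError(
            f"Inconsistent population bases: {earlier.base!r} ({earlier.year}) "
            f"vs {later.base!r} ({later.year})"
        )
    return 100.0 * (later.value - earlier.value) / earlier.value


# Illustrative values only, not real census counts.
first = PopulationCount(year=1981, value=1000, base="usually_resident")
second = PopulationCount(year=1991, value=1100, base="usually_resident")
change = percent_change(first, second)
```

The same tagging idea could be extended to record whether a table covers households or communal establishments, and where students were enumerated.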

In terms of population characteristics, check that categories are consistent. For instance, a question about each person’s ethnicity was included in the 1991 Census with the topic repeated in 2001. In 1991 the ethnic group information was released as 10 main output categories, but in 2001 the ethnic group categories were changed, resulting in a total of 16 categories. To align the 1991 and 2001 ethnic groups, a solution is to amalgamate categories to those thought to be compatible (Simpson 2002b). The 8 categories which result (Table 1) can be used to investigate changes in ethnic group proportions during the inter-censal period.
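Amalgamating categories in this way amounts to applying a lookup from each census's detailed categories to the common set and summing the counts. The sketch below illustrates the mechanics under stated assumptions: the category labels ("A", "A1", "X" and so on) are placeholders, not the actual census categories or the mapping in Table 1, and the counts are invented.

```python
from collections import defaultdict

# Placeholder lookups: detailed category -> common category.
# These labels are hypothetical and do NOT reproduce the Table 1 mapping.
LOOKUP_EARLIER = {"A": "X", "B": "X", "C": "Y"}
LOOKUP_LATER = {"A1": "X", "A2": "X", "B": "X", "C1": "Y", "C2": "Y"}


def harmonise(counts, lookup):
    """Sum detailed category counts into the common categories."""
    missing = set(counts) - set(lookup)
    if missing:
        raise KeyError(f"No common category for: {sorted(missing)}")
    totals = defaultdict(int)
    for category, n in counts.items():
        totals[lookup[category]] += n
    return dict(totals)


def proportions(totals):
    """Convert harmonised counts to proportions of the total."""
    total = sum(totals.values())
    return {category: n / total for category, n in totals.items()}


# Invented counts for one area at two time points.
earlier = harmonise({"A": 60, "B": 20, "C": 20}, LOOKUP_EARLIER)
later = harmonise({"A1": 40, "A2": 25, "B": 10, "C1": 15, "C2": 10}, LOOKUP_LATER)
earlier_props = proportions(earlier)
later_props = proportions(later)
```

Raising an error for an unmapped category, rather than silently dropping it, helps to ensure that the harmonised totals still sum to the original population.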

Table 1: Eight ethnic group categories compatible in both 1991 and 2001


For a longitudinal study 1971-1991, Norman et al. (2005) needed to calculate Carstairs deprivation scores for 1971 but an input variable to this index, Low Social Class, was unavailable for that year. ONS (2004) provide tables linking various socioeconomic classifications. These enabled 1971 Census data on various socioeconomic groups to be approximated as Social Class IV and V using an interim variable, NS-SEC Operational Categories (see Table 2).

Table 2: Creating a ‘Low Social Class’ variable for 1971


The demographic detail in both census and VS data can vary between time points. Census data, for example, tend to be banded into age-groups for confidentiality reasons; banding also reduces the file sizes that accrue with single year of age information, which can prove overwhelming to inexperienced researchers. VS data are also released with age information grouped. Several difficulties with census and VS age information can arise. Age bandings can vary between time points and the detail at which age is released may not be the detail required for a study. Since more demographic detail is released in census tables at national and regional levels than at district and sub-district geographic scales, age bandings can also vary between tables from the same census.

A common age banding is into ‘quinary’ 5 year groups. Whilst much sociodemographic data are released for these groupings, some variations occur. The oldest age-group has for many years been for those persons aged 85 and over (often labelled 85+). Reflecting increasing life expectancy and ageing populations, more recently quinary data have been released up to ages 85-89 and 90+. The youngest ages are often split into ages 0 and 1-4 rather than just 0-4, and occasionally there are splits in the late teenage years to allow rates to be appropriate to school age and young adult applications.
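Where one source splits bands that another source groups, the more detailed source can be collapsed to the bands in common. The sketch below uses the band variations just described (0 and 1-4 against 0-4; 85-89 and 90+ against 85+); the function name and the counts are illustrative assumptions.

```python
def aggregate_bands(counts, mapping):
    """Sum counts for detailed age bands into the common bands named in mapping.

    Bands that do not appear in mapping are passed through unchanged.
    """
    out = {}
    for band, n in counts.items():
        target = mapping.get(band, band)
        out[target] = out.get(target, 0) + n
    return out


# Detailed band -> common band, following the variations described in the text.
TO_COMMON = {"0": "0-4", "1-4": "0-4", "85-89": "85+", "90+": "85+"}

# Invented counts for a more recent, more detailed release.
recent = {"0": 10, "1-4": 45, "5-9": 60, "85-89": 8, "90+": 3}
recent_common = aggregate_bands(recent, TO_COMMON)
```

After this step the recent counts share the bands 0-4, 5-9 and 85+ with an older release and can be compared directly.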

To harmonise age bandings, groups can be aggregated to the detail in common between sources. The VS2 has information about the age of mother when she gives birth to a child. Table 3 shows that prior to 2000, the age breakdown in late teenage years is different to that released for 2000 onwards. A solution in this instance is to aggregate to a 16-19 age-group. Alternatively, to match datasets or to have a different grouping from the available information, a disaggregation may be required. Hierarchically applying more detailed information available for a large geographical area to a sub-geography is a strategy worth adopting. For example, if mortality by single year of age were needed for a study at ward level, the grouped information in the VS4 can be disaggregated using national schedules for within group proportions.
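The disaggregation step can be sketched as a proportional allocation: a grouped local count is split across single years of age in the same proportions as a national schedule for that group. This is a minimal sketch under simplifying assumptions, namely that the national within-group distribution applies to the local area; the function name and all figures are invented for illustration.

```python
def disaggregate(group_count, national_by_age):
    """Split a grouped local count across single years of age.

    Each single year receives the share of the group that it holds in the
    national schedule (an assumption that local and national age patterns match).
    """
    national_total = sum(national_by_age.values())
    if national_total <= 0:
        raise ValueError("National schedule must have a positive total")
    return {
        age: group_count * n / national_total
        for age, n in national_by_age.items()
    }


# Invented national deaths by single year of age within the 65-69 group.
national_65_69 = {65: 100, 66: 150, 67: 200, 68: 250, 69: 300}

# Invented ward-level deaths for the grouped ages 65-69.
ward_estimate = disaggregate(20, national_65_69)
```

The estimates are generally non-integer and always sum to the original grouped count. A more refined approach might apply national age-specific rates to local single year of age populations, where these are available.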

Table 3: Harmonising age detail: aggregation of age of mother at the birth of a child


The examples given here have flagged difficulties which may be encountered and suggested approaches for harmonising a time-series of attribute data. In the next section of the paper, harmonisation of the geography for which data are released will be considered.
