GITA 1999: Data Distribution and Access

Data visualization: Adding spatial components to data

Mark Harley
Principal Engineer
Geographic Data Technology, Inc.
11 Lafayette St., Lebanon, NH 03766-1445


Data Visualization

What is Spatial Data Visualization?

In the context of this paper, Spatial Data Visualization means drawing a picture, often in the form of a map, to represent information. The picture may be printed on paper or displayed on a screen. The goal is to present data in a way that conveys information easily. Often, a desired message is difficult to see when presented in the form of a report or database table. When a spatial (mapping) component is added, the old cliché that “a picture is worth a thousand words” comes to life. Adding a mapping component is known as Spatial Data Integration: “Combining spatial and, often, tabular data from different sources into a usable format.” The word “usable” is key, because there are numerous ways to combine and process data, but only some of them produce beneficial results.

Why Spatially Enable?
Think of all of the legacy database systems in use today, and the typical deluge of reports generated for corporate management and other “decision support” functions. Is there a better way to view and analyze some of the information? Often, the answer is “yes.” When the data contains positional information, a map may be used to show the relative locations of items and assist with making sensible decisions.

At a major database conference, the following statement was made: “Approximately 85% of the data used in commerce, industry and government has a spatial component. Usually, it is unused.” This represents a tremendous potential for the addition of spatial analysis.

Some examples of “location” information that might be found in existing tabular data include:
  • A full address
  • A ZIP code
  • City and/or state
  • A particular dealer or store where a sale was made or service performed
  • Telephone area-code and exchange
  • Job address
  • The club chapter where a membership is held
  • Social security number (the first three digits are derived from the ZIP code where the number was applied for)
Some types of problems that might be well suited to using data visualization techniques and spatial analysis include:

Where Do My Customers Live?
  • Targeting a direct mail campaign
  • Opening a new store
      ◦ Near a solid customer base
      ◦ Avoiding cannibalization of sales from existing stores
      ◦ Selecting a location relative to competitors
  • Determining how far most customers actually travel to visit a business
  • Finding the location of active or potential policyholders
      ◦ Relative to a natural disaster that has taken place
      ◦ Relative to a potential emergency such as a forest fire or flood plain
Routing or Drive Time
  • Delivery of items such as pizza, furniture, flowers or express packages
  • Commuting distance when writing an auto insurance policy
  • Distance to the nearest fire department when writing home or business insurance
  • Driving directions
      ◦ To bring a client to your business
      ◦ From the airport to a hotel in an unfamiliar city to assist a car rental customer
      ◦ Trip routing for a vacation
Territory / Polygonal Calculations
  • Identify which rating territory a new insurance client is located in
  • Describe dealer service areas
  • Direct a consumer to the nearest repair depot
  • Identify the field service office responsible for responding to a call
  • 911 emergency calls
      ◦ Identify which police, fire or ambulance service covers a specific incident location
Buffer Zones
  • Flood risk classifications
      ◦ Buffer around a river or below a dam
  • Evacuation area buffer around an active incident
      ◦ Chemical spill
      ◦ Forest fire
      ◦ Rising water
      ◦ Hostage situation
  • Installation of a new gas main
      ◦ To identify and notify abutters
Straight-line Distance or Radius Search
  • Distance from an earthquake epicenter
  • Proximity to an airport
      ◦ For enforcing noise regulations
  • Distance from a nuclear power plant
      ◦ To define the audible warning area and planned evacuation zone
  • Distance to the nearest hydrant
      ◦ To establish insurance rates
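Several of the problems above reduce to measuring straight-line distance between geographic points. The following is a minimal sketch, not a production tool, of a radius search using the haversine (great-circle) formula. It assumes points are given as latitude and longitude in decimal degrees and treats the Earth as a sphere with a mean radius of 6371 km; the function names and the hydrant coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; treats the Earth as a sphere (assumption)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against rounding pushing the argument of asin slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def within_radius(center, points, radius_km):
    """Return (name, distance_km) for every point within radius_km of center, nearest first."""
    lat0, lon0 = center
    hits = []
    for name, lat, lon in points:
        d = haversine_km(lat0, lon0, lat, lon)
        if d <= radius_km:
            hits.append((name, d))
    return sorted(hits, key=lambda hit: hit[1])

# Hypothetical example: hydrants within 0.5 km of an insured property.
hydrants = [("H-1", 43.6426, -72.2518), ("H-2", 43.6500, -72.3000)]
print(within_radius((43.6420, -72.2510), hydrants, radius_km=0.5))
```

For short distances the spherical approximation is usually adequate; survey-grade work would use an ellipsoidal distance calculation instead.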
Spatial Enabling Methods
There are many possible ways to spatially enable data so it can be visualized geographically. A few examples are provided here.

Combining Existing Geography with Tabular Attributes
Existing geography can be combined with tabular attributes. If a table contains statistics gathered by physical area, its attributes can be joined to a geographic representation of that area. For example, a tabular database might contain demographics by ZIP code, such as average household income, average age and number of dependents. Suppose a corporation would like to locate a new store near a relatively large number of higher-income households with few dependents. A map cannot be created directly from the demographic table to visualize potential store locations, because the table itself contains no geometry.

To solve this, start with existing map data representing ZIP code boundaries. By joining the demographic table to the ZIP boundary polygons, using the ZIP code as the join field, the demographic data can be added to the geography. Then, using mapping software, thematic map layers can be created that show both income levels and numbers of dependents with different colors or shading. The map should make it more obvious either where to locate the store or why the first set of demographics does not point to an easy choice. In the latter case, additional maps built from different statistics may help. As with traditional decision support systems, sometimes a conclusion is reached from a particular map presentation. Other times, new questions arise, and the process repeats with requests for additional maps and reports. In this example, another logical inquiry would be to show where commercially zoned property is located and which of those properties are for sale.
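The join-then-shade workflow can be sketched in a few lines of Python. This minimal sketch assumes the ZIP boundary layer has already been loaded into a dictionary keyed by ZIP code (geometry is left out), and it uses equal-interval classes, only one of several common ways to choose shading breaks. The function names and income figures are hypothetical.

```python
def join_by_key(features, table, key="zip"):
    """Attach attribute rows to map features that share the same key value.

    features: dict mapping key value -> feature record (e.g. a ZIP boundary polygon)
    table: list of attribute rows (dicts), each containing `key`
    Returns (joined features keyed by key value, rows with no matching feature).
    """
    joined, unmatched = {}, []
    for row in table:
        feature = features.get(row[key])
        if feature is None:
            unmatched.append(row)
        else:
            joined[row[key]] = {**feature, **row}
    return joined, unmatched

def equal_interval_class(value, lo, hi, n_classes):
    """Assign value to one of n_classes equal-width shading classes spanning [lo, hi]."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * n_classes)
    return min(max(idx, 0), n_classes - 1)  # clamp so value == hi falls in the top class

# Hypothetical inputs: geometry omitted, income figures invented for illustration.
zip_polygons = {"03766": {"geometry": None}, "03755": {"geometry": None}}
demographics = [
    {"zip": "03766", "income": 52000},
    {"zip": "03755", "income": 71000},
    {"zip": "03784", "income": 48000},  # no boundary in this layer -> exception
]
joined, unmatched = join_by_key(zip_polygons, demographics)
incomes = [f["income"] for f in joined.values()]
for feature in joined.values():
    feature["income_class"] = equal_interval_class(
        feature["income"], min(incomes), max(incomes), n_classes=3)
```

Returning the unmatched rows, rather than silently dropping them, keeps the exceptions visible for follow-up.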

When joining tabular data to map polygons, be aware of some potential problems. Make sure the join field is defined the same way on each side, or develop a scheme to cope with the differences. For example, one data set may define a ZIP code as a numeric field with no leading zeros, while the other defines it as a six-character text field with leading zeros and a leading or trailing space. Data currency can be an issue as well. In this ZIP code example, be aware that the Postal Service frequently changes ZIP codes: new codes are often created in high-growth areas, and stable regions may have a ZIP code eliminated as a cost-cutting measure. As a result, some values may not join because they do not exist in both the geography and the demographic data. Expect problems like this when joining on fields that are believed to be equivalent, and make a plan for handling the exceptions.
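One way to cope with mismatched key definitions is to normalize both sides to a single canonical form before joining and to route anything that still fails into an exceptions list. The sketch below assumes five-digit ZIP codes as the canonical key and also accepts ZIP+4 values; the function names are hypothetical, and a real project would add its own rules for retired or newly created ZIP codes.

```python
import re

_ZIP5 = re.compile(r"^[0-9]{1,5}$")            # numeric ZIP, possibly missing leading zeros
_ZIP9 = re.compile(r"^([0-9]{5})-?[0-9]{4}$")  # ZIP+4, with or without the hyphen

def normalize_zip(value):
    """Return a 5-character ZIP key, or None if value cannot be read as a ZIP code."""
    if value is None:
        return None
    text = str(value).strip()    # drop leading/trailing spaces from fixed-width fields
    if _ZIP5.match(text):
        return text.zfill(5)     # restore leading zeros lost in numeric fields
    m = _ZIP9.match(text)
    if m:
        return m.group(1)        # keep the 5-digit portion of a ZIP+4
    return None

def split_joinable(rows, geo_keys, field="zip"):
    """Partition rows into (joinable, exceptions) against the set of ZIPs in the geography."""
    joinable, exceptions = [], []
    for row in rows:
        key = normalize_zip(row.get(field))
        if key is not None and key in geo_keys:
            joinable.append({**row, field: key})
        else:
            exceptions.append(row)  # bad format, retired ZIP, or ZIP missing from the map
    return joinable, exceptions
```

For example, `normalize_zip(3766)` and `normalize_zip(" 03766 ")` both yield the same key, so a numeric field and a padded text field can be joined.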

Using Existing Geography as Building Blocks
Existing map data can be used to generate new geographic entities. For example, sales territory polygons could be built by aggregating state or county boundary data. The best method for doing this will depend on which mapping software is used, the complexity involved in describing each territory and how large the relevant databases are.
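As a minimal sketch of the building-block idea, the code below groups county polygons into territories and totals each territory's area with the shoelace formula. It assumes counties are simple, non-overlapping polygons in a projected (planar) coordinate system and simply keeps the member polygons together; mapping software would normally also dissolve the shared internal boundaries into a single outline. County IDs and territory names are hypothetical.

```python
from collections import defaultdict

def shoelace_area(ring):
    """Area of a simple polygon [(x, y), ...] in planar units, via the shoelace formula."""
    total = 0.0
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def build_territories(county_polygons, assignment):
    """Group county polygons into territories.

    county_polygons: dict county_id -> ring of (x, y) vertices in a projected system
    assignment: dict county_id -> territory name
    Returns dict territory -> {"members": [county ids], "area": summed area}.
    Counties are assumed not to overlap, so member areas can simply be added.
    """
    territories = defaultdict(lambda: {"members": [], "area": 0.0})
    for county_id, territory in assignment.items():
        entry = territories[territory]
        entry["members"].append(county_id)
        entry["area"] += shoelace_area(county_polygons[county_id])
    return dict(territories)

# Hypothetical counties drawn as 10 x 10 squares.
counties = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(10, 0), (20, 0), (20, 10), (10, 10)],
    "C": [(0, 10), (10, 10), (10, 20), (0, 20)],
}
territories = build_territories(counties, {"A": "T1", "B": "T1", "C": "T2"})
```

Because territories are defined only by the assignment table, redrawing a territory means editing that table rather than digitizing new boundaries.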

Another example would be to construct a travel route using an existing database of street geography. Individual street and/or highway segments could be selected from the database to form a complete itinerary. Once created, the route could be analyzed to calculate drive time, distance or average speed. A given route could be compared with other possible paths to select the best plan for a particular situation. Again, the preferred method will depend on the software used. Some mapping packages include routing features to assist with finding an optimal path between two points, selected based on parameters such as the shortest distance, shortest travel time or most/least use of highways.
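Routing features of this kind are commonly built on a shortest-path search over the street network. The following is a minimal sketch using Dijkstra's algorithm, one standard choice, with Python's `heapq` module. It assumes each street segment carries a length in kilometres and a typical speed in km/h, treats every segment as two-way, and minimizes travel time; the node names and values are hypothetical.

```python
import heapq
from collections import defaultdict

def build_graph(segments):
    """segments: iterable of (from_node, to_node, length_km, speed_kmh), treated as two-way."""
    graph = defaultdict(list)
    for a, b, length_km, speed_kmh in segments:
        minutes = length_km / speed_kmh * 60.0
        graph[a].append((b, minutes, length_km))
        graph[b].append((a, minutes, length_km))
    return graph

def fastest_route(graph, start, goal):
    """Dijkstra's algorithm minimizing travel time.

    Returns (path, minutes, km), or (None, inf, inf) if goal is unreachable.
    """
    queue = [(0.0, 0.0, start, [start])]  # (minutes so far, km so far, node, path)
    best = {start: 0.0}
    while queue:
        minutes, km, node, path = heapq.heappop(queue)
        if node == goal:
            return path, minutes, km
        if minutes > best.get(node, float("inf")):
            continue  # stale entry: a faster way to this node was already found
        for nxt, seg_min, seg_km in graph[node]:
            t = minutes + seg_min
            if t < best.get(nxt, float("inf")):
                best[nxt] = t
                heapq.heappush(queue, (t, km + seg_km, nxt, path + [nxt]))
    return None, float("inf"), float("inf")

# Hypothetical network: (from, to, length_km, speed_kmh).
streets = [
    ("Store", "A", 1.0, 50.0),     # arterial
    ("A", "Home", 4.0, 80.0),      # highway
    ("Store", "Home", 3.0, 30.0),  # local street
]
path, minutes, km = fastest_route(build_graph(streets), "Store", "Home")
```

Here the longer highway route wins on time. Using segment length instead of minutes as the cost would give the shortest-distance route, and a drive-time question, such as whether a location is within an 8-minute drive, reduces to comparing the returned minutes with the threshold.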

A third example is to generate a list of households to be included in an advertising mailing. Perhaps, due to postal regulations and bulk discounts, the mailing is to be done to all households within a set of carrier routes. It would be necessary to identify which postal carrier routes are to be involved in the mailing. Starting with a base map of carrier route polygons, the polygons can be intersected spatially with the desired area for the total mailing. Then, a carrier route list can be generated for the relevant Post Office(s) involved. A household count per carrier route can be determined, and the mailing pieces can be printed and delivered to the Post Office.
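The selection step can be sketched as follows. To keep the example short, both the mailing area and each carrier route are simplified to axis-aligned bounding boxes, which is how many spatial systems pre-filter candidates; a real application would follow with an exact polygon intersection. Route IDs and household counts are hypothetical.

```python
from typing import NamedTuple

class Box(NamedTuple):
    """Axis-aligned bounding box in planar map units."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

def boxes_intersect(a: Box, b: Box) -> bool:
    """True when two boxes overlap (touching edges count as overlap)."""
    return a.xmin <= b.xmax and b.xmin <= a.xmax and a.ymin <= b.ymax and b.ymin <= a.ymax

def select_carrier_routes(routes, mailing_area: Box):
    """routes: list of (route_id, Box, household_count).

    Returns the sorted IDs of routes touching the mailing area and their total households.
    """
    selected = [(rid, hh) for rid, box, hh in routes if boxes_intersect(box, mailing_area)]
    return sorted(rid for rid, _ in selected), sum(hh for _, hh in selected)

# Hypothetical carrier routes and mailing area.
routes = [
    ("C001", Box(0, 0, 5, 5), 420),
    ("C002", Box(5, 0, 10, 5), 380),
    ("C003", Box(20, 20, 25, 25), 510),
]
route_ids, households = select_carrier_routes(routes, Box(3, 1, 7, 4))
```

The resulting route list and household count correspond to the carrier route list and piece count described above.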

Format Conversions
Proper data visualization often requires having several layers of information spatially enabled and ready to draw on the same map. Sometimes this involves converting data between different mapping packages. If a project is to be undertaken using vendor A’s mapping software and an important data layer is available only in vendor B’s format, conversion from B to A might be necessary. Depending on the software involved, there might be a simple off-the-shelf solution. Conversion could also be difficult, even to the point of requiring a custom program or a consultant to sort out the issues and arrive at a reasonable answer. In some situations, the mapping software might be able to read and draw all of the layers for a given project, even if they are not all in the same format. However, the layers might not line up relative to one another: the coordinate reference system, scale, datum or geographic projection might differ and need to be adjusted in one or more layers. A full treatment of this issue is beyond the scope of this paper, but suffice it to say that it can be a serious problem that is difficult to resolve.
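Reprojecting one layer into another layer’s coordinate system is one of the common adjustments. As a minimal sketch, the code below applies the forward and inverse spherical Mercator formulas, using the WGS 84 semi-major axis (6378137 m) as the sphere radius. It assumes the input longitude and latitude are already on the same datum as the target layer; datum shifts are not handled, and production work would normally rely on an established projection library. The function names are hypothetical.

```python
import math

SPHERE_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, used as the sphere radius

def lonlat_to_mercator(lon_deg, lat_deg):
    """Forward spherical Mercator projection: degrees -> planar metres.

    Assumes the input is already on the target datum; no datum shift is applied.
    """
    if not -90.0 < lat_deg < 90.0:
        raise ValueError("latitude must be strictly between -90 and 90 degrees")
    x = SPHERE_RADIUS_M * math.radians(lon_deg)
    y = SPHERE_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

def mercator_to_lonlat(x, y):
    """Inverse spherical Mercator projection: planar metres -> degrees."""
    lon = math.degrees(x / SPHERE_RADIUS_M)
    lat = math.degrees(2 * math.atan(math.exp(y / SPHERE_RADIUS_M)) - math.pi / 2)
    return lon, lat

# Hypothetical: a point layer stored in lon/lat, to be overlaid on a layer in Mercator metres.
customer_points = [(-72.2510, 43.6420), (-72.3000, 43.6500)]
projected = [lonlat_to_mercator(lon, lat) for lon, lat in customer_points]
```

If two layers still disagree after both are in the same projection, a datum difference is a likely cause, and that needs a separate transformation.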

The Role of Geocoding
Address information is often included with records in existing tabular databases. Many queries can be written against this data, but direct spatial questions cannot be asked of it. Geocoding is the process of finding the physical location of one or more addresses, that is, the latitude and longitude corresponding to each address. Geocoding is often applied to an entire address list, thereby “spatially enabling” it.
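One widely used technique for locating a street address, though not necessarily the one any particular product uses, is address-range interpolation: each street segment in the master database carries a range of house numbers, and an address is placed proportionally along the segment. The sketch below assumes straight segments and interpolates linearly in latitude and longitude, which is only reasonable over short distances; it ignores side-of-street offsets, and the segment schema is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class StreetSegment:
    """One straight street segment from a hypothetical master street database."""
    name: str
    from_addr: int                # house number at the start of the segment
    to_addr: int                  # house number at the end of the segment
    start: Tuple[float, float]    # (lat, lon) of the start node
    end: Tuple[float, float]      # (lat, lon) of the end node

def interpolate_address(segment: StreetSegment, house_number: int) -> Optional[Tuple[float, float]]:
    """Estimate (lat, lon) for house_number by linear interpolation along the segment.

    Returns None when the number falls outside the segment's address range.
    Works whether the range ascends or descends along the segment.
    """
    lo, hi = sorted((segment.from_addr, segment.to_addr))
    if not lo <= house_number <= hi:
        return None
    span = segment.to_addr - segment.from_addr
    frac = 0.0 if span == 0 else (house_number - segment.from_addr) / span
    lat = segment.start[0] + frac * (segment.end[0] - segment.start[0])
    lon = segment.start[1] + frac * (segment.end[1] - segment.start[1])
    return lat, lon

# Hypothetical segment: numbers 100-198 running east along a street.
seg = StreetSegment("MAIN ST", 100, 198, (43.0, -72.0), (43.0, -71.99))
print(interpolate_address(seg, 149))
```

Because the result is an estimate based on the number range, a house that sits far from its "proportional" position on the block will be placed somewhat off, one reason Geocoding is probabilistic.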

Once an address list is Geocoded, the corresponding points can be added to a map display, and spatial calculations can be performed. To see where customers actually live and how far they drive to visit a particular store, they must first be physically located. To offer flood insurance to every residence in a certain area near a river, the households that are actually in that area must be identified. To know whether a caller ordering a pizza is within an 8-minute drive of a particular fast-food restaurant, the caller’s location must be known and a route analysis from the store performed. Geocoding software is available from many Geographic Information System (GIS) software providers. It can be packaged with mapping software or provided as a stand-alone tool. In addition, some companies provide Geocoding services: an address list is sent to them and later returned with latitude/longitude points, and often census codes, appended. With the growth of the Internet, Geocoding services are being offered on-line as well. Single addresses may be Geocoded interactively; full lists are usually run in a batch operation.
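Identifying the households inside a flood area, or the rating territory that contains a new client, comes down to a point-in-polygon test on the Geocoded points. Below is a minimal sketch of the even-odd (ray casting) rule, assuming simple polygons without holes in planar coordinates; points lying exactly on a boundary may be classified either way, and the household IDs are hypothetical.

```python
def point_in_polygon(x, y, ring):
    """Even-odd (ray casting) test: is (x, y) inside the polygon ring [(x, y), ...]?

    Casts a ray toward +x and counts edge crossings; an odd count means inside.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def households_in_area(households, area_ring):
    """Return IDs of Geocoded households [(id, x, y), ...] that fall inside area_ring."""
    return [hid for hid, x, y in households if point_in_polygon(x, y, area_ring)]

# Hypothetical flood area and Geocoded households.
flood_area = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(households_in_area([("h1", 1, 1), ("h2", 11, 1)], flood_area))
```

The same test, run against each territory polygon in turn, assigns a point to a rating or service territory.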

Geocoding seems easy at first, but it is actually a complex process. Many factors can cause entries in a reasonable-looking address list to fail to Geocode. The quality of the input data is vital: incomplete address fields, misspellings, “vanity” addresses, non-postal community names and obsolete ZIP codes can all cause Geocode failures or mismatches. The quality of the master geographic database used by the Geocoder also plays a significant role. If streets are missing or ZIP codes are out of date, perfectly valid addresses may fail to match.
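Because input quality matters so much, addresses are often screened and standardized before Geocoding. The sketch below uppercases an address, strips punctuation, abbreviates a few common words and flags records that are likely to fail. The abbreviation table is a small illustrative subset (USPS Publication 28 contains the full set of standard abbreviations), and the field names are hypothetical.

```python
import re

# Small illustrative subset of standard abbreviations; not a complete table.
ABBREVIATIONS = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "DRIVE": "DR", "SUITE": "STE"}

def standardize(address_line):
    """Uppercase, replace punctuation with spaces, collapse spaces, abbreviate known words."""
    cleaned = re.sub(r"[^\w\s#-]", " ", address_line.upper())
    return " ".join(ABBREVIATIONS.get(word, word) for word in cleaned.split())

def screen_record(record):
    """Return a list of problems that are likely to cause a Geocode failure."""
    problems = []
    street = standardize(record.get("street") or "")
    if not re.match(r"#?[0-9]+", street):
        problems.append("missing house number")
    if not re.fullmatch(r"[0-9]{5}", str(record.get("zip") or "").strip()):
        problems.append("missing or malformed ZIP")
    if not (record.get("city") or "").strip():
        problems.append("missing city")
    return problems

print(screen_record({"street": "Main St", "zip": "3766", "city": "Lebanon"}))
```

Flagged records can be corrected or set aside before the batch run, instead of surfacing later as unexplained Geocode failures.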

Geocoding is very much a probabilistic science, usually based on how well incoming addresses can be parsed and how well they match the addresses in a master Geocoding database. Even parsing a properly written address can be challenging. Consider the example of “#110E East ‘E’ Street, Suite 102E.” This could be expressed in a mailing list as “110 E E E ST STE 102E.” How many parsers will correctly identify the street name in that example? Or consider an ordinary-looking address such as “112 Main St” in a certain city. The Geocoder may fail if the city has only a “112 N Main St” and a “112 S Main St,” and no plain “112 Main St” without a prefix directional.
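The “112 Main St” case can be sketched as a matcher that tries an exact match first and then relaxes the prefix directional, reporting ambiguity rather than guessing. This is a deliberately simple sketch: it assumes addresses are already standardized to the form NUMBER [DIRECTIONAL] NAME SUFFIX and that the master database is a plain list of strings, whereas real Geocoders score many address components. All names are hypothetical, and its naive parser would itself misread the “E E E” example, which illustrates the point.

```python
DIRECTIONALS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW"}

def parse(address):
    """Split 'NUMBER [PREDIR] NAME... SUFFIX' into components (deliberately naive)."""
    tokens = address.upper().split()
    if not tokens:
        return {"number": None, "predir": None, "street": ""}
    number, rest = tokens[0], tokens[1:]
    # Treat a leading directional as a prefix only if a name and suffix still follow it.
    predir = rest.pop(0) if len(rest) > 2 and rest[0] in DIRECTIONALS else None
    return {"number": number, "predir": predir, "street": " ".join(rest)}

def match_address(address, master):
    """Return (status, candidates) for one input address against master addresses."""
    target = parse(address)
    parsed = [(m, parse(m)) for m in master]
    exact = [m for m, p in parsed if p == target]
    if exact:
        return "exact", exact
    if target["predir"] is None:
        # The input may simply have omitted the prefix directional: relax that field.
        loose = [m for m, p in parsed
                 if p["number"] == target["number"] and p["street"] == target["street"]]
        if len(loose) == 1:
            return "matched-ignoring-directional", loose
        if len(loose) > 1:
            return "ambiguous", loose  # e.g. both 112 N and 112 S exist: needs human review
    return "no-match", []

master = ["112 N MAIN ST", "112 S MAIN ST"]
print(match_address("112 Main St", master))
```

Returning “ambiguous” instead of picking a candidate is what sends such records to the manual review described next.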

In practice, locating 100% of the addresses in a list is nearly impossible without human intervention and research to resolve those addresses that fail the algorithmic approach.

The Importance of Data Quality
When spatially enabling and visualizing data, the use of high-quality geographic layers can be critical. Consumers know their local area and are quick to notice errors in a map. They are likely to test a new service using places they are familiar with: where they live and work, where they grew up or where family members now live. If insurance territories are established based on existing computer-based maps, there may be liability, including fines, if an insured property is placed in an incorrect rating zone.

Accurate Geocoding is essential to show customers in their proper locations. Geocoded stores, dealerships or repair shops could be displayed as part of a base map. If some points are shown in the wrong place, or are missing entirely, the map quality may be too low for some applications, and manual Geocoding follow-up work may be required to locate 100% of the addresses. In some situations, such as showing “certified dealer locations,” a local competitor may be unfairly favored if one store appears on the map but another is missing, so that clients are directed to only one of them. Another Geocoding problem arises when data is Geocoded against a different base map than the one displayed. This can occur when a different vendor’s stand-alone Geocoding tool or Geocoding service is used. If the geography does not match, addresses can jump to the wrong side of a street, appear to be on a different street, or worse.
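The wrong-side-of-street symptom can be checked automatically. Given a street centerline segment and a Geocoded point, the sign of a two-dimensional cross product tells which side of the segment the point lies on. This minimal sketch assumes planar coordinates with y increasing northward and that the expected side of each address is already known (for example, from odd/even house-number ranges in the master database); the function names are hypothetical.

```python
def side_of_street(seg_start, seg_end, point):
    """Return 'left', 'right' or 'on' for point relative to the directed segment.

    Uses the sign of the 2-D cross product (end - start) x (point - start);
    positive means the point is to the left when travelling from start to end.
    """
    (x1, y1), (x2, y2), (px, py) = seg_start, seg_end, point
    cross = (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)
    if cross > 0:
        return "left"
    if cross < 0:
        return "right"
    return "on"

def flag_side_mismatches(points, seg_start, seg_end):
    """points: list of (id, (x, y), expected_side). Return IDs whose side disagrees."""
    return [pid for pid, xy, expected in points
            if side_of_street(seg_start, seg_end, xy) != expected]

# Hypothetical east-running street with two Geocoded addresses expected on the north side.
print(flag_side_mismatches([("a", (5, 1), "left"), ("b", (5, -1), "left")], (0, 0), (10, 0)))
```

Points flagged this way are candidates for the manual follow-up described above, or a sign that the Geocoding and display base maps differ.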

All data layers must be properly integrated; consistency is crucial. If some data layers differ in their coordinate reference system, scale, datum or geographic projection, an adjustment or conversion may be necessary. This is often the case when data from multiple sources is integrated.

Summary
There is a wealth of existing tabular data that, if spatially enabled and visualized on a map, can be presented and analyzed using methods previously untapped. The power of geographic tools allows spatial relationships to be calculated, including customer locations, routing, territory analysis, distance from a given point and locating properties in a buffer zone. Geocoding software is powerful and complex, yet a critical part of many data visualization processes. The use of high quality, consistent data layers and processing techniques is essential in the production of a usable set of geography.

Properly executed, even a simple data visualization project can reap tremendous benefits as familiar information is clearly and concisely presented in a new and exciting way.
© GISdevelopment.net. All rights reserved.