Data visualization: Adding spatial components to data
Format Conversions
Proper data visualization often requires having several layers of information spatially enabled
and ready to draw on the same physical map. Sometimes, this involves converting data between
different mapping packages. If a project were to be undertaken using vendor-A’s mapping
software and an important data layer were available only in vendor-B’s format, conversion from
B to A might be necessary. Depending on which software is involved, there might be a simple
off-the-shelf solution. The conversion could also be difficult, even to the point of writing a custom
program or hiring a consultant to sort out the issues and arrive at a reasonable answer.
In some situations, the mapping software might be able to read and draw all of the layers for
a given project, even if they are not all in the same format. However, the layers might not line up
relative to one another. The coordinate reference system, scale, datum or geographic projection
might differ and need to be adjusted in one or more layers. Full understanding of this issue is
beyond the scope of this paper, but suffice it to say that it can be a serious problem, and difficult
to resolve.
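As a small illustration of why layers can fail to line up, the same location has very different coordinate values in different coordinate reference systems. The sketch below is not taken from any particular mapping package. It converts longitude/latitude in degrees to the spherical Web Mercator projection used by many web maps, and back. The function names are hypothetical, and datum shifts are ignored entirely; a real project would normally rely on a dedicated projection library.

```python
import math

# WGS 84 semi-major axis in metres, used as the sphere radius by Web Mercator.
EARTH_RADIUS_M = 6378137.0


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude (degrees) to spherical Web Mercator (metres).

    Only valid for latitudes strictly between -90 and 90 degrees.
    """
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def web_mercator_to_lonlat(x, y):
    """Inverse of lonlat_to_web_mercator: metres back to degrees."""
    lon_deg = math.degrees(x / EARTH_RADIUS_M)
    lat_deg = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2)
    return lon_deg, lat_deg


# Hypothetical point: the same place expressed in two coordinate systems.
x, y = lonlat_to_web_mercator(-105.0, 40.0)
print(f"degrees: (-105.0, 40.0)  metres: ({x:.1f}, {y:.1f})")
```

A layer stored in degrees and a layer stored in projected metres will not overlay correctly until one of them is converted, which is the adjustment the text above refers to.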
The Role of Geocoding
Often, address information is included with records found in existing tabular databases. Many
queries can be written against this data, but direct spatial questions cannot be asked. Geocoding
is the process of finding the physical location of one or more addresses, that is, the
latitude and longitude points corresponding to those addresses. Often, Geocoding is
applied to an entire address list, thereby “spatially enabling” it.
Once an address list is Geocoded, the corresponding points can be added to a map display. Then,
spatial calculations can be performed. To see where customers actually live, and how far they
drive to visit a particular store, they must be physically located. To offer flood insurance to
every residence in a certain area near a river, the households that are actually in that area must be
identified. To know if a caller ordering a pizza is within an 8-minute drive time from a particular
fast-food restaurant, their location must be known, and a route analysis from the store performed.
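Once addresses have latitude and longitude points, the simplest spatial calculation is a straight-line distance. The following sketch uses the haversine formula on a spherical Earth to find which customers fall within a given radius of a store. The coordinates and names are hypothetical, and the result is great-circle distance only; the drive-time example above would require a road network and a routing engine, which this sketch does not attempt.

```python
import math

# Mean Earth radius in kilometres; treating the Earth as a sphere is an approximation.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(center, points, radius_km):
    """Return the names of points no farther than radius_km from center.

    center is (lat, lon); points is an iterable of (name, lat, lon).
    """
    lat0, lon0 = center
    return [
        name
        for name, lat, lon in points
        if haversine_km(lat0, lon0, lat, lon) <= radius_km
    ]


# Hypothetical geocoded store and customers.
store = (40.0, -105.0)
customers = [("A", 40.01, -105.0), ("B", 40.5, -105.0), ("C", 40.0, -105.02)]
nearby = within_radius(store, customers, 5.0)
print(nearby)  # ['A', 'C']
```

The same radius test is a crude form of the buffer-zone question, such as identifying households near a river, although a real buffer would be measured from a line or polygon rather than from a single point.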
Geocoding software is available from many Geographic Information Systems (known as “GIS”)
software providers. It can be packaged with mapping software, or provided as a stand-alone tool.
In addition, some companies provide Geocoding services where an address list is sent to them
and is later returned with latitude/longitude points, and often census codes, appended. With the
growth of the Internet, Geocoding services are being offered on-line as well. Single addresses
may be Geocoded interactively. Full lists are usually run in a batch operation.
Geocoding seems easy at first, but is actually a complex process. Numerous factors contribute to
the failure to Geocode many entries in what may seem like a reasonable-looking address list. The
quality of the input data is vital. Incomplete address fields, misspellings, “vanity” addresses,
non-postal community names and the use of obsolete ZIP codes can cause Geocode failures or
mismatches. Also, the quality of the master geographic database used by the Geocoder plays a
significant role. If streets are missing or ZIP codes are out of date, perfectly valid addresses may
fail to match.
Geocoding is very much a probabilistic science, usually based on how well incoming
addresses can be parsed and how well they match the addresses in a master Geocoding database.
Even parsing a properly written address can be challenging. Consider the example of “#110E
East “E” Street, Suite 102E.” This could be expressed in a mailing list as “110 E E E ST STE
102E.” How many parsers will correctly identify the street name in that example? Or, consider
an ordinary-looking address such as “112 Main St” in a certain city. The geocoder may fail if the
city only has a “112 N Main St” and a “112 S Main St” and no simple “112 Main St” without the
prefix directional.
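The missing-directional case can be sketched in a few lines. The code below is a deliberately naive illustration, not how any particular Geocoder works: the parser, the abbreviation sets and the master list are all hypothetical, and unit designators such as “STE 102E” are not handled. It shows why “112 Main St” cannot be matched with certainty when only the N and S variants exist.

```python
DIRECTIONALS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW"}
SUFFIXES = {"ST", "AVE", "RD", "BLVD", "DR", "LN"}  # tiny illustrative subset


def parse_street_address(text):
    """Split 'number [directional] name [suffix]' into labelled parts."""
    tokens = text.upper().replace(".", "").split()
    if not tokens:
        raise ValueError("empty address")
    number, rest = tokens[0], tokens[1:]
    suffix = rest.pop() if rest and rest[-1] in SUFFIXES else ""
    # Treat a leading directional as a prefix only if a street name remains,
    # so that "110 E ST" keeps "E" as the street name.
    prefix = rest.pop(0) if len(rest) > 1 and rest[0] in DIRECTIONALS else ""
    return {"number": number, "prefix": prefix, "name": " ".join(rest), "suffix": suffix}


def match_candidates(text, master):
    """Return (status, candidates) for an address against a master list.

    status is 'exact', 'candidates' (needs human review) or 'none'.
    """
    query = parse_street_address(text)
    same_street = []
    for entry in master:
        parsed = parse_street_address(entry)
        if all(parsed[k] == query[k] for k in ("number", "name", "suffix")):
            same_street.append((entry, parsed))
    exact = [entry for entry, parsed in same_street if parsed["prefix"] == query["prefix"]]
    if exact:
        return "exact", exact
    if same_street:
        return "candidates", [entry for entry, _ in same_street]
    return "none", []


# Hypothetical master street list for the city in the example.
master = ["112 N Main St", "112 S Main St"]
print(match_candidates("112 Main St", master))
print(match_candidates("112 N Main St", master))
```

Returning the candidates instead of guessing reflects the point made above: software can narrow the choice, but deciding between the two addresses takes additional information or human research.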
In practice, locating 100% of the addresses in a list is nearly impossible without human
intervention and research to resolve those addresses that fail the algorithmic approach.
The Importance of Data Quality
When spatially enabling and visualizing data, the use of high-quality geographic layers can be
critical. Consumers know their local area and will be demanding if they find errors in a map.
They are likely to test a new service based on places they are familiar with, including where they
live, where they work, where they grew up or where family members presently live. If insurance territories
are established based on existing computer-based maps, there may be liability, including fines, if
an insured property is placed in an incorrect rating zone.
Accurate Geocoding is essential to show customers in their proper locations. Geocoded stores,
dealerships or repair shops could be displayed as part of a base map. If some points are shown in
the wrong place, or are missing entirely, the map quality may be too low for some applications.
It could be a requirement to perform manual Geocoding follow-up work to locate 100% of the
addresses. In some situations, such as showing “certified dealer locations,” a local competitor
may be unfairly favored if clients are directed to one store, but another is missing from a map.
Another Geocoding problem arises when data is Geocoded relative to a different base map than
the one displayed. This can occur when a different vendor’s stand-alone Geocoding tool or
Geocoding service is used. If the geography does not match, addresses can jump to the wrong
side of a street, appear to be on a different street, or worse.
All data layers must be properly integrated; consistency is crucial. If some data layers differ in
their coordinate reference system, scale, datum or geographic projection, an adjustment or
conversion may be necessary. This is often the case when data from multiple sources is
integrated.
Summary
There is a wealth of existing tabular data that, if spatially enabled and visualized on a map, can
be presented and analyzed using methods previously untapped. The power of geographic tools
allows spatial relationships to be calculated, including customer locations, routing, territory
analysis, distance from a given point and locating properties in a buffer zone. Geocoding
software is powerful and complex, yet it is a critical part of many data visualization processes. The
use of high-quality, consistent data layers and processing techniques is essential in the production
of a usable set of geography.
Properly executed, even a simple data visualization project can reap tremendous benefits as
familiar information is clearly and concisely presented in a new and exciting way.