GITA 2002: Data Development & Evolution - Providing Data to the Masses

Maps are dead. Long live maps


James C. Fass
Apex Geospatial Data Services, LLC
400 N. Loop 1604 East, Suite 300
San Antonio, TX 78232
Email: jfass@apexinc.com


Abstract
The evolution of maps and the relationship between the map maker and the map user have taken a radical turn since computer technology was introduced to this ancient and intuitive craft. Recent advances in the design of geospatial databases are the logical extension of a fast-moving data-centric trend that can be traced back to the 1970s and, arguably, to the very beginnings of civilization. Yet, although the data-centric target is readily understood and embraced by the data conversion industry, there is a substantial lag in the understanding of data-centric methods to fill today's targets. Understanding map content in terms of the relational data model it has become requires conversion project managers to think differently about how to plan, manage and control the conversion effort, which continues to be the most expensive component of any GIS.

Of Maps and Meaning

A Brief History of Maps
Map making is an intuitive, apparently universal human activity which, in some cases, predates the use of written language itself. Arguably, the earliest maps known to exist can be found on Babylonian clay tablets that may be anywhere from four to six thousand years old. On such tablets, we find a system of somewhat normalized symbols representing a pattern of buildings and/or plats of land in relative position to a river or shoreline. There is no sign of text or annotation. The meaning of the maps is carried solely by the intuitive interpretation of the repeated symbols and their (presumably not-to-scale) spatial relationships.

Skipping ahead about a millennium to 1300 BC, we can find the earliest known cadastral maps, drawn on papyrus in ancient Egypt. A famous example is known as the "Turin Papyrus," not because it was made in Italy, but because the artifact is currently housed in a museum there. The Turin Papyrus is notable for its range of cartographic symbology, some of which is surprisingly similar to topographic symbols still in use today. There are, for example, standard shapes for wells, monuments and non-landmark buildings, as well as area patterns to indicate different road surface conditions. The Turin Papyrus also gives us some very early examples of map annotation. These include descriptive text for hypsographic features and road destinations.

Compared with the Babylonian clay tablets, the Turin Papyrus has significantly richer information content due to its expanded vocabulary of symbols and the use of selective annotation.

If we take a brief stop about 1500 years later, still in Egypt, we find some significant advances in map-making with the work of Ptolemy in the second century AD. Maps from the Ptolemaic tradition are notable for their attempt to put things into geographic context using a global system of latitude and longitude. In addition to the increased sophistication of cartographic symbology and annotation, we have added meaning in the form of absolute spatial coordinates.


Figure 1 - Mapping Innovation Timeline (1)

For the next 1800 years or so, there are many technological refinements in the standardization of map symbols, the accuracy of feature coordinates and the quality of supporting annotation, but the fundamental nature of how maps convey their meaning is unaffected until computers come on the scene. Then, from the 1970s onward, we see a series of developments no less significant than those surveyed for the past four thousand years, but condensed into a period of only about thirty years.

Maps and Computers
By applying Cartesian coordinates to points and vectors along a line, it became possible to store and retrieve two- and three-dimensional graphics in computer files. Computer aided drafting ("CAD") took off in the 1970s with major projects and software vendors responding to a tremendous demand to retire existing paper-based maps and diagrams in favor of faster, cheaper Automated Mapping ("AM") programs. But an interesting thing happened on the way to AM utopia. In addition to making the chore of map maintenance faster and cheaper, the new CAD technology also made possible, for the first time, a convenient method for storing more information than the picture alone could sustain. The concept of discriminating features into dynamically selectable levels gave on-line map users a real sense of empowerment. When features were grouped into logical classes and assigned to different levels, users could customize the display of features to focus on the classes of interest for specific tasks. Furthermore, the use of database linkages allowed select features to be linked to external database records, extending the information that could be known about a feature well beyond what could otherwise be portrayed by symbol or level assignment.

In the 1980s, relational database technology found its way to automated mapping and changed the way we thought about the linkages between graphic features and database records. Instead of segregating feature classes for levels based only on thematic content, it now became useful to separate points, lines and polygons so that the related database records could automatically contain attributes peculiar to each geometry class. Lines would know at which node they intersected with other lines, and which polygons were on their right and left sides. By relating extended attributes to this fundamental topological knowledge it became possible to construct systems capable of answering questions like "what parcels are affected by a proposed right-of-way expansion?" or "what's the shortest route from customer X to service Y?" Systems with relational database extensions to automatically maintained topological pointers became known as Geographic Information Systems ("GIS") to distinguish them from CAD systems, which had less demanding requirements for the organization of feature geometries.
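
As a rough illustration of the topological knowledge described above, the sketch below stores each line with its end nodes and its left and right polygons, then answers the two example questions. It is a minimal sketch in Python, not a description of any particular GIS product: the `Arc` record, the function names and the sample parcel and node identifiers are all hypothetical, and the route search is an ordinary Dijkstra search chosen for illustration.

```python
import heapq
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    """One line feature with its topological pointers (hypothetical layout)."""
    arc_id: str
    from_node: str
    to_node: str
    left_poly: str
    right_poly: str
    length: float


def parcels_along(arcs, arc_ids):
    """Return the polygons lying on either side of the given arcs."""
    touched = set()
    for arc in arcs:
        if arc.arc_id in arc_ids:
            touched.update((arc.left_poly, arc.right_poly))
    return sorted(touched)


def shortest_route(arcs, start, goal):
    """Dijkstra search over the node network; returns (distance, node path)."""
    adjacency = defaultdict(list)
    for arc in arcs:
        # Treat every arc as traversable in both directions.
        adjacency[arc.from_node].append((arc.to_node, arc.length))
        adjacency[arc.to_node].append((arc.from_node, arc.length))

    best = {start: 0.0}
    queue = [(0.0, start, [start])]
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, length in adjacency[node]:
            candidate = dist + length
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor, path + [neighbor]))
    return None  # goal is not connected to start


# Hypothetical sample network: three arcs around parcel_A.
ARCS = [
    Arc("a1", "n1", "n2", "parcel_A", "parcel_B", 100.0),
    Arc("a2", "n2", "n3", "parcel_A", "parcel_C", 50.0),
    Arc("a3", "n1", "n3", "parcel_D", "parcel_A", 200.0),
]
```

With this sample data, `parcels_along(ARCS, {"a1"})` lists the parcels on both sides of arc a1, and `shortest_route(ARCS, "n1", "n3")` prefers the two-arc path through n2 over the longer direct arc.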

In addition to benefits derived from the topological knowledge managed in a GIS, users became further empowered to design increasingly detailed alternate "views" of the same data for different purposes. Now, instead of storing feature symbology in a graphics file and having to explain that "a double red line is a major highway", GIS users focused on storing only the real-world attributes with the real-world geometry, and telling the system to "draw every major highway as a double red line." The difference is more than a trick of syntax. It moved the symbol specification out of the object being stored and into the realm of methods. Essentially, a stored series of instructions would decide what symbol each feature should have, and any number of these instruction sets could be stored for any set of objects.
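
The idea of keeping symbology out of the stored object can be sketched as a set of rules applied at drawing time. The following is only an illustrative sketch in Python: the `Feature` and `StyleRule` classes, the attribute names and the two example views are hypothetical, and a real system would store such rules in its own format.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Feature:
    """A stored object: real-world attributes only, no symbol."""
    feature_id: str
    attributes: dict


@dataclass
class StyleRule:
    """One instruction of a view: a test on attributes and the symbol to draw."""
    matches: Callable[[Feature], bool]
    symbol: str


def symbolize(feature, rules, default="thin gray line"):
    """Return the symbol of the first rule that matches, else the default."""
    for rule in rules:
        if rule.matches(feature):
            return rule.symbol
    return default


# Two hypothetical views (instruction sets) over the same stored objects.
HIGHWAY_VIEW = [
    StyleRule(lambda f: f.attributes.get("road_class") == "major highway",
              "double red line"),
    StyleRule(lambda f: f.attributes.get("road_class") == "local street",
              "single black line"),
]
MAINTENANCE_VIEW = [
    StyleRule(lambda f: f.attributes.get("surface") == "gravel",
              "dashed brown line"),
]

road = Feature("r1", {"road_class": "major highway", "surface": "gravel"})
```

The same `road` object is drawn as a double red line under `HIGHWAY_VIEW` and as a dashed brown line under `MAINTENANCE_VIEW`; nothing about the stored feature changes between the two.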


Figure 2 - Mapping Innovation Timeline (2)

In the 1990s we saw the increasing role of the relational database overshadow the role of the graphics front-end in a typical GIS. Where once the external database was used as a way to pack more information behind selected graphic features, now the feature geometries are treated as just one of the many attributes an object can possess inside the relational database. Furthermore, the sets of instructions about how to portray the features (the stored "views") could be stored in the relational database, as well as the methods that should be invoked whenever a feature is added, changed or moved, in order to ensure that the system integrity is not violated. The fact that all these components used to be thought of as very different things (the graphic features, the non-graphic attributes, the plotting and validation programs) is now seen as a kind of historical accident created by the until-recently-corrected inability of the relational database to hold them all together in one place as an integrated whole: a place, it seems, they were always destined to be as soon as the appropriate generic data structures could be worked out.

The significance of what happened to the graphic front-end in a contemporary geospatial database is astonishing, given the trend towards increasing data content we plotted from the Babylonian tablets to today. On each turn we noted the trend to increase the richness of data content by elaborating, annotating and improving the placement of graphic objects in an overall picture. Maps start to work only after the user comes to terms with the overall picture (this space on the paper corresponds with a larger place on the ground), then adds to that framework layer upon layer of additional information. The geospatial database is the last evolutionary turn in a trend that really picked up the pace with the application of computer technology, so that the map-as-little-picture-of-the-world framework is no longer the starting point of the user's experience. In its place, we have the database-as-multi-dimensional-model-of-the-world framework, in which maps exist as ephemeral byproducts. The data behind the picture has become so intense that it has effectively turned the map inside out.

Data-Centric Map Making

Implications for Map Conversion Projects
That the craft of map automation has become more like the science of database construction is probably not a surprise to anyone closely involved in geospatial data projects. Yet, there are implications for the planning, management and control of geospatial data conversion projects that are still emerging, and probably require time to reach a level of maturity comparable to the rapid advances in the integrated database targets. At the top of the list is the recognition that accuracy standards for digitizing maps are completely inadequate as quality standards for a geospatial database. The bar has been raised for data quality precisely because the level of interdependence between previously disparate elements has gone through the roof. An error in the coding of one feature, for example, can have an effect on the behavior of adjacent features and prevent the execution of certain methods that permit connectivity tracing, symbology assignments and map image rendering. In short, almost any invalid or unexpected piece of data can set off a chain reaction of inoperability throughout the entire geospatial database. The benefit of eliminating redundancies in an optimally normalized database is offset by the vulnerability the entire system experiences when even a tiny percentage of the product is defective. That is why conversion projects completed successfully with a 98% accuracy requirement can still result in an inoperable system. A 2% error rate is too much.
    Radical Idea #1: Geospatial databases cannot operate with 2% error.
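
One way to see why a 2% error rate can cripple an interdependent database is to estimate how often a connectivity trace would pass through only valid features. The sketch below is a simplified model under two assumptions that are not in the original text: errors fall independently on features, and a single defective feature is enough to break a trace. The function name is hypothetical.

```python
def chance_of_clean_trace(feature_error_rate, features_in_trace):
    """Probability that every feature along a trace is free of defects.

    Assumes independent errors and that one bad feature breaks the trace.
    """
    return (1.0 - feature_error_rate) ** features_in_trace


# A trace through 50 features in a database delivered at 98% accuracy.
clean_50 = chance_of_clean_trace(0.02, 50)
```

Under these assumptions a 50-feature trace succeeds only about 36% of the time, even though each individual feature is 98% likely to be correct.
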
The task of building an operable geospatial database, therefore, goes beyond the careful digitizing of existing maps for two reasons:
  1. The accuracy achieved from a single operator's interpretation of map information is too low, even after a 100% quality review inspection, and
  2. Even a perfect conversion of existing maps is not likely to produce data of appropriate normalization because the constraints on the existing maps (even digital ones) are too loose.

Figure 3 - Methods of Map Automation

The Downfall of Sample Inspection
The reason map conversion vendors have traditionally had difficulty reaching accuracy levels above 95-98% is the concentration of human judgment that occurs in the simultaneous interpretation of spatial and non-spatial attributes. A pipe exists from fitting "a" to fitting "b" with a certain radius, and it also has a certain diameter, material and date. That's a lot of information to get right in one pass, and that is why most conversion vendors will institute a quality sample inspection step. The logic is that if human interpretation is defective just 10% of the time in one pass, then a second independent pass, when reconciled against the first, should allow just 10% of 10% (or just 1%) of the defects to get through.

The problem with this logic, when it comes to digitizing maps, is that the second pass is never completely independent. A quality inspector has to look at an existing graphic object to determine if it was captured correctly, and the fact that he/she even looked at the first operator's product before determining its correctness means that his/her judgment is already biased. Unlike the quality inspection of tangible products like bullets or radios, the major portion of human judgment being tested for map conversion has to do with the very existence and position of an object. Every map object is different, so the inspector cannot work from a fixed imagination of what is correct. Therefore, he/she must imagine the topological properties of the correct object before seeing each object being tested. Although this may sound like an absurd requirement, it is unlikely that the statistical accuracy levels being calculated by conventional mistake-finding techniques are truly meaningful. Looking for mistakes is not the same thing as duplicating and comparing judgment. It is more like trying to find needles in a haystack. Not only is the process painful, but you never really know when you are done.
    Radical Idea #2: The true quality level of geospatial databases created by conventional map digitizing methods is not really known by mistake-catching techniques, and is probably lower than what is currently reported.
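
The arithmetic behind the two-pass argument, and the way inspector bias undermines it, can be sketched with a simple model. This is an illustrative assumption rather than a measured result: the `anchoring` parameter is a hypothetical probability that the inspector simply accepts what the first operator drew instead of forming an independent judgment.

```python
def residual_defect_rate(first_pass_error, inspector_error, anchoring=0.0):
    """Share of features still defective after a second-pass inspection.

    anchoring is a hypothetical probability that the inspector accepts the
    first operator's result without independent judgment. With anchoring 0
    the passes are independent and the errors simply multiply.
    """
    miss_rate = anchoring + (1.0 - anchoring) * inspector_error
    return first_pass_error * miss_rate


independent = residual_defect_rate(0.10, 0.10)            # 10% of 10%
biased = residual_defect_rate(0.10, 0.10, anchoring=0.3)  # inspector anchored 30% of the time
```

With fully independent passes the model reproduces the 1% figure. If the inspector anchors on the existing graphics even 30% of the time, the residual rate rises to 3.7%, and because the anchoring rate is itself unknown, so is the true quality level.
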
Network Topology Encoding
There is a strategy, however, that permits the automated collation and comparison of spatial objects from two independent operators, but this requires that the information for each feature be normalized and encoded before graphic construction. The strategy is seldom used because, ironically, it appears to be a step backwards in the evolution of how network topology is understood by the computer.


Figure 4 - Double-blind Coding and Data Entry

Earlier, we described the advance of GIS over CAD systems as having to do with the automated linkage and updating of topological pointers. Operators no longer had to tell the computer which line connected with other lines, or which polygons were on the right and left sides of a line, because the computer was capable of figuring it out directly from the vector linework that was digitized. Line "a" shares a node with line "b" because that's the way we drew the lines. In fact, it became important that the operator was prohibited from telling the computer anything about spatial topology, because to do so would create redundant information, and it was considered better for the computer to figure it out for itself. Digitizing perfect maps, therefore, meant that an operator would capture linework that approximated a topologically correct model, then apply corrections as dictated by the system until the topological anomalies were repaired. This system of implied topology works well if you trust the original linework completely, but what if the lines were drawn in the wrong place, or connected to the wrong intersections? Such errors could be the result of human judgment and need to be controlled with a quality inspection, but since the graphics are already made, we cannot effectively apply a second-pass judgment.

The alternative to digitizing and checking the results is to normalize the information sufficiently before graphic construction so that all the information can be coded and keypunched independently by separate operators, then collated and compared automatically. For most graphic network information, this can be accomplished by assigning unique identification numbers to all possible topological nodes on the source maps. Coding sheets can then be prepared using the node identification numbers for point features, and a from-to node sequence to identify linear features that span the distance between nodes. All other required attributes are added to the coding sheets in the form of normalized numeric codes where possible. After all the coding is complete and validated, a topological model exists with nodes of unknown coordinates. By using a greatly simplified digitizing technique to provide coordinates for each of the defined nodes, we supply the last piece of the puzzle that allows the computer to generate linework and symbols according to the placement rules appropriate for a given project.
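
The sequence described above (code the topology first, supply node coordinates last, then let the computer generate the linework) can be sketched as follows. This is a minimal Python sketch under assumed conventions: the `CodedLine` record, the numeric attribute codes and the sample node numbers and coordinates are hypothetical, and the "placement rule" here is simply a straight segment between the two nodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedLine:
    """One row of a coding sheet: from-to node sequence plus numeric codes."""
    line_id: str
    from_node: int
    to_node: int
    material_code: int
    diameter_code: int


def undefined_nodes(lines, node_coords):
    """Node numbers referenced on the coding sheets but not yet digitized."""
    used = {n for line in lines for n in (line.from_node, line.to_node)}
    return sorted(used - set(node_coords))


def generate_linework(lines, node_coords):
    """Build a straight segment for each coded line from its node coordinates."""
    missing = undefined_nodes(lines, node_coords)
    if missing:
        raise ValueError(f"nodes without coordinates: {missing}")
    return {
        line.line_id: (node_coords[line.from_node], node_coords[line.to_node])
        for line in lines
    }


# Hypothetical coding sheet and digitized node coordinates.
SHEET = [
    CodedLine("p1", 101, 102, material_code=3, diameter_code=8),
    CodedLine("p2", 102, 103, material_code=3, diameter_code=6),
]
NODE_COORDS = {101: (0.0, 0.0), 102: (120.0, 0.0), 103: (120.0, 80.0)}

linework = generate_linework(SHEET, NODE_COORDS)
```

Because connectivity comes from the shared node numbers rather than from where lines happen to be drawn, pipes p1 and p2 meet exactly at node 102 regardless of digitizing precision.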


Figure 5 – The Evolution of Network Topology

This coding strategy appears at first to be relatively primitive compared with the interactive implied-topology strategy afforded by a GIS, but it has the significant advantage that it can be duplicated in such a way that the interpretive judgment of two operators can be collated and compared automatically. Furthermore, because the coding step is separated from the keypunch step, we isolate the map interpretation judgments to the smallest possible operation.
    Radical Idea #3: Code network topology manually to compound the accuracy of interpretive judgment.
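
The automatic collation and comparison of two operators' coding might look like the sketch below. It assumes, hypothetically, that each operator's coding sheets have been keyed into a dictionary indexed by the from-to node pair; the function name, field names and sample values are illustrative only.

```python
def collate_and_compare(pass_a, pass_b):
    """List every record on which two independent coding passes disagree.

    Each pass maps a (from_node, to_node) pair to a dict of attribute codes.
    Returns (key, differing_fields) tuples; a record present in only one
    pass is reported with the marker "<record missing in one pass>".
    """
    discrepancies = []
    for key in sorted(pass_a.keys() | pass_b.keys()):
        rec_a, rec_b = pass_a.get(key), pass_b.get(key)
        if rec_a is None or rec_b is None:
            discrepancies.append((key, ["<record missing in one pass>"]))
        elif rec_a != rec_b:
            fields = sorted(
                name for name in rec_a.keys() | rec_b.keys()
                if rec_a.get(name) != rec_b.get(name)
            )
            discrepancies.append((key, fields))
    return discrepancies


# Hypothetical output of two operators coding the same source map.
OPERATOR_A = {
    (101, 102): {"material": 3, "diameter": 8},
    (102, 103): {"material": 3, "diameter": 6},
}
OPERATOR_B = {
    (101, 102): {"material": 3, "diameter": 8},
    (102, 103): {"material": 4, "diameter": 6},
    (103, 104): {"material": 3, "diameter": 6},
}

to_review = collate_and_compare(OPERATOR_A, OPERATOR_B)
```

Only the records in `to_review` need a third look against the source map, and neither operator ever sees the other's work, which is what keeps the two judgments independent.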

The Evolution of the Map User
Another way of looking at the need for better quality information in a geospatial database is to examine the changing roles of the data service vendor and the consumer. As long as the data provider was perceived as a map maker, the quality expectations of the map user were relatively low. Understanding how much judgment is involved in the interpretation of map features, the consumer was satisfied if only 98% of the features looked right and carried the correct attributes. But one of the consequences of empowering the map user to design his/her own maps from a multi-purpose data source is that the expectations have changed. In most cases, the consumer no longer thinks of the deliverable product as a map, but rather as a resource to generate maps, the quality of which feels more personal because errors in the data reflect on the consumer rather than on the data conversion service vendor. In taking on the role of map designer, consumers of geospatial data have also assumed more personal responsibility, even legal liability, for the maps they make and share with others.


Figure 6 - Changing Roles of Vendor and Consumer

Conclusions
As the foundation for the automation of geospatial databases, the idea of digitizing maps has had its day, largely because the quality of output is insufficient to support the integrated applications required by the database, and falls well below the ever-increasing quality expectations of the map-user turned map-designer.

As the product of end-user query upon the growing volume of accessible, relational information, however, maps and map design have become part of the mainstream vocabulary of the general public.

The traditional role of the map maker has evolved rapidly in the last thirty years such that
  1. The task of map designing has decentralized to the information consumer, and
  2. Service vendors need to think and act more like database developers and administrators.
The challenge to the geospatial data-migration project manager is to
  1. Normalize data coding as far upstream as possible,
  2. Isolate and constrict activities rich in human judgment, and
  3. Use database construction techniques that prevent errors, rather than hunt for them in the finished product.