December 2000
R2V: Automated Map Digitising
Yecheng (Ted) Wu Ph.D., President/CEO, Able Software Corp., 5, Appletree Lane, Lexington, MA 02420, USA
Email: ywu@ablesw.com
Vector data is easier than raster
data to handle on a computer because it has fewer data items and is more
flexible to adjust for different scales.
Raster and vector are the two basic data structures for storing and manipulating images
and graphics data on a computer. All of the major GIS (Geographic Information
Systems) and CAD (Computer Aided Design) software packages available today are primarily based on one of the two structures, either raster based or vector based, while they have some extended functions to support other data structures.
A raster image comes in the form of individual pixels: each spatial location, or resolution element, has an associated pixel whose value indicates an attribute such as colour, elevation, or an ID number. Raster images are normally acquired with an optical scanner, a digital CCD camera or another raster imaging device. Their spatial resolution is determined by the resolution of the acquisition device and the quality of the original data source. Because a raster image must have a pixel for every spatial location, the size of the area it can represent is strictly limited. Doubling the spatial resolution makes a two-dimensional raster image four times larger, because the number of pixels doubles in both the X and Y dimensions. The same growth occurs when a larger area is covered at the same spatial resolution.
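The growth in raster size can be checked with a few lines of Python. This is only an illustrative sketch: the function name pixel_count and the 10 x 8 inch sheet are assumptions made for the example, not values from any particular scanner or software package.

```python
def pixel_count(width_in, height_in, dpi):
    """Number of pixels needed for a sheet (in inches) scanned at the given DPI."""
    return round(width_in * dpi) * round(height_in * dpi)

# Hypothetical 10 x 8 inch sheet, scanned at 200 and then at 400 DPI.
base = pixel_count(10, 8, 200)       # 2000 x 1600 pixels
doubled = pixel_count(10, 8, 400)    # 4000 x 3200 pixels
print(base, doubled, doubled // base)

# Doubling the sheet size in both directions at the same DPI has the same effect.
larger_area = pixel_count(20, 16, 200)
print(larger_area // base)
```

In both cases the pixel count, and with it the uncompressed image size, grows by a factor of four.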
Vector data comes in the form of points and lines that are geometrically and mathematically associated. Points are stored as coordinates; for example, a two-dimensional point is stored as (x, y). Lines are stored as a series of point pairs, where each pair represents a straight line segment; for example, (x1, y1) and (x2, y2) indicate a line from (x1, y1) to (x2, y2). In general, a vector data structure produces a smaller file than a raster image, because a raster image needs space for every pixel while a vector representation stores only point coordinates. This is especially true when the graphics or images have large homogeneous regions and the boundaries and shapes are the primary interest.
Flexible for Geographic Representation
When geometric shapes need to be represented precisely in a GIS or CAD system, a vector data structure is the option to use because it is not limited by spatial resolution or pixel size, and mathematical formulae can be used for regular shapes and smooth curves. In addition, polygon topology is an important issue when implementing a GIS. A vector data structure makes it easy to describe whether a region is on the left side or the right side of a common boundary, or whether a point is inside or outside a polygon area.
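Both topological questions reduce to short calculations on the stored coordinates. The sketch below is an illustration only, using a cross product for the left/right test and the common even-odd ray casting rule for the point-in-polygon test. The function names are hypothetical, and "left" assumes a coordinate system with the Y axis pointing up.

```python
def side_of_line(a, b, p):
    """Return 'left', 'right' or 'on' for point p relative to the directed line a -> b.

    Assumes a Y-up coordinate system; in image coordinates (Y down) the
    meaning of left and right is swapped.
    """
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    if cross > 0:
        return "left"
    if cross < 0:
        return "right"
    return "on"


def point_in_polygon(p, polygon):
    """Even-odd ray casting test; polygon is a list of (x, y) vertices."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through p can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon((2, 2), square), point_in_polygon((5, 2), square))
print(side_of_line((0, 0), (4, 0), (2, 1)), side_of_line((0, 0), (4, 0), (2, -1)))
```

Points that fall exactly on a boundary are not treated specially here; a production system would need an explicit rule for them.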
Another advantage a vector data structure has over a raster image is the flexibility of resizing without losing resolution. For example, graphical features such as rivers and roads in a map viewed with a real-world projection system can easily be displayed at any scale without physically changing the data. By contrast, a raster image has to be stretched and distorted when scaled beyond its native resolution. Besides these issues, vector data is easier than raster data to handle on a computer because it has fewer data items and is more flexible to adjust for a different scale or coordinate system, for example a specific projection system in a GIS database. This makes the vector data structure the obvious choice for most mapping, GIS (Geographic Information System) and CAD (Computer Aided Design) software packages.

Shows an original scanned soil map with dark background in the left window.
The image in the right window has been processed to remove the background.
The processed image reduces the amount of editing needed after automatic
vectorisation.
Why Is Raster to Vector Conversion Needed?
When vector data is not readily available for setting up a GIS database, it is normally created from existing paper maps or natural source images, such as aerial photos or satellite imagery. Because of its abstract form, vector data has traditionally been acquired by manual tracing from paper maps or base images with a digitising tablet. The manual method is slow, and it lacks accuracy because the human hand can resolve only about 40 dots per inch (DPI). For a typical contour map, it can take one skilled operator several weeks to trace all the lines manually. The intensive labour requirement makes large mapping and GIS projects difficult and expensive to implement.
With the development of scanning technology, image scanners have become cost-effective and capable of high resolution, in the range of 100 to 1,200 DPI. Moreover, similar developments in automated raster to vector conversion have made it possible to take a paper map, scan it and accurately convert it into vector format. This method uses a computer to extract vector data from scanned images automatically and eliminates the manual tracing process. Using raster to vector conversion technology, large scale map digitising or GIS database creation projects can now be accomplished in a much shorter time with less demand on human resources.
How Is Raster to Vector Conversion Done?
While a vector data structure provides a simpler and more abstract data representation than a raster image, automatic conversion from raster to vector, the so-called vectorisation process, is not an easy task, although the opposite direction (from vector to raster) is straightforward. Extensive research effort has been devoted to the issues involved in raster to vector conversion during the past decades.
A complete raster to vector conversion process includes image acquisition, pre-processing, line tracing, text extraction (OCR), shape recognition, topology creation and attribute assignment.

Shows regions of
interest defined for the SPOT image (Washington DC, USA) and image
cropped to indicate the regions to be processed
Setting Scanning Resolution
The image acquisition process generates the initial raster image at a certain spatial resolution. The quality and resolution of the raster image are key factors for the quality and accuracy of the vectorised data. It is always recommended to start with clean, sharp originals and to scan at a reasonable resolution.
The scanning resolution should match the resolution at which the original image source was created. If the scanning resolution is set much higher than that of the original image source, the scan not only consumes an unnecessary amount of system resources but also picks up or generates noise and artifacts. The effect is the same as looking at a low resolution hardcopy map through a large magnifying glass: rough edges, dots and even paper texture become visible. If you scan a paper map at a very high scanning resolution and see a lot of noise in the scanned image, especially when using a colour scanner, lowering the scanning resolution may improve its quality. However, if lines are touching each other in the scanned image, the scanning resolution is too low and a higher resolution is needed.

Shows a classified and
vectorised colour topo map. The generated vector data is displayed in the
window on the right. 3D-elevation model is created from the vector
contours and displayed in the upper right window.
Choose Image Type
Most good quality black and white maps and engineering drawings, including
color map separates, can be scanned as 1-bit monochrome. If the background is
clean and the scanned image does not show many dots and speckles, 1-bit
monochrome is the perfect type to use because it takes less storage space and is
faster in display and processing.
Single color maps with a dirty or smeared background, such as old maps
or blue prints, can be scanned as 8-bit greyscale and cleaned using
image-processing techniques, such as background removal. Noise and other
artifacts can be easily smoothed out using a pair of grey level thresholds
before automatic vectorisation. A greyscale image provides more information
than a 1-bit monochrome image for image processing tasks such as background
and noise removal, but is normally 8 times larger in size than a 1-bit
monochrome image. If a smaller image size is preferred, one safe way is to
start with the greyscale image type and use software to clean up the image
and convert it to 1-bit monochrome for storage. On the other hand, image
compression, such as JPEG and wavelet methods, can be applied to reduce the
size of a greyscale image while maintaining the same pixel bit depth.
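As an illustration of cleaning a greyscale scan with a pair of grey level thresholds, the sketch below keeps only pixels whose value falls inside a chosen band and writes them out as a 1-bit image. It is a minimal sketch under stated assumptions: the function name threshold_band, the sample values and the band limits 0 to 80 are all hypothetical, and a real scan would need limits chosen from its own histogram.

```python
def threshold_band(image, low, high):
    """Convert an 8-bit greyscale image (rows of 0-255 integers) to 1-bit.

    Pixels with low <= value <= high become 1 (line work);
    everything else becomes 0 (background).
    """
    return [[1 if low <= v <= high else 0 for v in row] for row in image]


# Tiny hypothetical scan: dark line pixels (30-40) on a light background,
# with one mid-grey smear (120) that should be discarded.
grey = [
    [230, 228, 40, 231],
    [225, 35, 30, 229],
    [120, 232, 38, 227],
]
binary = threshold_band(grey, 0, 80)
for row in binary:
    print(row)
```

The mid-grey smear falls outside the band and is dropped together with the light background, which is the effect described above.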
Although color scanners have come
a long way, large format and high resolution scanning is still quite expensive.
If the source image is in color and a good quality color scanner is available,
scanning using 24-bit color image type certainly gives the benefit of separating
color layers and simplifying the vectorisation process. Color separation
normally uses color classification or color ratio based methods to divide
millions of colors into a limited number of color groups and each group is
assigned a single color. Each color in the classified image can then be
vectorised or extracted to create a single color image.
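A very simple form of color classification assigns each pixel to the nearest color in a small, predefined palette and then extracts one class as a single color layer. The sketch below shows only that idea; it is not the classification method of any particular package. The palette, the pixel values and the function names are assumptions made for the example.

```python
def classify_pixel(rgb, palette):
    """Return the index of the palette color nearest to rgb (squared RGB distance)."""
    return min(
        range(len(palette)),
        key=lambda i: sum((c - p) ** 2 for c, p in zip(rgb, palette[i])),
    )


def classify_image(image, palette):
    """Replace every (r, g, b) pixel with the index of its color group."""
    return [[classify_pixel(px, palette) for px in row] for row in image]


def extract_layer(classified, index):
    """Build a 1-bit image holding only the pixels of one color group."""
    return [[1 if c == index else 0 for c in row] for row in classified]


# Hypothetical palette: paper, black line work, brown contours, blue water.
palette = [(255, 255, 255), (0, 0, 0), (150, 90, 40), (40, 90, 200)]
scan = [
    [(250, 248, 245), (160, 100, 50)],
    [(30, 80, 190), (10, 10, 10)],
]
classified = classify_image(scan, palette)
contours = extract_layer(classified, 2)
print(classified)
print(contours)
```

Each extracted layer can then be vectorised on its own, as if it were a clean 1-bit scan.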
Other color
images, such as satellite and aerial photos, have been used directly to create
vector data, such as region boundaries, street and road lines. Because more
bits (normally 24) are used per pixel, color image files are much bigger than
8-bit greyscale and 1-bit monochrome images and require more system resources to store
and process. Of course, image compression techniques can help to reduce the size
of color images. When using lossy compression, such as JPEG or wavelet-based
methods, compression ratio must be carefully selected so image quality is not
sacrificed since the success of vectorisation depends heavily on the quality of
the image.
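The effect of the image type on uncompressed size follows directly from the number of bits per pixel. The sketch below makes the comparison for a hypothetical 24 x 36 inch sheet scanned at 300 DPI; the sheet size, the resolution and the function name are assumptions for illustration, and real file formats add headers and row padding that are ignored here.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Approximate uncompressed image size in bytes (no header, no row padding)."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


for name, bits in (("1-bit monochrome", 1), ("8-bit greyscale", 8), ("24-bit color", 24)):
    size = uncompressed_bytes(24, 36, 300, bits)
    print(f"{name}: {size / 1_000_000:.1f} million bytes")
```

The greyscale scan is 8 times the size of the monochrome scan, and the 24-bit color scan is 3 times larger again.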
Preprocessing Steps
Preprocessing steps differ depending on the image type. For a 1-bit
monochrome image, de-speckle is often used to remove noise and smooth rough
edges. For an 8-bit greyscale image, thresholding and background removal are
the processing steps used to improve image quality for vectorisation. Color
images are often classified to separate the colors so that each color can be
vectorised into a separate vector layer.
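A basic de-speckle pass for a 1-bit image can be sketched as removing foreground pixels that have too few foreground neighbours. This is a deliberately simple variant chosen for illustration, not the filter used by any specific product; the function name despeckle and the min_neighbours parameter are hypothetical.

```python
def despeckle(image, min_neighbours=1):
    """Clear foreground pixels (1) that have fewer than min_neighbours
    foreground pixels among their 8 neighbours. Returns a new image."""
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1:
                continue
            count = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols and image[rr][cc] == 1:
                        count += 1
            if count < min_neighbours:
                out[r][c] = 0
    return out


noisy = [
    [1, 0, 0, 0],   # isolated speckle in the top-left corner
    [0, 0, 1, 1],   # short piece of a line
    [0, 0, 0, 0],
]
for row in despeckle(noisy):
    print(row)
```

Neighbour counts are taken from the input image rather than the partly cleaned output, so the result does not depend on the order in which pixels are visited.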
Defining regions of interest (ROI) for vectorisation or image
cropping is another frequently used preprocessing step, which limits the
processing to the areas of interest. It is important to allow the use of
polygons and groups of polygons so that cases such as islands, holes, rings
and other shapes can be handled.
Image mosaicking or stitching is normally done when a source map is larger
than the scanner can handle. In this case, the map is scanned in sub-sections,
which are then merged into a whole image for raster to vector conversion.
Alternatively, the merge is often done as a post-processing step, by merging
the vector data sets after each section is vectorised. Merging vector data
instead of raster images certainly has its advantages, because vector data
takes much less computer memory and can be processed faster, while image
stitching can create huge images that are beyond the processing capability of
a regular PC.

Shows that closed polygons are
created from vectorised line segments.
Vectorisation
The line tracing process extracts two types of lines: center lines
and boundary lines. The center line method tracks the center pixels within a
raster line and follows the line until it reaches an intersection or the end
of the line. The boundary line method tracks the boundary pixels of a color
region to get closed polygons.
Although many methods have been developed for line tracing, they can
be divided into two groups: line thinning and line following. The line
thinning method is more of a global approach, which iterates through the
entire image in multiple passes and eliminates boundary pixels during each
iteration until only the skeleton pixels are left. The line following method
analyzes line shapes, thickness and intersections to follow the line centers.
This method is frequently employed in semi-automatic interactive tracing,
while line thinning based methods are used for fully automatic conversion of
complex images.
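To make the line thinning idea concrete, the sketch below implements the well known Zhang-Suen thinning algorithm, which is one published example of this family and is not necessarily the method used in any commercial package. It assumes a binary image stored as rows of 0/1 values with at least a one-pixel background border; the function name is hypothetical.

```python
def zhang_suen_thin(image):
    """Thin a binary image (rows of 0/1) towards a one-pixel-wide skeleton.

    Border pixels are never examined, so the line work is assumed to be
    surrounded by at least one row and column of background.
    """
    img = [row[:] for row in image]
    rows, cols = len(img), len(img[0])

    def neighbours(r, c):
        # P2..P9: the 8 neighbours, clockwise, starting directly above.
        return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1], img[r + 1][c + 1],
                img[r + 1][c], img[r + 1][c - 1], img[r][c - 1], img[r - 1][c - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    n = neighbours(r, c)
                    b = sum(n)  # number of foreground neighbours
                    # Number of 0 -> 1 transitions around the pixel.
                    a = sum(1 for i in range(8) if n[i] == 0 and n[(i + 1) % 8] == 1)
                    p2, p4, p6, p8 = n[0], n[2], n[4], n[6]
                    if step == 0:
                        cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_clear.append((r, c))
            # Boundary pixels are removed together at the end of each sub-pass.
            for r, c in to_clear:
                img[r][c] = 0
            if to_clear:
                changed = True
    return img


# A horizontal raster line three pixels thick inside a 7 x 10 image.
thick = [[0] * 10 for _ in range(7)]
for r in (2, 3, 4):
    for c in range(1, 9):
        thick[r][c] = 1
for row in zhang_suen_thin(thick):
    print("".join("#" if v else "." for v in row))
```

Each pass peels boundary pixels from alternating sides, which matches the multi-pass behaviour described above. A full vectoriser would still have to chain the remaining skeleton pixels into line segments and detect intersections.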
After lines are extracted, they are labeled with line attributes or,
in the case of contours, with elevations. Closed polygons can be generated
from line segments to create the topology. Control points are defined and
applied to geo-reference the vector data to a projection system.
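Geo-referencing with control points can be sketched as fitting a transformation from image coordinates to map coordinates. The example below fits an affine transformation from exactly three control points using Cramer's rule. It is a minimal sketch under stated assumptions: the control point values and function names are hypothetical, and practical systems typically use more control points with a least-squares fit and may apply other transformation models.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, rhs):
    """Solve a 3 x 3 linear system with Cramer's rule."""
    d = det3(m)
    if d == 0:
        raise ValueError("control points are collinear")
    solution = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        solution.append(det3(mc) / d)
    return solution


def fit_affine(pixel_pts, map_pts):
    """Fit X = a*x + b*y + c and Y = d*x + e*y + f from three control points."""
    m = [[x, y, 1.0] for x, y in pixel_pts]
    a, b, c = solve3(m, [X for X, _ in map_pts])
    d, e, f = solve3(m, [Y for _, Y in map_pts])
    return a, b, c, d, e, f


def apply_affine(coeffs, pt):
    """Transform one (x, y) image point into map coordinates."""
    a, b, c, d, e, f = coeffs
    x, y = pt
    return (a * x + b * y + c, d * x + e * y + f)


# Hypothetical control points: image pixels and their map coordinates in metres.
pixel_pts = [(0, 0), (1000, 0), (0, 1000)]
map_pts = [(500000.0, 4200000.0), (502000.0, 4200000.0), (500000.0, 4198000.0)]
coeffs = fit_affine(pixel_pts, map_pts)
print(apply_affine(coeffs, (500, 500)))
```

In this example each pixel covers 2 metres and the Y axis is flipped, because image rows run downwards while map northings increase upwards.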
One common use of labeled contour lines is the creation of a 3D DEM
(Digital Elevation Model) and other 3D data models. We will be seeing more
and more use of 3D display in GIS and computer mapping applications in the
next couple of years. The use of 3D visualisation gets us one step closer to
the 3D world we live in, but it puts more demand on computer software and
hardware. Many people think today's computer technology is far more powerful
than we need, and they are right if word processing is what they do every
day. They will be surprised how much more computing power is needed when a 3D
digital terrain model is used in real time, and how much worse it gets when
high resolution satellite imagery is draped onto the surface of the digital
terrain model. We are quite sure that faster CPUs, bigger memory and better
quality displays will not be wasted in GIS and computer mapping applications.
Choosing the Right Conversion Tool
Several raster to vector conversion software packages are commercially
available for different applications, such as engineering drawing conversion,
map digitising and GIS data capture. The R2V software developed by Able
Software Corp. (www.ablesw.com) in 1993 has a focus on vectorisation of
scanned maps and GIS data creation.
Below are a few
questions one should ask when selecting the right tool for the task:
- Does it support different image types, such as 1-bit black/white, greyscale
and 24-bit RGB color? This is quite important for people whose source images are
in color. Treating color images as black and white or greyscale loses all
color information, and a significant amount of editing may then be needed to
separate colors by hand.
- Is it designed for maps or engineering drawings? In practice, map data
and engineering data are handled quite differently although both are vector
based. If a package is designed for CAD drawings, its algorithms normally
work well for straight lines and regular geometric shapes and will not be
efficient for curving lines, polygons and topology between polygons.
Geo-referencing is another crucial factor for maps and GIS databases, while
it is normally not a concern for CAD applications.
- Does it support the native format for your application? Unfortunately,
most vector file formats used today are different, and data exchange between
two formats can easily result in some data loss. One format may be excellent
for CAD data transfer, but very limited if you need to get data into a GIS or
mapping database. When creating vector data, it is always better to use the
native format the target system supports.
- Does it provide image processing functions? The quality of raster to vector
conversion depends largely on the quality of the source image, which is
affected by many factors, including the scanner, the cleanness and age of the
source map, the scanning resolution, and whether the scan is in color or
black/white. Without the necessary image processing functions, such as
background removal for old maps with a blue background, color separation for
color maps, polygon-based region of interest (ROI) definition, and image
rubber sheeting to correct distortion, the usefulness of the final vector
product may be quite limited.

Shows that boundary lines are traced
directly from a classified SPOT image. Different color regions are traced and
put into separate map layers.
Future Developments
Because of its complexity, automated raster to vector conversion has
attracted a significant amount of research focus in the past decades among
the GIS and image processing communities. Although commercial products have
been developed and used in production type applications for large scale map
digitising and GIS data capture projects, there is still room for
improvement and demand for new algorithms and technologies, for example,
color image processing and color separation, text recognition (OCR), and the
use of satellite imagery to create vector map layers.
When 24-bit true color is used to scan color maps or drawings, each
pixel has 3 color components (red, green and blue). Each component is
recorded as an 8-bit integer with a value range of 0 to 255, so a 24-bit
color image can have up to about 16.7 million different colors. Classifying
these millions of colors into a small number of color groups is a challenge,
especially when some color groups have only a small number of pixels and the
source image quality is not perfect. Clustering based color classification
and ratio based methods have been developed to solve this problem, but in
many cases more robust methods are needed to achieve a more satisfactory
color separation result.
Text recognition (OCR) is another challenge faced by developers and
researchers. To reliably recognize text labels in maps, the first step is to
separate them from lines, also known as the text segmentation step. Once the
labels are separated, text recognition engines are applied to identify the
text and convert it to computer readable ASCII code, or Unicode for other
languages, such as Chinese and Japanese. Conventional text recognition (OCR)
technologies have not worked well for recognizing text in maps and drawings,
largely due to the variety of fonts, sizes and orientations used.
International languages add more difficulty to the problem.
As high resolution satellite imagery becomes more affordable and easily
accessible, we will be seeing more use of it to create GIS data layers and
update existing map data. To automate the process from raw satellite imagery
to finished vector data layers, new methods and products will be developed to
recognize and map objects such as roads, building roof tops, trees,
vegetation and water. Not only lines and polygons will be generated from the
images, but also the important attribute and layer information associated
with the graphical objects.
About the Author
Dr. Yecheng (Ted) Wu is the founder and president of Able Software Corp. and
the author of Able Software's R2V software, which is currently being used in
more than 60 countries for automated map digitizing and GIS data capture
applications. Dr. Wu is also the author of the 3D-DOCTOR software, a new 3D
imaging and visualization software for medical and scientific imaging and
rendering applications.