Geographical Data Sets
Geographic Data Types

Although the two terms, data and information, are often used interchangeably, each has a specific meaning. Data are observations that are collected and stored. Information is data that is useful in answering queries or solving a problem. Digitizing a large number of maps yields a large amount of data after hours of painstaking work, but the data only render useful information when they are used in analysis.

Spatial and Non-spatial Data

Geographic data are organised in a geographic database. This database can be considered a collection of spatially referenced data that acts as a model of reality. Each feature in the database has two important components: its geographic position and its attributes or properties. In other words, spatial data (where is it?) and attribute data (what is it?).

Attribute Data

The attributes refer to the properties of spatial entities. They are often referred to as non-spatial data, since they do not in themselves represent location information.
Spatial Data

Geographic position refers to the fact that each feature has a location that must be specified in a unique way. To specify the position in an absolute way, a coordinate system is used. For small areas, the simplest coordinate system is the regular square grid. For larger areas, certain approved cartographic projections are commonly used. Internationally, many different coordinate systems are in use.

Geographic objects can be shown by four types of representation: points, lines, areas, and continuous surfaces.

Point Data

Points are the simplest type of spatial data. They are zero-dimensional objects with only a position in space but no length.

Line Data

Lines (also termed segments or arcs) are one-dimensional spatial objects. Besides having a position in space, they also have a length.

Area Data

Areas (also termed polygons) are two-dimensional spatial objects with not only a position in space and a length but also a width (in other words, they have an area).

Continuous Surface

Continuous surfaces are three-dimensional spatial objects with not only a position in space, a length, and a width, but also a depth or height (in other words, they have a volume). These spatial objects are not discussed further because most GIS do not include real volumetric spatial data.

Geographic Data -- Linkages and Matching

Linkages

A GIS typically links different data sets. Suppose you want to know the mortality rate from cancer among children under 10 years of age in each country. If you have one file that contains the number of children in this age group, and another that contains the number of deaths from cancer, you must first combine or link the two data files. Once this is done, you can divide one figure by the other to obtain the desired answer.

Exact Matching

Exact matching occurs when you have information in one computer file about many geographic features (e.g., towns) and additional information in another file about the same set of features.
The operation to bring them together is easily achieved by using a key common to both files -- in this case, the town name. Thus, the record in each file with the same town name is extracted, and the two are joined and stored in another file.
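The join described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two "files" are small in-memory dictionaries keyed by town name, the town names and figures are invented, and the function name `exact_match` is hypothetical rather than part of any GIS package.

```python
def exact_match(left, right):
    """Join two {key: record} tables on their common key (exact matching)."""
    joined = {}
    for key, record in left.items():
        if key in right:
            # Merge the two records that share the same key into one record.
            joined[key] = {**record, **right[key]}
    return joined


# Illustrative data only: two files describing the same set of towns.
population = {"Town A": {"population": 12000}, "Town B": {"population": 8500}}
income = {"Town A": {"avg_income": 21000}, "Town B": {"avg_income": 18500}}

# The joined records are stored in a third table, as in the text.
combined = exact_match(population, income)
print(combined["Town A"])
```

Towns that appear in only one file are simply left out of the result in this sketch; a real system would need a policy for such unmatched records.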
Hierarchical Matching

Some types of information, however, are collected in more detail and less frequently than other types of information. For example, financial and unemployment data covering a large area are collected quite frequently. On the other hand, population data are collected in small areas but at less frequent intervals. If the smaller areas nest (i.e., fit exactly) within the larger ones, then the way to make the data match for the same area is to use hierarchical matching -- add the data for the small areas together until the grouped areas match the bigger ones, and then match them exactly. The hierarchical structure illustrated in the chart shows that this city is composed of several tracts. To obtain meaningful values for the city, the tract values must be added together.
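A minimal sketch of hierarchical matching follows, assuming that every tract nests inside exactly one city. The tract names, the city name, and all figures are invented for illustration, and `aggregate_to_parent` is a hypothetical helper.

```python
from collections import defaultdict


def aggregate_to_parent(child_values, parent_of):
    """Sum the values of small areas into the larger areas they nest within."""
    totals = defaultdict(int)
    for child, value in child_values.items():
        totals[parent_of[child]] += value
    return dict(totals)


# Illustrative data only: population is collected per tract,
# unemployment per city.
tract_population = {"tract_1": 1200, "tract_2": 950, "tract_3": 1430}
tract_to_city = {"tract_1": "City X", "tract_2": "City X", "tract_3": "City X"}
city_unemployed = {"City X": 179}

# Step 1: add the tract values together until they match the city.
city_population = aggregate_to_parent(tract_population, tract_to_city)

# Step 2: the two data sets now share the same key and can be matched exactly.
rates = {city: city_unemployed[city] / pop for city, pop in city_population.items()}
print(city_population, rates)
```

The second step is an ordinary exact match on the city name; the aggregation only exists to bring both data sets to the same set of areas.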
Fuzzy Matching

On many occasions, the boundaries of the smaller areas do not match those of the larger ones. This occurs often when dealing with environmental data. For example, crop boundaries, usually defined by field edges, rarely match the boundaries between soil types. If you want to determine the most productive soil for a particular crop, you need to overlay the two data sets and compute crop productivity for each soil type. In principle, this is like laying one map over another and noting the combinations of soil and productivity.

A GIS can carry out all these operations because it uses geography as a common key between the data sets. Information is linked only if it relates to the same geographical area.

Why is data linkage so important? Consider a situation where you have two data sets for a given area, such as yearly income by county and average cost of housing for the same area. Each data set might be analysed and/or mapped individually. Alternatively, they may be combined. With two data sets, only one valid combination exists. Even if the combined data sets are meaningful for only a single query, you will still be able to answer more questions than if the data sets were kept separate. By bringing them together, you add value to the database. To do this, you need GIS.

Figure 2

Principal Functions of GIS

Data Capture

Data used in GIS often come in many types and are stored in different ways. A GIS provides tools and a method for integrating different data into a format in which they can be compared and analysed. Data are mainly obtained from manual digitization and scanning of aerial photographs and paper maps, and from existing digital data sets. Remote-sensing satellite imagery and GPS are promising data input sources for GIS.

Database Management and Update

After data are collected and integrated, the GIS must provide facilities that can store and maintain data.
Effective data management has many definitions but should include all of the following aspects: data security, data integrity, data storage and retrieval, and data maintenance abilities.

Geographic Analysis

Data integration and conversion are only a part of the input phase of GIS. What is required next is the ability to interpret and analyze the collected information quantitatively and qualitatively. For example, a satellite image can assist an agricultural scientist in projecting crop yield per hectare for a particular region. For the same region, the scientist also has rainfall data for the past six months, collected through weather station observations, and a map of the soils that shows fertility and suitability for agriculture. The rainfall point data can be interpolated to produce a thematic map showing isohyets, or contour lines of rainfall.

Presenting Results

One of the most exciting aspects of GIS technology is the variety of ways in which information can be presented once it has been processed by GIS. Traditional methods of tabulating and graphing data can be supplemented by maps and three-dimensional images. Visual communication is one of the most fascinating aspects of GIS technology and is available in a diverse range of output options.

Data Capture: an Introduction

The functionality of GIS relies on the quality of the data available, which, in most developing countries, is either redundant or inaccurate. Although GIS are being used widely, effective and efficient means of data collection have yet to be systematically established. The true value of GIS can only be realized if the proper tools to collect spatial data and integrate them with attribute data are available.

Manual Digitization

Manual digitizing is still the most common method for entering maps into GIS.
The map to be digitized is affixed to a digitizing table, and a pointing device (called the digitizing cursor or mouse) is used to trace the features of the map. These features can be boundary lines between mapping units, other linear features (rivers, roads, etc.), or point features (sampling points, rainfall stations, etc.). The digitizing table electronically encodes the position of the cursor with a precision of a fraction of a millimeter. The most common digitizing table uses a fine grid of wires embedded in the table. The vertical wires will record the Y-coordinates, and the horizontal ones, the X-coordinates. The range of digitized coordinates depends upon the density of the wires (called digitizing resolution) and the settings of the digitizing software.

A digitizing table normally has a rectangular area in the middle, separated from the outer boundary of the table by a small rim. Outside this so-called active area of the digitizing table, no coordinates are recorded. The lower left corner of the active area has the coordinates x = 0 and y = 0. Therefore, make sure that the map (or the part of it) that you want to digitize is always fixed within the active area.

Scanning System

The second method of obtaining vector data is the use of scanners. Scanning (or scan digitizing) provides a quicker means of data entry than manual digitizing. In scanning, a digital image of the map is produced by moving an electronic detector across the map surface. The output of a scanner is a digital raster image, consisting of a large number of individual cells ordered in rows and columns. For the conversion to vector format, two types of raster image can be used: colour or grey-tone images and binary images. The raster image is processed by a computer to improve the image quality and is then edited and checked by an operator. It is then converted into vector format by special computer programmes, which are different for the two image types.
Scanning works best with maps that are very clean and simple, relate to one feature only, and do not contain extraneous information, such as text or graphic symbols. For example, a contour map should only contain the contour lines, without height indication, drainage network, or infrastructure. In most cases, such maps will not be available and have to be drawn especially for the purpose of scanning. Scanning and conversion to vector is, therefore, only beneficial in large organizations, where a large number of complex maps are entered. In most cases, however, manual digitizing will be the only useful method for entering spatial data in vector format.

Figure 3

Data Conversion

While manipulating and analyzing data, the same format should be used for all data. This implies that, when different layers are to be used simultaneously, they should all be in vector or all in raster format. Usually the conversion is from vector to raster, because the biggest part of the analysis is done in the raster domain. Vector data are transformed to raster data by overlaying a grid with a user-defined cell size.

Sometimes the data in raster format are converted into vector format. This is the case especially if one wants to achieve data reduction, because the data storage needed for raster data is much larger than for vector data.

A digital data file with spatial and attribute data might already exist in some form or another. There might be a national database or specific databases from ministries, projects, or companies. In some cases a conversion is necessary before these data can be downloaded into the desired database. The commonly used attribute databases are dBase and Oracle. Sometimes spreadsheet programmes like Lotus, Quattro, or Excel are used, although these cannot be regarded as real database software.

Remote-sensing images are digital data sets recorded by satellite operating agencies and stored in their own image database.
They usually have to be converted into the format of the spatial (raster) database before they can be downloaded.

Spatial Data Management

Geo-Relational Data Model

All spatial data files will be geo-referenced. Geo-referencing refers to the location of a layer or coverage in space, defined by the coordinate referencing system. The geo-relational approach involves abstracting geographic information into a series of independent layers or coverages, each representing a selected set of closely associated geographic features (e.g., roads, land use, rivers, settlements, etc.). Each layer has the theme of a geographic feature, and the database is organized in thematic layers. With this approach, users can combine simple feature sets to represent complex relationships in the real world. This approach borrows heavily from the concepts of relational DBMS, and it is typically closely integrated with such systems. This is fundamental to database organization in GIS.

Topological Data Structure

Topology is the spatial relationship between connecting and adjacent coverage features (e.g., arcs, nodes, polygons, and points). For instance, the topology of an arc includes its from and to nodes (the beginning and the end of the arc, representing direction) and its left and right polygons. Topological relationships are built from simple elements into complex elements: points (simplest elements), arcs (sets of connected points), and areas (sets of connected arcs). Topological data structure, in fact, adds intelligence to the GIS database.

Attribute Data Management

All data within a GIS (spatial data as well as attribute data) are stored within databases. A database is a collection of information about things and their relationships to each other. For example, you can have an engineering geological database containing information about soil and rock types, field observations and measurements, and laboratory results.
These are interesting data, but not very useful if the laboratory data, for example, cannot be related to soil and rock types. The objective of collecting and maintaining information in a database is to relate facts and situations that were previously separate.

The principal characteristics of a DBMS are:
- Centralized control over the database is possible, allowing for better quality management and operator-defined access to parts of the database;
- Data can be shared effectively by different applications;
- Access to the data is much easier, due to the use of a user interface and user views (specially designed forms for entering and consulting the database);
- Data redundancy (storage of the same data in more than one place in the database) can be avoided as much as possible. Redundancy, or unnecessary duplication of data, is an annoyance, since it makes updating the database much more difficult: one can easily overlook changing redundant information wherever it occurs; and
- The creation of new applications is much easier with a DBMS.

The disadvantages relate to the higher cost of purchasing the software, the increased complexity of management, and the higher risk, as data are centrally managed.

Relational Database -- Concepts & Model

The relational data model is conceived as a series of tables, with no hierarchy nor any predefined relations. The relation between the various tables should be made by the user. This is done by identifying a common field in two tables, which is assigned as the key. This gives the relational model more flexibility than the other two data models. However, accessing the database is slower than with the other two models. Due to its greater flexibility, the relational data model is used by nearly all GIS systems.

Choosing Geographic Data

The main purpose of purchasing a geographic information system (GIS)* is to produce results for your organization. Choosing the right GIS/mapping data will help you produce those results effectively.
Data: The Core of Your Mapping / GIS Project

When most people begin a GIS project, their immediate concern is with purchasing computer hardware and software. They enter into lengthy discussions with vendors about the merits of various components and carefully budget for acquisitions. Yet they often give little thought to the core of the system, the data that goes inside it. They fail to recognize that the choice of an initial data set has a tremendous influence on the ultimate success of their GIS project.

Data, the core of any GIS project, must be accurate - but accuracy is not enough. Having the appropriate level of accuracy is vital. Since an increase in data accuracy increases acquisition and maintenance costs, data that is too detailed for your needs can hurt a project just as surely as inaccurate data can. All any GIS project needs is data accurate enough to accomplish its objectives and no more. For example, you would not purchase an engineering workstation to run a simple word-processing application. Similarly, you would not need third-order survey accuracy for a GIS-based population study whose smallest unit of measurement is a county. Purchasing such data would be too costly and inappropriate for the project at hand. Even more critically, collecting overly complex data could be so time-consuming that the GIS project might lose support within the organization.

Even so, many people argue that, since GIS data can far outlast the hardware and software on which it runs, no expense should be spared in its creation. Perfection, however, is relative. Projects and data requirements evolve. Rather than overinvest in data, invest reasonably in a well-documented, well-understood data foundation that meets today's needs and provides a path for future enhancements. This approach is a key to successful GIS project implementation.

Are Your Data Needs Simple or Complex?

Before you start your project, take some time to consider your objectives and your GIS data needs.
Ask yourself, "Are my data needs complex or simple?"

*Italicized words can be found in the Glossary at the end of this document, except for words used for emphasis or words italicized for reasons of copyediting convention or layout.

If you just need a map as a backdrop for other information, your data requirements are simple. You are building a map for your specific project, and you are primarily interested in displaying the necessary information, not in the map itself. You do not need highly accurate measurements of distances or areas, nor do you need to combine maps from different sources. Nor do you want to edit or add to the map's basic geographic information.

An example of simple data requirements is a map for a newspaper story that shows the location of a fire. Good presentation is important; absolute accuracy is not.

If you have simple data needs, read this paper to get the overall picture of what GIS data is and how it fits into your project. A project with simple data requirements can be started with inexpensive maps. Your primary interests will be quality graphic-display characteristics and finding maps that are easy to use with your software. You need not be as concerned with technical mapping issues. However, basic knowledge of concepts such as coordinate systems, absolute accuracy, and file formats will help you understand your choices and make informed decisions when it's time to add to your system.

What issues suggest more complex GIS data needs?
If your data requirements are complex, you ought to pay particular attention to the sections of this paper that discuss data accuracy, coordinate systems, layering, file formats, and the issues involved in combining data from different sources. Also keep in mind that projects evolve, and simple data needs expand into complex ones as your project moves beyond its original objectives. If you understand the basics of your data set, you will make better decisions as your project grows.

Basics of Digital Mapping

Vector vs. Raster Maps

The most fundamental concept to grasp about any type of graphic data is the distinction between vector data and raster data. These two data types are as different as night and day, yet they can look the same. For example, a question that commonly comes up is "How can I convert my TIFF files into DXF files?" The answer is "With difficulty," because TIFF is a raster data format and DXF™ (Drawing Exchange Format) is a vector format, and converting from raster to vector is not simple. Raster maps are best suited to some applications, while vector maps are suited to others.

Figure 4

Raster data represents a graphic object as a pattern of dots, whereas vector data represents the object as a set of lines drawn between specific points. Consider a line drawn diagonally on a piece of paper. A raster file would represent this image by subdividing the paper into a matrix of small rectangles, similar to a sheet of graph paper, called cells (Figure 4). Each cell is assigned a position in the data file and given a value based on the color at that position. White cells could be given the value 0; black cells, the value 1; grays would fall in between. This data representation allows the user to easily reconstruct or visualize the original image.

Figure 5

A vector representation of the same diagonal line would record the position of the line by simply recording the coordinates of its starting and ending points.
Each point would be expressed as two or three numbers (depending on whether the representation is 2D or 3D), often referred to as X,Y or X,Y,Z coordinates (Figure 5). The first number, X, is the distance between the point and the left side of the paper; Y, the distance between the point and the bottom of the paper; Z, the point's elevation above or below the paper. The vector is formed by joining the measured points. Some basic properties of raster and vector data are outlined below.
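Before turning to those properties, the two representations of the diagonal line can be made concrete with a small sketch. It assumes a 5 x 5 grid of cells and integer cell coordinates, and it marks cells by simple point sampling along the segment. This is an illustrative choice, not the rasterization method of any particular GIS, and `rasterize_segment` is a hypothetical name.

```python
def rasterize_segment(start, end, size):
    """Return a size x size grid of 0/1 cells marking a line from start to end."""
    grid = [[0] * size for _ in range(size)]
    (x0, y0), (x1, y1) = start, end
    steps = max(abs(x1 - x0), abs(y1 - y0))
    for i in range(steps + 1):
        # Sample evenly spaced points along the segment and mark their cells.
        t = i / steps if steps else 0.0
        x = round(x0 + t * (x1 - x0))
        y = round(y0 + t * (y1 - y0))
        grid[y][x] = 1  # 1 = black cell, 0 = white cell
    return grid


# Vector form: only the start and end coordinates (X, Y) are stored.
vector_line = ((0, 0), (4, 4))

# Raster form: every cell of the grid gets a value.
raster_line = rasterize_segment(*vector_line, size=5)

# Print with Y increasing upwards, as on the sheet of paper in the text.
for row in reversed(raster_line):
    print(" ".join(str(cell) for cell in row))
```

In this sketch the vector form stores four numbers, while the raster form stores 25 cell values for the same line.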
The size of the cells in a raster file is an important factor. Smaller cells improve image quality because they increase detail. As cell size increases, image definition decreases or blurs. In the example, the position of the line's edge is defined most clearly if the cells are very small. However, there is a trade-off: Dividing the cell size in half increases file size by a factor of four. Cell size in a raster file is referred to as resolution. For a given resolution value, the raster cost does not increase with image complexity. That is, any scanner can quickly make a raster file. It takes no more effort to scan a map of a dense urban area than to scan a sparse rural one. On the other hand, a vector file requires careful measuring and recording of each point, so an urban map will be much more time-consuming to draw than a rural map. The process of making vector maps is not easily automated, and cost increases with map complexity. Because raster data is often more repetitive and predictable, it can be compressed more easily than vector data. Many raster formats, such as TIFF, have compression options that drastically reduce image sizes, depending upon image complexity and variability. Raster files are most often used:
Digital Map Formats - How Data Is Stored

The term file format refers to the logical structure used to store information in a GIS file. File formats are important in part because not every GIS software package supports all formats. If you want to use a data set, but it isn't available in a format that your GIS supports, you will have to find a way to transform it, find another data set, or find another GIS.

Almost every GIS has its own internal file format. These formats are designed for optimal use inside the software and are often proprietary. They are not designed for use outside their native systems. Most systems also support transfer file formats. Transfer formats are designed to bring data in and out of the GIS software, so they are usually standardized and well documented.

If your data needs are simple, your main concern will be with the internal format that your GIS software supports. If you have complex data needs, you will want to learn about a wider range of transfer formats, especially if you want to mix data from different sources. Transfer formats will be required to import some data sets into your software.

Vector Formats

Many GIS applications are based on vector technology, so vector formats are the most common. They are also the most complex because there are many ways to store coordinates, attributes, attribute linkages, database structures, and display information. Some of the most common formats are briefly described below.
Raster Formats

Raster files generally are used to store image information, such as scanned paper maps or aerial photographs. They are also used for data captured by satellite and other airborne imaging systems. Images from these systems are often referred to as remote-sensing data. Unlike other raster files, which express resolution in terms of cell size and dots per inch (dpi), resolution in remotely sensed images is expressed in meters, which indicates the size of the ground area covered by each cell.
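As a rough illustration of what a resolution in meters implies, the sketch below computes the ground extent covered by an image from its cell size. The image dimensions and the 30 m cell size are assumed values chosen only for the example, and `ground_extent_m` is a hypothetical helper.

```python
def ground_extent_m(rows, cols, cell_size_m):
    """Return (height_m, width_m) of the ground covered by a raster image."""
    return rows * cell_size_m, cols * cell_size_m


# Assumed example: an image of 1000 rows by 2000 columns with 30 m cells.
height_m, width_m = ground_extent_m(rows=1000, cols=2000, cell_size_m=30)
area_km2 = (height_m * width_m) / 1_000_000

print(height_m, width_m, area_km2)
```

With these assumed values, each cell covers 30 m x 30 m on the ground, so the whole image spans 30 km by 60 km.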
An Example of Raster and Vector Integration

Figure 7: An Example of Raster and Vector Integration

Vector & Raster Data Models - Merits & Demerits
Hybrid System

A hybrid system is an integration of the best of the vector and raster models. GIS technology is moving fast towards the hybrid model.

Figure 8: The Integration of Vector and Raster System