National Geo-spatial Data Infrastructure: Theories and Technologies

A. R. Das Gupta
Indian Space Research Organisation
Email: arup@ipdpg.gov.in

Introduction
The key to effective decision making and planning is information. When we address the issues of sustainable development we are looking at a multi-dimensional scenario. There are the dimensions of space as represented by the earth and its features, both surface as well as sub-surface, and the atmosphere that shapes the climate and weather. Add to this the dimension of time, which tracks changes that may be as short as a few hours for weather-related phenomena or as long as years for phenomena such as forest regeneration. The dimension that is most critical is the human dimension. It represents the people who live and work in, influence and are in turn influenced by, the environment consisting of the earth and its atmosphere. To address such a complex problem the decision maker needs appropriate, accurate and timely information to help reduce uncertainties in the process of decision making. Without a modern operational information infrastructure there will be faulty decisions, trauma for the people and degradation of the environment and quality of life.

These issues have been addressed from time to time and perhaps most comprehensively in recent times by Agenda 21. A quick perusal of this Agenda shows that a reliable information infrastructure is a prerequisite. How can such an infrastructure be put in place fast and yet be accurate and dependable? What would such an infrastructure consist of? Which processes need to be modified or evolved to operate such an infrastructure? This paper will address a few of the key issues.

Definitions of Spatial Data Infrastructure
The first formal definition of the term ‘National Spatial Data Infrastructure’ was formulated in the US and published in the Federal Register on April 13, 1994. It states: “National Spatial Data Infrastructure (NSDI) means the technology, policies, standards, and human resources necessary to acquire, process, store, distribute, and improve utilisation of geospatial data”. The definition of ‘Global Spatial Data Infrastructure’ follows this closely. It states: “A co-ordinated approach to technology, policies, standards, and human resources necessary for the effective acquisition, management, storage, distribution, and improved utilisation of geo-spatial data in the development of the global community”. Yet another view is that of the SDI as a system in which the general community can expect geo-spatial data to be available and accessible transparently through networking technology. In this view, co-operation and collaboration between several disciplines and the emergence of a strategic plan for the maintenance of databases, which include spatial databases, are key components of the SDI. A third definition states: “Spatial Data Infrastructure encompasses the data sources, systems, linkages, processes, standards and institutional arrangements involved in delivering spatially-related information (both commercially and publicly held) to the widest possible group of potential users”.

The Permanent Committee on GIS Infrastructure for Asia and the Pacific, PCGIAP, views the SDI as an infrastructure with the same rationale as roads and telecommunications networks. It states that governments have a role “on behalf of the community, to provide a common and consistent infrastructure upon which a variety of government, private sector and community activities can take place”. “SDI is needed to support the region’s economic growth and its social and environmental objectives, backed by international standards, guidelines and policies on the access of those data.”

From these definitions, we can see that the SDI has to cover technology, policies, and processes. It has to deliver services to a very large community.

The SDI Community
In most approaches to information systems, the end user is taken for granted. We would like to reverse the process and begin our quest by identifying the end users and their needs. At the organisational level, this would include both governmental and private agencies. In the area of development, non-governmental agencies play an important role. Development of new and advanced technologies requires the involvement of institutions for education, training and research. Finally, we must address the community of citizens for whom information is a necessity for survival and development.

We can organise these players into three groups. The first are those who generate the data, like the Survey of India or the National Remote Sensing Agency. The second are those who add value to the data by extracting information from a suite of data. In this category we have a very wide range of players, from government departments to commercial enterprises. The third category is the information users, who reap the benefit of the availability of information by way of economic growth. In this category we can put administrators, managers, NGOs, farmers’ co-operatives, individuals and the general public.

Each community has its own requirements to be met by the SDI. The generators would like to see an efficient management system for their data. The value addition group would like to have a wide choice of data suites customised to their needs and available at an attractive price. They would also look forward to a market for their products. The third group requires reliable and cheap information at the time and place where it is most needed. All would have to work within a technical and legal framework, which has to be put in place by the government.

As the title of this paper suggests, we will not look into all the ramifications of the SDI but look only at the technologies involved. However, we shall keep in mind the overall requirements so that the technologies are relevant to the needs of the community.

Technological Components of the SDI
The technological components of an SDI are illustrated in Figure 1. We shall discuss these elements one by one.

Geodata
This forms the core of the SDI. There are multiple techniques for data creation ranging from Rapid Appraisals on the ground to Remotely Sensed Data acquired from space. The key characteristics of data are its accuracy, currency, consistency and uniqueness. Infrastructure data must be good enough to act as the base on which other data sets can be referenced. Prime examples of such data sets are satellite imagery, topographic and cadastral data sets. However, other data sets suffer from a variety of problems.


Fig 1: Components of an SDI


Data is acquired by different agencies at different times and in different formats, as per their immediate application requirements. All this data exists in unrelated archives. There is considerable duplication of data because inter-agency co-ordination is rare. There are no means of ascertaining data holdings. The data characteristics and formats are not known explicitly. Relating diverse data sets is difficult due to lack of standardisation of data formats.

Standards
The answer to the above problems lies in evolving a set of standards that need to be followed by all data generators. Three major groups of standards are required. They relate to formats, contents and exchange. The format is implicit in a hardcopy such as a map or a table, but in digital form, which is required in an SDI, the format has to be explicitly spelt out in the form of headers and lookup tables. The contents have to be standardised as per the subject. For example, a Level 1 Land Cover map should mean the same thing in terms of content to all users of the map. If we need to move data from one system to another, or from one software package to another, or merge two independent data sets, we must have data exchange formats which the source and recipient entities understand.
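
The role of an explicit header and lookup table can be illustrated with a small sketch. It is only an illustration under stated assumptions: the class codes, class names, the agency codes and the function names below are hypothetical and are not taken from any published standard.

```python
# Hypothetical Level 1 land cover lookup table. The codes and class names
# are illustrative assumptions, not values from a published standard.
STANDARD_LEVEL1 = {
    1: "Built-up",
    2: "Agriculture",
    3: "Forest",
    4: "Wasteland",
    5: "Water body",
}

# Assumed mapping from one agency's private codes to the shared codes.
AGENCY_A_TO_STANDARD = {"URB": 1, "CRP": 2, "FOR": 3, "WST": 4, "WAT": 5}


def make_header(theme, level, lookup):
    """Spell out explicitly what is implicit in a paper map legend."""
    return {"theme": theme, "level": level, "classes": dict(lookup)}


def to_standard(values, mapping):
    """Translate source codes to standard codes; fail on unknown codes."""
    unknown = sorted({v for v in values if v not in mapping})
    if unknown:
        raise ValueError(f"codes with no standard equivalent: {unknown}")
    return [mapping[v] for v in values]


header = make_header("land cover", 1, STANDARD_LEVEL1)
codes = to_standard(["FOR", "WAT", "FOR"], AGENCY_A_TO_STANDARD)
print(header["classes"][codes[0]])
```

Refusing unknown codes, rather than passing them through, is a design choice: it keeps a data set from entering the shared archive with classes that other users cannot interpret.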

The Survey of India toposheets provide a very good national standard for topographic maps. However, thematic maps have standards defined by the source organisations. There may be cases where no standards exist. In general, spatial data objects are specified by geoid, projection, scale, minimum mapping unit, mapping error and registration accuracy. Themes are specified by a hierarchy of classes. Standards are also laid down for the source data, update cycles and for verification and validation. Conversion of analogue data to digital form also introduces errors, and standards are laid down to ensure that these are held within tolerable limits. This is of importance when legacy data has to be converted from paper to digital form.

The issues of standardisation can be very contentious because scientists tend to draw up their own standards for specific tasks. Bringing all these ideas to a commonly agreed standard requires a great deal of interaction and discussion to iron out the issues. Standardisation may require some amount of legacy data restructuring and this must be done carefully keeping changes to a minimum.

Over and above the data standards, it is also necessary to standardise database structures, field names, file naming conventions, etc. Many of these items can become technology specific. Technology specific standards must be avoided, both because the technologies in use are expected to be heterogeneous and because legacy systems have to be protected. For the same reason, standardisation of hardware and software must be avoided. Rather, stress should be laid on interoperable systems that provide continuity of legacy systems.

There are several standardisation efforts under way. At the international level there is the effort being pursued by the International Organization for Standardization through its ISO/TC 211 initiative. Nationally there are several efforts. The US has its Spatial Data Transfer Standard, SDTS. The Canadians are examining the Spatial Archive and Interchange Format, SAIF. In India, we have two standards: the National Resources Information System, NRIS, Standards and the Survey of India Digital Vector Data Standard, DVD. The former is a content-based specification while the latter addresses the data exchange issues. An attempt has been made to merge the two to form a single content cum transfer specification. These are the first steps in the direction of evolving a national standard for India.

Standardisation can become an unmanageable behemoth. Consequently, most efforts remain at the research level. There is a need to temper the rigour of standardisation with the flexibility of implementation. An interesting approach in this regard is the Open Geodata Interoperability Specification, OGIS, of the Open GIS Consortium. This specification seeks to standardise processes rather than contents or formats. This allows systems, software and applications adhering to the OGIS to exchange data easily. In the final analysis, we will require a mix of standardised formats and content as well as processes to realise the SDI.

The Spatial Framework
The spatial framework is crucial and it is important that it be decided upon at the beginning. In its simplest form, it is a frame of latitudes and longitudes with intermediate tic marks aimed at providing an invariant reference for all spatial data sets. However, most users need some basic references. Thus, it can also include ortho-rectified imagery, elevation, bathymetry, geodetic control, transportation, administrative boundaries, etc. Data is generated by several agencies but needs to be used together. Hence, all data sets have to be registered to this framework so that they can be related to each other. The framework must meet the mapping accuracy desired by the applications. The choice of the geoid and the projection system has a bearing on the accuracy. Further, the accuracy is also a function of the scale of mapping. In India, we have the advantage of having an excellent cartographic database in the Survey of India topographic sheets. These are based on the Everest spheroid and the Polyconic and Lambert Conformal Conic projections. The framework of this system is ideally suited for providing the structure for a spatial database. However, as this framework is shared with the Armed Forces there are severe restrictions on its use. Consequently, an SDI built on it would be ineffective as a facility for the open community.

An alternate framework is under discussion and should be operationalised in a short time. This framework will serve all databases from 1:1,000,000 to 1:25,000. This can comfortably cater to the needs of the administration for planning and monitoring purposes. However, this will not be sufficient for project implementation. This level requires databases at 1:10,000 or larger scales. These maps are usually cadastral maps using local projections. Interlinkage of these maps with the projected framework is an involved task and no standardised procedures exist. With the increased use of Remote Sensing for thematic mapping, such inter-linkages are essential and hence this is an area in need of urgent attention.
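
The effect of scale on usable detail can be made concrete with simple arithmetic: a length on the map multiplied by the scale denominator gives the corresponding ground distance. The sketch below is illustrative only; the function name and the choice of 1 mm as the map length are our assumptions and are not part of any framework specification.

```python
def ground_metres(map_mm, scale_denominator):
    """Ground distance in metres represented by map_mm millimetres on the map."""
    return map_mm * scale_denominator / 1000.0


# The three scales mentioned in the text.
for denominator in (1_000_000, 25_000, 10_000):
    metres = ground_metres(1, denominator)
    print(f"1 mm at 1:{denominator:,} represents {metres:,.0f} m on the ground")
```

One millimetre on the map corresponds to 25 m on the ground at 1:25,000 but to 10 m at 1:10,000, which is why parcel-level work calls for the larger scales.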

Metadata
Simply put, metadata is data about data. GIS or DBMS software creates a Data Dictionary to tell the underlying software how to handle the data. Such metadata is internal to the system. However, the SDI requires a metadata system that is a stand-alone data management tool. In this form, metadata is used to describe the characteristics of the database such as custodian, data description, geographic extent, currency, storage format, data quality, contact information, etc.

Metadata systems allow users to explore and determine whether the data set is useful or not without having to go through the data in detail. The metadata system can be compared to a catalogue in a library. We can use it to determine which data sets are useful. However, to access the data we need to go to the bookracks.
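
The catalogue analogy can be sketched as a record type holding the characteristics listed above, together with a search that inspects only the metadata and never opens the data itself. This is a minimal sketch under our own assumptions: the field names, the bounding-box convention and the sample records are hypothetical and do not follow any particular metadata standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataRecord:
    custodian: str
    description: str
    extent: tuple          # (west, south, east, north) in decimal degrees
    currency: str          # date of last update, e.g. "1999-03-01"
    storage_format: str
    data_quality: str
    contact: str


def intersects(a, b):
    """True if two (west, south, east, north) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(catalogue, area, keyword=""):
    """Return records covering the area whose description mentions keyword."""
    keyword = keyword.lower()
    return [
        record for record in catalogue
        if intersects(record.extent, area) and keyword in record.description.lower()
    ]


# Hypothetical catalogue entries, for illustration only.
catalogue = [
    MetadataRecord("Agency A", "Land cover map, Level 1", (72.0, 20.0, 74.0, 22.0),
                   "1999-03-01", "vector", "field verified", "agency-a@example.org"),
    MetadataRecord("Agency B", "Topographic sheet index", (80.0, 10.0, 82.0, 12.0),
                   "1998-11-15", "raster", "scanned", "agency-b@example.org"),
]

for hit in search(catalogue, (73.0, 21.0, 75.0, 23.0), "land cover"):
    print(hit.custodian, "-", hit.description)
```

The search returns only descriptions and contact details; fetching the data set itself remains a separate step, just as a library catalogue points to the bookracks.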

Clearinghouse
The Clearinghouse is the mechanism that provides access to the metadata and finally to the actual data sets. The clearinghouse is a set of electronic repositories interconnected by a network. It provides access to data via metadata systems and search engines. The clearinghouse has to have systems to authenticate data requests and requesters. Where the data is priced, the clearinghouse must provide the necessary order forms or secure transaction gateways. Spatial data volumes are usually large and download over the Internet may not be feasible. In such cases, the system should be able to generate media bearing the requested data for transmission by mail. Since the clearinghouse will handle different sources of data, there is a need for standardisation of access procedures, user interfaces and the metadata itself.
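
The sequence of checks described above can be sketched as a small decision routine. It is a sketch only: the request fields, the set of registered users and the 100 MB download threshold are hypothetical assumptions chosen for illustration, not features of any existing clearinghouse.

```python
from dataclasses import dataclass


@dataclass
class DataRequest:
    user: str
    dataset: str
    size_mb: float
    price: float


def plan_delivery(request, registered_users, download_limit_mb=100.0):
    """List the steps a clearinghouse would take for one request."""
    # Authenticate the requester before anything else.
    if request.user not in registered_users:
        return ["reject: unknown requester"]
    steps = []
    # Priced data needs an order form or a secure transaction gateway.
    if request.price > 0:
        steps.append("collect payment through order form or secure gateway")
    # Large volumes go out on physical media instead of over the network.
    if request.size_mb <= download_limit_mb:
        steps.append("deliver by network download")
    else:
        steps.append("write to media and send by mail")
    return steps


users = {"planner01"}
print(plan_delivery(DataRequest("planner01", "landcover_L1", 850.0, 1200.0), users))
```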

The clearinghouse should also store information about the applications and availability of application specific modules that could be reused by other users.

The clearinghouse uses search engines to look for and discover data and information. One of the engines is based on the Z39.50 standard, which allows software and system independent search. Metadata engines, which allow the user to query the data set and select records from the actual data, are an area of research. Such engines are present in the background of any DBMS but they do not have the capability of distributed processing over the Web.

Communications Network
The de facto choice of network is the TCP/IP based Internet. This choice is governed by two considerations. Firstly, the Internet provides a means of seamlessly accessing a heterogeneous network of servers using a standard Graphical User Interface as provided by the Web browser. Secondly, all spatial databases provide means of publishing data on the Internet through specialised Internet servers. A further attraction is the development of advanced mark-up languages, in particular, the eXtensible Mark-up Language, XML, which supports semantic tagging of information and semi-structured data.
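
Semantic tagging can be illustrated by writing a metadata record as XML, where each value is wrapped in a tag that names its meaning, and reading it back. The sketch uses Python's standard `xml.etree.ElementTree` module; the element names and sample values are our own hypothetical choices and do not follow any published schema.

```python
import xml.etree.ElementTree as ET


def metadata_to_xml(fields):
    """Wrap each field in an element whose tag names its meaning."""
    root = ET.Element("metadata")
    for tag, value in fields.items():
        ET.SubElement(root, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_metadata(text):
    """Recover the fields from the tagged text, independent of field order."""
    return {child.tag: child.text for child in ET.fromstring(text)}


# Hypothetical record, for illustration only.
record = {
    "custodian": "Agency A",
    "description": "Land cover map, Level 1",
    "storageFormat": "vector",
}
document = metadata_to_xml(record)
print(document)
print(xml_to_metadata(document) == record)
```

Because the tags carry the meaning, a receiving system can pick out the custodian or the storage format without knowing how the sending system laid out its records.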

The prime consideration of the physical part of the network is bandwidth, as geospatial data is usually of large volume. Today, optical fibre technology provides the best bandwidth option but the last mile will always be copper. Advent of wideband satellites utilising on-board processing may provide a suitable solution for India.
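
Why bandwidth dominates can be seen from a rough transfer-time estimate: the data volume in bits divided by the link speed. The figures below are hypothetical (a 500 MB data set and three assumed link speeds), and the calculation ignores protocol overhead and congestion, so real transfers would be slower.

```python
def transfer_seconds(size_megabytes, link_megabits_per_second):
    """Ideal transfer time, ignoring protocol overhead and congestion."""
    if link_megabits_per_second <= 0:
        raise ValueError("link speed must be positive")
    return size_megabytes * 8.0 / link_megabits_per_second


def format_duration(seconds):
    """Render a duration in hours, minutes and seconds."""
    hours, remainder = divmod(int(round(seconds)), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours} h {minutes} min {secs} s"


# Hypothetical 500 MB data set over three assumed link speeds (Mbit/s).
for speed in (0.064, 2.0, 155.0):
    print(f"{speed} Mbit/s: {format_duration(transfer_seconds(500, speed))}")
```

On these assumptions the same data set that takes about half an hour on a 2 Mbit/s link takes over 17 hours on a 64 kbit/s link, which is the case where delivery on physical media becomes the practical option.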

Applications Specific Modules
The use of data to meet specific end user needs is the goal of the SDI. The ingenuity of the value added services would be needed to provide customised modules for use by lay users. These modules would use region specific models to generate scenarios for decision support. There should be provision for tweaking and tuning the models as well as for adding new models. These models should cater to all clients who could range from government functionaries to individuals.

Access to the system should be simple and menu driven, with a minimum of user data entry. It should preferably be in the local language and should not require the user to have knowledge of GIS or remote sensing to be able to interact with the system.

The use of Web servers to deliver the content makes it necessary to generate web-enabled application modules. Most GIS and DBMS have added functionalities to publish maps on the Web. It is also necessary to be able to interactively access analytical functions through cgi-scripts, servlets, applets or XML wrappers.

Partnerships
This is strictly not a technology issue, but the concept of the clearinghouse and the implicit agreement on data and applications sharing require a strong partnership programme. The Internet provides an ideal model of such a partnership where all interact as equals. The partnership needs to include institutions in the government, industry, academia, societies and individuals.

Technological Challenges
From the above general discussion, we can identify a few critical technology areas that require immediate attention.
  • Definition and finalisation of a spatial framework exclusively for development purposes
  • Finalisation of an Indian Standard for the SDI
  • Rapid conversion of legacy data as per the standards
  • Creation of metadata and activation procedure for clearinghouses. Examination of the Z39.50 and other related standards.
  • Specification of a broadband network and its realisation
  • Development of Web enabled applications modules and Indian language interfaces.
It is also necessary to look into aspects of Data Warehousing and related Data Mining technologies to fully utilise the enormous amount of data that is likely to become available once the SDI is established. The Data Warehouse has to be looked at on two levels. At the source, the granularity of the data can be quite high, as the spread will be confined to the range of the source’s activities. At the SDI level, the granularity has to be coarser, as the data volumes will be very high. Some have differentiated these as Data Marts and Data Warehouses respectively.
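
The difference in granularity between the two levels can be sketched as an aggregation step: a source holds fine-grained records, and only coarser summaries are carried up to the SDI level. The record layout, the district and village names and the figures below are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict


def coarsen(records):
    """Aggregate (district, village, area_ha) records to district totals."""
    totals = defaultdict(float)
    for district, _village, area_ha in records:
        totals[district] += area_ha
    return dict(totals)


# Fine-grained records as a source agency might hold them (assumed values).
source_level = [
    ("District 1", "Village A", 120.0),
    ("District 1", "Village B", 80.0),
    ("District 2", "Village C", 45.5),
]

# Coarser summary suitable for the SDI level.
print(coarsen(source_level))
```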

Conclusion
This paper has attempted to review the concepts and the efforts towards the establishment of the SDI in various countries and organisations. The availability of high quality data from satellites and other sources, powerful desktop computers, advanced software and broadband communications has made the job of implementing the SDI easier. However, technology alone does not make the SDI. There is a need to address many other issues. The concept of partnership has already been addressed. Others are:
  • Creating awareness and support among potential sources and users
  • Identifying and including all stakeholders
  • Complementing related initiatives
  • Evolving policies which enhance the establishment and operation of the SDI
  • Addressing legal, IPR and commercial issues
  • Mobilising human and financial resources
It will also be necessary at some point of time to address regional and global co-operation with efforts such as the APSDI and the GSDI.

References
  • Fritz Petersohn, Kenneth Primozic, Bill Robertson, Alex McDonald, “Global Spatial Data Infrastructure providing for substantial and sustainable economic development in the developed and developing countries of the world”, Discussion Paper prepared for the World Congress on GSDI, North Carolina, October 1997
  • Andrew Phillips, Ian Williamson, Chukwudozie Ezigbalike, “Spatial Data Infrastructure Concepts”, Australian Surveyor, Vol 44, No. 1, p20-28, 1999
  • The Geographical Engineering Group http://www.unb.ca/GGE/Research/GEG.html 
  • Mark E. Reichardt, John Moeller, “SDI Challenges for a New Millennium. NSDI at a Crossroads: Lessons learned and Next Steps”, 4th Global Spatial Data Infrastructure Conference, Cape Town, South Africa, 13-15 March 2000
  • Olaf Østensen, “Mapping the Future Geomatics”, ISO Bulletin, December 1995
  • http://mcmcweb.er.usgs.gov/sdts/ 
  • http://home.gdbc.gov.bc.ca/SAIF/ 
  • NRIS Node Design and Standards, ISRO-NNRMS-SP:72-2000, February 2000
  • “Standard Exchange Format for Digital Vector Data”, Survey of India
  • “National Spatial Database Standards”, Survey of India and Indian Space Research Organisation. (Draft Specification for review by Standing Committee on Cartography)
  • Kurt Buehler, Lance McKee, ed., “The OpenGIS™ Guide”, OGIS TC Document 96-001
  • Andrew Phillips, Ian Williamson, Chukwudozie Ezigbalike, “The Importance of Metadata Engines in Spatial Data Infrastructures”, AURISA ’98, 26th Annual Conference of AURISA, Perth, Western Australia, 23-27 November, 1998.
  • Clifford A. Lynch, “The Z39.50 Information Retrieval Standard”, D-Lib Magazine, April 1997
  • Ily Zaslavsky, Richard Marciano, Amarnath Gupta, Chaitanya Baru, “XML-based Spatial Data Mediation Infrastructure for Global Interoperability”, 4th Global Spatial Data Infrastructure Conference, Cape Town, South Africa, 13-15 March, 2000.
  • Pushpalata Shah, Navita Relan, I. C. Matieda, “Integrating Arc/Info AMLs with CGI scripts for interactive Web based GIS Applications”, Map India 2001, New Delhi, January 7-9, 2001
  • Derek Clarke, “The Global spatial data infrastructure and emerging nations – Challenges and opportunities for global co-operation”, UNRCC, April 11-14, 2000, Kuala Lumpur.
© GISdevelopment.net. All rights reserved.