
Spatial Data Management to support Global Forestry-related Information Network

Atie Puntodewo, Mohammad Agus Salim
Center for International Forestry Research (CIFOR)
Jl. CIFOR, Situ Gede, Sindangbarang
BOGOR 16680, Indonesia
Tel. +62 251 622622; Fax +62 261 622100
Email: a.puntodewo@cgiar.org, m.a.salim@cgiar.org



ABSTRACT
Finding the right data at the right time is one of the main aspects of a powerful GIS. Besides the methodology, the quality of spatial analysis relies heavily on the credibility of the data being used. Low-quality data directly reduce the reliability of the end result.

Nowadays, enormous amounts of spatial data are generated around the world, and many of these data can be searched through the internet. From a user's perspective, being able to find data that suit one's purpose saves a lot of time and reduces duplication of effort. On the other hand, data publishers need to market their data and make them accessible to their consumers.

Making data credible and accessible is a complex matter that is not limited to how the data are acquired and processed but also includes how they are managed. Data management is crucial in an organization, especially when the organization has to deal with large amounts of data from many sources and over long periods of time.

This paper describes how spatial data are managed at CIFOR and how this contributes to the global forestry-related information network.

INTRODUCTION

The background
The Center for International Forestry Research (CIFOR) is one of the 15 centers of the Consultative Group on International Agricultural Research (CGIAR). As an international research and global knowledge institution, CIFOR is committed to conserving forests and improving the livelihoods of people who live in them. Over the years, CIFOR and its collaborators have produced spatial data in the course of various projects. We are also aware that geospatial information is often acquired from a wide range of external organizations and mixed with in-house data. That is why our spatial database also stores spatial data published by other organizations that are relevant to our research. These data need to be disseminated to a wider audience, and to facilitate this, distributed spatial data sources are needed.

Management of spatial data at CIFOR is one of the tasks of the GIS Unit. Making spatial information available to the whole organization goes beyond implementing the appropriate software. Certain procedures need to be established, such as who creates the meta-information and checks its integrity, what information is made available, and how the meta-information database is set up. Besides managing the data, we also have to convince CIFOR management that the long-term benefits of this activity will outweigh the investment costs.

The target audience includes not only CIFOR scientists but also a diverse set of stakeholders at local, national and international levels, such as government analysts, conservation and development NGOs, policy makers, and anyone who is interested in forest-related information.

The context
Nowadays, an enormous amount of forestry-related information is generated around the world by many different sources. While the volume of information published electronically by information providers is increasing, many information seekers still have trouble finding the information that meets their needs.

We aim to develop a spatial data management system that improves data accessibility, is based on international standards and best practices, supports a wide range of forest-related spatial research, and stimulates stakeholders to contribute data to it. We believe that standards are important because consistent information increases the reliability and effectiveness of the services we provide.

In today's digital world, metadata is important because it describes data using terminology that defines the data and facilitates consistent collecting, indexing, querying and publishing. It also documents the content, quality, source organization, data format, data organization, spatial reference, distribution mechanism, and so on. From a data management perspective, metadata is very useful in protecting an organization's investment in data. From the user's perspective, metadata helps in locating appropriate datasets and explains how to interpret and use the data. Publishing metadata also facilitates data sharing between organizations, which leads to an integrated approach to spatially related policy issues.

IMPLEMENTATION

The standards
As an international organization that deals with many organizations at the global, regional and national levels, CIFOR has to make its spatial data management comply with many different standards in order to reach the broadest community. We understand that many different standards have been developed and that each has its own focus. The purpose of these standards is to facilitate data sharing and increase interoperability among automated spatial information systems.

There are many types of spatial information standards; in this paper we limit ourselves to content and access standards.

A content standard establishes a common set of terminology and definitions for concepts related to metadata. We consider the following standards relevant to explore:
  • A standard that has been widely used to describe metadata content in the spatial data community is the one developed by the Federal Geographic Data Committee, known as the FGDC standard. Even though it was developed specifically for spatial data, it has been used for non-spatial data as well. This standard is ideal for describing spatial data content, because almost all elements specific to geospatial data are present. As a consequence, FGDC looks very complicated.
  • The International Organization for Standardization (ISO) also has standards related to Geographic Information/Geomatics. The one that defines clear procedures for description and promotes the proper use and effective retrieval of geographic data is known as ISO 19115. The ISO metadata standard is not identical to FGDC, even though it allows maximum compatibility between the two. We believe that complying with the ISO standard will make our metadata easier to disseminate across a wider network.
  • Another standard that is important to CIFOR is the Dublin Core (DC). DC is intended to be simple to use and general enough to be applied to resources in any discipline. That is why DC is the standard most widely used for resource discovery.

To enable interoperability and provide direct access at the data or metadata level, access standards are also defined. These enable clients on diverse platforms to access the services that information providers offer via the internet. CIFOR supports the implementation of three open access standards:
    • Open GIS Consortium Web Map Service (OGC-WMS) produces maps (visual representations) of geo-referenced data. These maps are generally rendered in a pictorial format such as PNG, GIF or JPG. The specification standardizes the way in which maps are requested and the way that servers describe their data holdings.
    • Z39.50 (formally Application Service Definition and Protocol Specification, ANSI/NISO Z39.50-1995) is a standard composed of specifications for computer-to-computer linkage between different information retrieval systems. Its purpose is to encode the messages required to communicate between two computer systems for the specific purpose of information searching and retrieval. Although it was developed from the need to exchange bibliographic information, the protocol is defined to serve as a search and retrieval service completely independent of the structure of the underlying data. It is designed to allow searching on remote systems without prior knowledge of the other system's syntax, strategies, or data content.
    • Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) provides an application-independent interoperability framework based on metadata harvesting, defined by the Digital Library Federation. There are two classes of participants in the OAI framework:
      • Data Providers administer systems that support the OAI-PMH as a means of exposing metadata;
      • Service Providers use metadata harvested via the OAI-PMH as a basis for building value-added services.
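
As an illustration of what two of these access standards look like on the wire, the sketch below builds a WMS GetMap request and an OAI-PMH ListRecords request as plain URLs. The parameter names follow the published WMS 1.1.1 and OAI-PMH 2.0 specifications; the server addresses, the layer name and the bounding box are hypothetical placeholders and do not describe CIFOR's actual services.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url, layers, bbox, width, height,
                   srs="EPSG:4326", image_format="image/png"):
    """Build a WMS 1.1.1 GetMap URL (version 1.3.0 uses CRS instead of SRS)."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",  # an empty value asks the server for its default styles
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "WIDTH": str(width),
        "HEIGHT": str(height),
        "FORMAT": image_format,
    }
    return base_url + "?" + urlencode(params)


def oai_list_records_url(base_url, metadata_prefix="oai_dc",
                         from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords URL, as a service provider would when harvesting."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # harvest only records changed since this date
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical endpoints and layer name, for illustration only.
map_url = wms_getmap_url("http://example.org/wms", ["forest_cover"],
                         (95.0, -11.0, 141.0, 6.0), 800, 400)
harvest_url = oai_list_records_url("http://example.org/oai",
                                   from_date="2005-01-01")
```

A map client would fetch `map_url` and receive an image, while a harvester would fetch `harvest_url` and receive an XML list of metadata records. Z39.50 is a binary session protocol and has no comparable one-line request, so it is left out of the sketch.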
    System development
    Apart from complying with the standards, we decided from the very start to minimize the time needed to create and maintain data and metadata while maximizing their usefulness to the widest community of users. To do so, CIFOR develops tools and infrastructure that both minimize the effort of managing data and provide easy and fast access to the data.

    Metadata Tools
    To make metadata creation easy, a tool for metadata generation has been created. The tool comes in two versions: 'Metadata Editor' is an add-in to ArcCatalog, and 'Metadata Explorer' is a stand-alone tool that can be used on any computer. Both have similar capabilities for metadata generation.


    Figure 1 Metadata Editor

    The tool consists of 18 mandatory fields distributed over seven metadata groupings, each with detailed information on its own tab. Users can simply type the descriptions of the elements in the space provided. When the user saves the metadata form, the program automatically validates whether all mandatory fields are correctly filled in. If there are empty fields or errors, a pop-up screen notifies the user. Only when all fields are correctly filled in can the metadata be published to the online metadata server. Metadata created with this tool is stored in an XML file alongside the data source. To maintain compliance with other metadata content standards, a crosswalk to the FGDC, ISO and DC standards has been created. With this crosswalk, it is possible to create an XML file that is compliant across those standards.
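
The save-time check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code of the Metadata Editor: the paper does not list the 18 mandatory fields, so the six field names and the `<metadata>` root element below are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical field names; the real tool has 18 mandatory fields.
MANDATORY_FIELDS = ["title", "originator", "publication_date",
                    "summary", "keywords", "access_restrictions"]


def missing_fields(record):
    """Return the mandatory fields that are absent or blank, in declared order."""
    missing = []
    for name in MANDATORY_FIELDS:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


def to_xml(record):
    """Serialize a record to an XML string, refusing incomplete records."""
    missing = missing_fields(record)
    if missing:
        raise ValueError("missing mandatory fields: " + ", ".join(missing))
    root = ET.Element("metadata")  # assumed root element name
    for name in sorted(record):
        ET.SubElement(root, name).text = str(record[name])
    return ET.tostring(root, encoding="unicode")


# Invented sample record, for illustration only.
SAMPLE = {
    "title": "Forest cover map",
    "originator": "GIS Unit",
    "publication_date": "2005-06-01",
    "summary": "Forest and non-forest classes derived from satellite imagery.",
    "keywords": "forest; land cover",
    "access_restrictions": "none",
}
xml_text = to_xml(SAMPLE)
```

Raising an error in `to_xml` mirrors the rule that only completely filled-in metadata can be published; an interactive tool would show the list returned by `missing_fields` in its pop-up instead.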

    The metadata tool was created to simplify metadata generation and to ensure that the metadata created complies with the three content standards. Of the 18 mandatory fields, 7 are filled in automatically. The seven groupings are as follows:
    1. Content citation: contains information such as title, dates of creation and publication, originator and contact addresses;
    2. Content description: contains detailed information about the dataset, such as a summary of content, theme, purpose and data generation processes;
    3. Content status: describes the status and currentness of the dataset;
    4. Keywords: contain information about how the data can be searched, such as the data category, keywords and thesaurus;
    5. Usage: describes how to access the data, such as restrictions on access and use;
    6. Distribution: describes the distribution type, protocol, on-line or off-line connections and size of data;
    7. Spatial domain: describes the location of the dataset, such as the place name, datum and projection, and coordinates.
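
A crosswalk between content standards is essentially a lookup table from one set of element names to another. The sketch below maps a handful of in-house fields to Dublin Core elements. The Dublin Core element names are the standard ones, but the in-house field names and the specific pairings are assumptions made for illustration and are not taken from CIFOR's actual crosswalk.

```python
# Hypothetical mapping from in-house field names to Dublin Core elements.
TO_DUBLIN_CORE = {
    "title": "title",
    "originator": "creator",
    "summary": "description",
    "keywords": "subject",
    "publication_date": "date",
    "access_restrictions": "rights",
    "place_name": "coverage",
}


def crosswalk(record, mapping=TO_DUBLIN_CORE):
    """Translate a record to the target standard's element names.

    Returns (translated, unmapped). Fields without a target equivalent are
    reported rather than silently dropped, because a simple standard such as
    Dublin Core cannot hold every geospatial element.
    """
    translated = {}
    unmapped = []
    for field_name, value in record.items():
        target = mapping.get(field_name)
        if target is None:
            unmapped.append(field_name)
        else:
            translated[target] = value
    return translated, sorted(unmapped)


# Invented sample record, for illustration only.
dc_record, not_in_dc = crosswalk({
    "title": "Forest cover map",
    "originator": "GIS Unit",
    "datum": "WGS84",
})
```

Crosswalks to FGDC or ISO 19115 would use the same function with a different mapping table. Those tables would be larger, since both standards have dedicated elements for items such as datum and projection.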
    Metadata Database
    The metadata database is where the metadata of every dataset is stored. It is connected to the Forest Spatial Information Catalog (FSIC) and can be accessed by the public. All metadata is classified into 14 categories, as follows:
    • Agriculture and Forestry
    • Atmospheric and Climatic
    • Base Map and Imagery
    • Boundary
    • Cultural, Society and Demographic
    • Documents
    • Economy
    • Elevation and Derived Products
    • Environment and Conservation
    • Geophysics
    • Network
    • Tools
    • Water Resources.
    These categories are based on the ISO 19115 topic category definitions. The metadata database was built on the Microsoft SQL Server, ESRI ArcSDE and ArcIMS platforms.

    Spatial Database
    In the spatial database we store all kinds of spatial data, including remote sensing imagery and other raster datasets, vector maps, elevation datasets, and documents from many sources. This diversity of data, together with the demand for data access, led the GIS Unit to develop not only a file-based or client-server data storage system but also a centralized database management system (DBMS) environment that supports multi-user access. All data stored in the database are classified by custodian, which is usually a research division or project. For security reasons, this database can only be accessed from within the CIFOR intranet, and only by users with the appropriate privileges. This database was also built on the Microsoft SQL Server and ESRI ArcSDE platform.

    Forest Spatial Information Catalog (FSIC)
    FSIC is a web-based portal for one-stop access to spatial publications, maps and other documents that makes it simpler for visitors at all levels to find forestry-related data and resources. FSIC provides direct access to the metadata database as well as tools that enable users to search and browse all the resources. FSIC also provides direct access to the data and metadata through a service-based architecture that is open to anybody.
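
To make the idea of searching and browsing a metadata catalog concrete, the following minimal sketch filters in-memory records by keyword, category and geographic bounding box. It is not FSIC's implementation, which runs on a database server; the record structure, the function names and the sample records are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataRecord:
    title: str
    category: str
    keywords: tuple
    bbox: tuple  # (min_lon, min_lat, max_lon, max_lat) in decimal degrees


def bboxes_intersect(a, b):
    """True when two (min_x, min_y, max_x, max_y) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(records, keyword=None, category=None, bbox=None):
    """Return the records that match every criterion that was supplied."""
    hits = []
    for rec in records:
        if category is not None and rec.category != category:
            continue
        if keyword is not None:
            wanted = keyword.lower()
            searchable = [rec.title.lower()] + [k.lower() for k in rec.keywords]
            if not any(wanted in text for text in searchable):
                continue
        if bbox is not None and not bboxes_intersect(rec.bbox, bbox):
            continue
        hits.append(rec)
    return hits


# Invented sample records, for illustration only.
CATALOG = [
    MetadataRecord("Forest cover of Borneo", "Agriculture and Forestry",
                   ("forest", "land cover"), (108.0, -4.5, 119.5, 7.5)),
    MetadataRecord("River network of Sumatra", "Water Resources",
                   ("river", "hydrology"), (95.0, -6.0, 106.5, 6.0)),
]

borneo_hits = search(CATALOG, keyword="forest", bbox=(110.0, -2.0, 112.0, 0.0))
```

Browsing corresponds to calling `search` with only a category, while the spatial filter is what distinguishes a spatial catalog from an ordinary document index.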


    Figure 2 FSIC Components

    FSIC was developed using Java technology and the open source Struts web application framework, supported by the ESRI framework for web and database development (ArcIMS and ArcSDE). FSIC enforces the Model-View-Controller (MVC) design pattern, which takes a little longer to build initially but makes maintenance and further development relatively easier.

    Besides a catalog service that provides tools to browse and search metadata, FSIC provides open protocol services with direct access to data and metadata. FSIC also provides web-mapping capability. These services are provided to serve diverse users, from individuals to organizations, through a data clearinghouse.

    DATA MANAGEMENT
    Early attention to data management and archiving is a critical step in securing its long-term benefits. Datasets, and ancillary information such as metadata, must be preserved for decades and stored in ways that promote:
    • access, as data needs change;
    • reprocessing, as errors are discovered or calibration is improved;
    • integration, as new data products, algorithms, and data technologies are developed; and
    • user-friendly access tools.
    Information is an expensive resource, and a good way to handle it is as an investment that will return benefits to the organization over a very long time. Organizations often have to deal with enormous amounts of data and many activities. Mismanagement of these valuable resources can make the return on that investment uncertain. To determine the best way to manage the data, it is necessary to understand their important characteristics.

    Data Workflow
    CIFOR has to deal with an enormous amount of data from many sources, and the volume increases over time. Besides the sheer size, we also collect data in many kinds of formats (raster, vector, maps and elevation datasets).


    Figure 3 Data Workflow

    All of these data are collected and maintained by the GIS Unit in a centralized database so that they can be reused simultaneously by the data owner and by other scientists. Besides providing information storage, the GIS Unit is also responsible for providing an easy and fast information retrieval system.

    To make better use of the data, each dataset in the spatial database must be accompanied by metadata that provides descriptive information about the author, originator, content, quality, condition and other characteristics. Metadata gives users a good understanding of the data before they obtain the data itself. There are no limitations on how metadata should be described, but it is best written in simple and clear terms so that it is understandable to as many users as possible.

    Procedures
    To provide credible datasets, CIFOR implements a data management policy that defines certain procedures for handling our spatial data. These procedures are:
    1. A credible spatial dataset (maps, images, etc.)
      The GIS Unit receives data from many sources to be collected and maintained. Before these data can be stored in the spatial database, we have to be sure that the dataset has been cleaned of syntax errors.


      Figure 4 Data Preparation Procedures

    2. A good metadata description
      Metadata is descriptive information about how your data were created or collected. It details who is responsible for the ownership and usage of the data, the type of data, how and when the data were created or collected, and the types of quality assurance procedures used. This information helps others to understand the quality level of your data.

      It is clear that metadata helps users to understand the data better so that they can use them properly. But the first question that needs to be answered is: who should create the metadata? The data owner knows the data better than anyone else and is therefore the best person to create the metadata. However, the task is not limited to the owner. Others can also create metadata as long as they have access to resources about the data (reports, publications or the data creator) that will help them create good-quality metadata.

      Once metadata has been created, its content needs to be checked for completeness and quality to make sure it really describes the condition and characteristics of the data.

    3. Only a good dataset accompanied by clear metadata can be published
      Once we are sure about the quality of the dataset and its metadata, the data can be published. This phase is handled entirely by the GIS Unit. The spatial data are managed in the spatial database and the metadata in the metadata database. Once published, the metadata can be accessed directly by the public through FSIC.
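
The three procedures above amount to a gate in front of publication. A minimal sketch of such a gate follows; the class, the flag names and the in-memory list that stands in for the public catalog are hypothetical, since the paper describes the procedure but not its implementation.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    name: str
    dataset_cleaned: bool    # procedure 1: dataset checked and cleaned of errors
    metadata_complete: bool  # procedure 2: metadata has been created in full
    metadata_reviewed: bool  # procedure 2: content checked against the data


def blocking_issues(sub):
    """List the reasons a submission may not be published yet."""
    issues = []
    if not sub.dataset_cleaned:
        issues.append("dataset has not been cleaned")
    if not sub.metadata_complete:
        issues.append("metadata is incomplete")
    if not sub.metadata_reviewed:
        issues.append("metadata has not been reviewed")
    return issues


def publish(sub, public_catalog):
    """Procedure 3: publish only when nothing blocks; return any issues found."""
    issues = blocking_issues(sub)
    if not issues:
        public_catalog.append(sub.name)
    return issues


# Invented submissions, for illustration only.
catalog = []
ready = Submission("forest_cover_2005", True, True, True)
draft = Submission("road_network_draft", True, True, False)
publish(ready, catalog)
draft_issues = publish(draft, catalog)
```

Returning the list of issues rather than a bare yes or no tells the data owner exactly which step still has to be completed.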
    KNOWLEDGE SHARING AND NETWORKING
    CIFOR is committed to strengthening the global spatial data community, especially the forest-related community. That commitment is realized by sharing resources, developing infrastructure and tools, and implementing data management to provide reliable data that benefit not only CIFOR but also its stakeholders.

    Data Share and Dissemination
    With FSIC, CIFOR wants to share and disseminate its spatial database. As a web-based portal for one-stop access to spatial data, publications, maps and other documents, FSIC enables visitors at all levels to find forestry-related data and learn more about forestry projects underway. FSIC provides accurate and accessible data to researchers, development practitioners and planners in governmental, community and non-governmental organizations. It aims to provide a common platform for sharing forest-related data by fostering common standards and good practice among stakeholders and research partners. FSIC stores and shares over 1200 metadata records and hundreds of downloadable resources.

    The current trend of integrating heterogeneous applications through service-based architecture has also led FSIC to provide more than a catalog service. FSIC implements several open protocol standards to support interoperability, as well as direct access to data and metadata through a service-based architecture that can be used without any restriction.

    Partnership
    CIFOR is involved in several partnership initiatives with global-scale organizations aimed at strengthening the global forest-related information network. Involvement in these partnerships includes knowledge sharing in infrastructure development as well as data and metadata sharing.

    Some of CIFOR's partners are:
    • CGIAR - Consortium for Spatial Information (CSI)
      This consortium was built from the initiatives of many geomatics scientists within the Consultative Group on International Agricultural Research (CGIAR). Its aim is to link the efforts of CGIAR scientists, national and international partners, and others working to apply and advance geospatial science for international sustainable agriculture development, natural resources management, biodiversity conservation, and poverty alleviation in developing countries. In addition, CSI works to facilitate collaboration and capacity building for data sharing, data dissemination and geospatial analysis among CGIAR centers.

    • Global Forest Information Service (GFIS)
      GFIS was built as an initiative of the Collaborative Partnership on Forests (CPF). CIFOR acts as one of the leaders of GFIS, together with the International Union of Forest Research Organizations (IUFRO) and the Food and Agriculture Organization of the United Nations (FAO). GFIS is an internet gateway that provides access to forest-related information through a single entry point. GFIS provides an open exchange standard for its information categories.
    LESSONS LEARNED
    Finding best practices for implementing data management within CIFOR is a long and iterative process that needs a lot of time and resources. Much effort has gone into looking for better approaches. Through this process many lessons have been learned, and we hope these will be a useful reference for others and reduce duplication of effort.

    Use Internationally Accepted Standards
    Working in a global community involves many systems on different platforms and resources with different structures. Getting all of these systems and resources to work together requires consensus on how they 'communicate' with each other. This is not an easy task, but implementing internationally accepted standards makes it significantly easier. When everybody works in the same manner, the effort of 'communication' decreases significantly.

    Do not work from scratch
    Reinventing something that others have already done is pointless. Staying focused and improving on what already exists increases efficiency and saves a lot of time and resources. This applies to both standards and infrastructure development.

    Start Documenting Data
    An organization where data management is not a concern will most likely end up with data stored in an unorganized way. Collecting all data that have current or future use together with related documentation, storing them in an organized way, and creating metadata for each dataset is a good beginning and will increase the value of the data in the future.

    CHALLENGES
    As technology grows very fast, data management becomes a dynamic field that is always changing. That is why we will never stop innovating, looking for better approaches, making improvements and taking advantage of the latest technology. We hope that CIFOR will keep its commitment to implementing more standards in order to provide better services to a broader community.

    CONCLUSIONS
    A data management system that covers data storage, documentation (metadata) and dissemination is important not only for strengthening the internal information system but also for supporting the global forestry-related information network. This benefits not only the organization but also the broader community. Several key points need to be considered:
    • Data dissemination means interaction within a broader community. Implementing standards is one of the keys to maintaining synergy among its members.
    • Keeping data well documented adds value to the data and helps data consumers find the right data for their needs.

© GISdevelopment.net. All rights reserved.