Design and Implementation of a Spatial Data Clearinghouse through Internet, Using XML Technology
Azadeh Keshtiarast
M.Sc. Student, Dept. of GIS Eng.
a_keshtiarast@yahoo.com

Ali A. Alesheikh
Assistant Professor
Dept. of Geodesy and Geomatics Eng.
alesheikh@kntu.ac.ir
Ehsan Mohammadi
M.Sc. Student, Dept. of GIS Eng.
mohammadiehsan@yahoo.com
K.N. Toosi University of Technology
Vali_asr St, Tehran, Iran, P.C. 19697
Tel: +98 21 878 6212 Fax: +98 21 878 6213
Abstract
Spatial data is the most important component of a Geospatial Information System (GIS). Collecting new datasets for GIS projects is costly and time consuming, so GIS users prefer to acquire the datasets they need from existing ones. Hence, a community in which all data users and producers are connected and aware of each other's activities is necessary. A spatial data clearinghouse is a distributed system in which datasets can be registered and searched: producers register their spatial datasets and users search for them. In Iran, as in other countries, the need for such a system is felt, and spatial data users and producers need to connect with one another and take advantage of it. The aims of this research are to investigate the requirements for designing a national clearinghouse for Iran, to implement a clearinghouse system, and to introduce new methods of searching through this system. SVG is introduced as a new graphical technology that facilitates the search; using SVG makes it possible to implement new methods of search by map. The implemented system was built with both a relational database and an XML database. XML is better for documentation, but the relational database's support for Structured Query Language increases the efficiency of the system.
1- Introduction
It is estimated that nearly 80% of the effort in every GIS is devoted to its data. Given the vital role of spatial data in GIS, great efforts have been made to study it, and many concepts have been introduced in the areas of spatial data and information. Concepts such as data warehouses, data marts, clearinghouses, data mining, interoperability, and spatial data infrastructure have emerged as potentially powerful tools to handle and reuse spatial data. Some of these concepts are so closely related that there is bound to be some confusion about the differences between them and about their potential applications.
The main objective of this research is to study the activities of other countries and to design a clearinghouse system that is independent of any metadata standard, which means that it can adapt to changes of metadata standard. With the fast growth of Internet technologies and the use of new graphical systems like SVG, this research developed new methods of graphical search, which search spatial datasets through the Internet using spatial objects. This method is novel; no clearinghouse has used it before. The research consisted of the following steps:
Step 1: The concepts of clearinghouse and metadata were defined, and the necessity of having a clearinghouse system was established.
Step 2: Metadata was studied as the main component of every clearinghouse system.
Step 3: Since there is no standard metadata and no clearinghouse system for Iran’s spatial data, a needs analysis was carried out with the main organizations concerned with spatial data, and the metadata terms for the implementation were extracted from it.
Step 4: The system was implemented with two database technologies separately, first an XML database and second a relational database, and the results were compared. The use of SVG technology in the implementation phase is novel; no other clearinghouse has used this method before.
2- Architecture of Spatial Data Clearinghouse
A clearinghouse helps all GIS users find spatial datasets easily. First, the clearinghouse must provide an interface to receive and store metadata from producers. The metadata must give all available information about the spatial dataset according to the standard of the system, together with information on how to access the dataset, for example by downloading it from a website or obtaining it by mail. Users who seek spatial datasets then search the clearinghouse database using metadata criteria and browse the search results to decide, from the information the metadata provides, which datasets suit their needs. If a user finds an appropriate dataset, he/she can visit the producer's website via the dataset URL included in the metadata and order the dataset by the means provided there. Figure 1 illustrates the simple architecture of a typical clearinghouse.

Figure 1. simple architecture of a typical clearinghouse
In brief, working with a clearinghouse consists of the following three steps:
- Register metadata in the clearinghouse database (by the custodian, that is, the producer or owner)
- Search for spatial datasets according to metadata (by the user)
- Request a spatial dataset from the search results (by the user, from the website of the data producer)
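The three steps above can be sketched in a few lines of Python. This is only an illustrative in-memory model, not the implemented system: the class names, the field names, and the example URL are assumptions made for the sketch.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """Hypothetical, simplified metadata record for one spatial dataset."""
    title: str
    producer: str
    keywords: list[str]
    url: str  # where the user can request the dataset from the producer


@dataclass
class Clearinghouse:
    """Illustrative in-memory stand-in for the clearinghouse database."""
    records: list[MetadataRecord] = field(default_factory=list)

    def register(self, record: MetadataRecord) -> None:
        # Step 1: the custodian registers metadata.
        self.records.append(record)

    def search(self, keyword: str) -> list[MetadataRecord]:
        # Step 2: the user searches the registered metadata.
        needle = keyword.lower()
        return [
            r for r in self.records
            if needle in r.title.lower()
            or any(needle in k.lower() for k in r.keywords)
        ]

    @staticmethod
    def request_url(record: MetadataRecord) -> str:
        # Step 3: the user is referred to the producer's website.
        return record.url


clearinghouse = Clearinghouse()
clearinghouse.register(
    MetadataRecord(
        title="Tehran road network",
        producer="Example Mapping Agency",  # made-up producer
        keywords=["roads", "transport"],
        url="http://example.org/roads",     # made-up URL
    )
)
hits = clearinghouse.search("roads")
```

Note that the clearinghouse stores only metadata; the dataset itself stays with the producer and is reached through the URL in the record.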
2-1- Metadata
In the spatial context, metadata is simply “data about data”. The USGS defines metadata as the content, quality, condition, and other characteristics of data. Metadata describes spatial data and increases its value by giving significant information that cannot be taken from the data directly. With metadata, spatial data can be interpreted more accurately and precisely. [HREF7]
In other words, metadata is the what, who, how, why and where of spatial data.
- What: What is the data? A brief summary of the data elements.
- Who: Who is the contact person, and who is the party responsible for maintaining the spatial data?
- How: How was the data collected? What procedures or equipment were used to collect it? How accurate is the data, and to what resolution?
- Why: Why was the data collected, and what was the purpose of the data collection?
- Where: Where is the data located physically, digitally and geospatially?
Because metadata describes the characteristics of spatial data, it gives additional value to the data. Also, having a metadata standard helps spatial data to be managed appropriately and helps data producers and data owners organize their spatial data in a uniform manner. (Alesheikh et al 2004)
Metadata as a description of data can be used as criteria to search for spatial data and find suitable data. Spatial data users can use metadata to find their needed data and then evaluate spatial data to find out whether these data match their needs or not.
Because of the great need for metadata standards and for collecting metadata on spatial datasets, several organizations and communities have developed metadata standards. Some of these are:
- FGDC metadata standard
- ANZLIC metadata standard
- CEN metadata standard
- ISO TC 211 standard (ISO 19115-GI)
- And so on.
A study of metadata standards could be the topic of separate research, so the next section discusses XML and the ability of this markup language to contain different kinds of data. SVG, a technology derived from XML, is also discussed briefly.
2-2- XML
XML is a universal, text-based data exchange standard. In traditional data exchange formats, data is defined by the position it takes in the file structure. In XML the position of the data is not important. Instead, tags identify the data content. Like HTML, XML works on the principle of tags, which delineate elements and content. (Kim, 2003)
XML has many benefits over other data formats. Some of these benefits are:
- Simplicity: Information coded in XML is easy to read and understand, plus it can be processed easily by computers.
- Openness: XML is a W3C standard, endorsed by software industry market leaders.
- Extensibility: There is no fixed set of tags in XML. New tags to hold different information can be created as they are needed.
- Self-description: In traditional databases, data records require schemas set up by the database administrator. XML documents can be stored without such definitions, because they contain descriptions in the form of tags and attributes.
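As a minimal sketch of extensibility and self-description, the following Python fragment writes a metadata record as XML and reads it back using only the standard library. The element names (`metadata`, `title`, `producer`, `scale`, `projection`) and the values are illustrative assumptions and are not taken from the metadata elements adopted in this research.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def build_metadata_xml(fields: dict[str, str]) -> str:
    """Serialize a flat metadata record; each key becomes a tag."""
    root = ET.Element("metadata")
    for tag, value in fields.items():
        ET.SubElement(root, tag).text = value
    return ET.tostring(root, encoding="unicode")


def read_metadata_xml(document: str) -> dict[str, str]:
    """Recover the record from the tags alone, without a separate schema."""
    root = ET.fromstring(document)
    return {child.tag: (child.text or "") for child in root}


record = {
    "title": "Caspian coastline",
    "producer": "Example Agency",  # made-up producer
    "scale": "1:250000",
}
document = build_metadata_xml(record)

# Extensibility: a new tag is added without changing any table definition.
extended = build_metadata_xml({**record, "projection": "UTM zone 39N"})
```

The reader function never needs to know the list of tags in advance, which is the sense in which the document describes itself.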
2-3- SVG
SVG is a modularized language for describing two-dimensional vector and mixed vector/raster graphics in XML. SVG is a standard way to describe graphics and graphical interactions (HREF4).
Searching spatial datasets by object is straightforward with the SVG graphical format, so SVG can widen the variety of graphical searches. Users can draw rectangular or circular extents to specify the search area, or find spatial datasets related to a specific feature such as a city, a lake, or a state.
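Because SVG is plain XML, a map with a search rectangle and a selectable feature can be generated by any XML library. The sketch below builds such a document in Python; the canvas size, the coordinates, and the `id` values are invented for illustration and do not reproduce the maps used in the implemented system.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def search_map_svg(extent, city):
    """Return an SVG document with a search rectangle and one city symbol.

    extent: (x, y, width, height) in canvas units (illustrative values)
    city:   (identifier, cx, cy) of a selectable feature (illustrative)
    """
    ET.register_namespace("", SVG_NS)  # serialize without a prefix
    svg = ET.Element(
        f"{{{SVG_NS}}}svg",
        {"width": "400", "height": "300", "viewBox": "0 0 400 300"},
    )
    x, y, w, h = extent
    ET.SubElement(
        svg,
        f"{{{SVG_NS}}}rect",
        {"id": "search-extent", "x": str(x), "y": str(y),
         "width": str(w), "height": str(h),
         "fill": "none", "stroke": "red"},
    )
    name, cx, cy = city
    ET.SubElement(
        svg,
        f"{{{SVG_NS}}}circle",
        {"id": name, "cx": str(cx), "cy": str(cy), "r": "4"},
    )
    return ET.tostring(svg, encoding="unicode")


svg_text = search_map_svg((120, 80, 40, 30), ("tehran", 140, 95))
```

Each element carries an `id`, which is what a script in the page can use to tell which feature the user selected.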
3- Design and Implementation
The search criteria in a typical spatial data clearinghouse are the metadata prepared for each dataset. Collecting metadata therefore not only documents the datasets but also gives the clearinghouse a good view of them for the search task.
Needs analysis had two steps. (1) First, the elements to be used in the metadata were identified. It was found that most of the spatial datasets did not have any metadata. Another important problem with spatial datasets in Iran is that there is no accepted metadata standard to follow. So, after interviews with the responsible officers in the organizations concerned, a collection of about 30 metadata elements was accepted for use in the spatial data clearinghouse. (2) The second step was to produce forms and collect the metadata elements for the existing spatial data of these organizations. These two steps resulted in about 250 sets of metadata, exactly one for each spatial dataset.
3-1- Architecture
A spatial data clearinghouse is a distributed system that allows datasets to be registered and searched (Alesheikh and Helali 2002). The best environment in which to implement such a system is the World Wide Web, for two reasons: (1) the Internet extends around the whole world and is widely accessible; (2) working with the Internet has become simple enough for most people. Thus the Internet is the best environment for implementing a national or global clearinghouse. First of all, however, it must be stated what kinds of users can access the system.
3-1-1- Concerned people
Three kinds of people are concerned with a clearinghouse activity:
- the manager of clearinghouse;
- the owner of spatial data, who wants to announce his/her spatial data;
- the user who searches for spatial data.
The system manager is responsible for managing the system and for connecting the other two kinds of users. The system manager must also anticipate the future demands of users and prepare the system to meet them. The producer or owner of spatial data needs a mechanism to validate and register the metadata prepared for each dataset. Finally, the spatial data user needs a mechanism to search for spatial datasets. A basic architecture of a clearinghouse system, with emphasis on users, is shown in figure 2.

Figure 2. an architecture of a clearinghouse with emphasis on users
3-1-2- Components to be Designed
To build the system, three main components must be designed:
- An appropriate interface for spatial data owners to register their metadata, and an appropriate interface for spatial data users to search for datasets.
- A database system to store the registered metadata and to update, delete and retrieve it.
- An interactive medium that connects the two previous components.
The implemented clearinghouse for Iran’s spatial datasets follows a client/server architecture. Some parts of the system, such as the forms for registering metadata and the forms and graphics for searching spatial data, are on the client side, where both kinds of users, owners and searchers, work with them. The system receives a request from a client and sends it to the server side. The server processes the request and sends the results back to the client. The architecture of the clearinghouse from the client/server point of view is shown in figure 3.

Figure 3. clearinghouse follows the client/server architecture
3-1-3- Interface of the System
For registration, a well-formed client-side form was prepared that contains all items of the accepted metadata. After an owner fills in the fields, the system sends the metadata to the server side; the server passes the request to the database system and returns the result of the registration. The user is notified whether or not the registration succeeded. The interface for collecting metadata from owners is shown in figure 4. The form is written in HTML and is processed with PHP.

Figure 4. metadata collection form
The search form is an HTML form that allows users to enter their criteria for locating datasets. To make searching more flexible, four options are available:
- Keyword Search: the keyword search accepts a keyword and searches the database for it. If the word is found in any field of a metadata record, the results include that record in full. Figure 5 is a snapshot of this form.

Figure 5. the keyword search page
Figure 6. advanced search page
- Advanced Search: in this form four items are provided to narrow the search. The advantage over the keyword search is that the criteria are looked up only in the designated fields rather than in all fields, which increases the speed of the search. Figure 6 illustrates the items of the advanced search in the prepared web document. These items were chosen as an example and can be varied to fit the users’ needs.
- Graphical Search by Extent: a map of Iran with the main cities and a grid of longitudes and latitudes was designed with SVG. A user clicks two points on the map to mark the extent of the search area. All datasets in that area are returned as the search results. This can be seen in figure 7.
- Graphical Search by Province: a map of Iran’s provinces was designed with SVG. A user clicks on a province and its name appears in an edit box below the map. Clicking the search button returns all datasets belonging to that province. Figure 8 shows the page for this search option.

Figure 7. graphical search by extent
Figure 8. the graphical search by province
The search forms are written in HTML, and the graphical forms are HTML documents equipped with interactive SVG maps of Iran. Figure 9 illustrates the results of running an extent search query. In this form users can also enter the extents of the map directly.
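The logic behind the search by extent can be sketched as follows. Two clicked points are turned into a rectangle, and datasets whose bounding boxes meet that rectangle are returned. The paper does not state whether a dataset must lie entirely inside the area or merely overlap it; overlap is assumed here, and the dataset names and coordinates are invented for illustration.

```python
from __future__ import annotations

from typing import NamedTuple


class Extent(NamedTuple):
    """Geographic bounding box in degrees."""
    west: float
    south: float
    east: float
    north: float


def extent_from_clicks(p1: tuple[float, float], p2: tuple[float, float]) -> Extent:
    """Build a rectangle from two (lon, lat) points clicked in any order."""
    (lon1, lat1), (lon2, lat2) = p1, p2
    return Extent(min(lon1, lon2), min(lat1, lat2), max(lon1, lon2), max(lat1, lat2))


def intersects(a: Extent, b: Extent) -> bool:
    """True when the two rectangles overlap or touch."""
    return (a.west <= b.east and b.west <= a.east
            and a.south <= b.north and b.south <= a.north)


def search_by_extent(datasets: dict[str, Extent], area: Extent) -> list[str]:
    """Return the names of all datasets whose extent meets the search area."""
    return sorted(name for name, ext in datasets.items() if intersects(ext, area))


# Made-up dataset extents, roughly around Tehran and Shiraz.
datasets = {
    "Tehran sheet": Extent(51.0, 35.5, 51.7, 35.9),
    "Shiraz sheet": Extent(52.4, 29.5, 52.7, 29.8),
}
area = extent_from_clicks((52.0, 36.0), (50.5, 35.0))
found = search_by_extent(datasets, area)
```

In a relational database the same test can be expressed as four comparisons on stored west, south, east and north columns.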

Figure 9. the results of graphical search
The technologies that support interactive maps are limited. A common approach, implemented in most clearinghouses around the world, is to use Java applets or IMSs, which take a long time to load. Applets also require components such as a JVM (Java Virtual Machine) to be installed on the client’s machine. SVG, a technology introduced by the W3C, is simple, needs no extra component to run, and offers the same capabilities. It also loads quickly and uses the JavaScript language to become interactive.
3-1-4- Interactive Website
The whole system must be interactive enough to answer users' requests online, so it must be built with a programming language for interactive websites. Dozens of languages are available for this purpose. All of them are actively developed and use state-of-the-art technologies to produce interactive websites. Some of them are DHTML, ASP, ASP.NET, PHP and Java. Which language is suitable depends on what is expected of the web page.
For the clearinghouse, the programming tool must support a user-friendly interface and connect to many kinds of databases. PHP is one such language: it uses HTML to present web pages and supports many libraries for interacting with data from many kinds of databases. The architecture of the clearinghouse with respect to interactive website design is shown in figure 10.

Figure 10. an architecture of clearinghouse with emphasis on web technologies
Users simply enter their criteria in an HTML web page and receive the search results in the same page. From the results they can also follow a link to the data owner’s homepage.
3-1-5- Database
In the implementation of the clearinghouse for Iran’s spatial data, two kinds of databases were used separately and tested on several parameters, and the results were compared. The two kinds are: (1) a relational database provided by MySQL and (2) an XML database.
MySQL is a relational DBMS that stores information in the relational model. It is well integrated with PHP, which provides the classes and functions needed to retrieve SQL results. MySQL server is free software and can be downloaded from the Internet.
To prepare the database for the clearinghouse, all metadata fields in the registration form were also defined in the relational database. If the registration is successful, MySQL creates a record for the accepted metadata set; if not, the user is asked to complete the form and repeat the process. The advantages offered by a relational database are:
- A relational database allows the metadata to be constructed using a relational model rather than a hierarchical model; this model may reflect information about the underlying datasets more accurately and allows common sections of the metadata to be created only once for multiple datasets, which reduces the effort required for metadata creation and maintenance.
- A relational database does not require field tags to be replicated in each document.
- A relational database allows the use of the Structured Query Language (SQL), a widely accepted query language with more sophisticated options than Boolean searching alone.
- Relational databases have attracted greater commercial investment; users can therefore expect more sophisticated tools, easier interfaces, and a variety of support options.
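The first and third advantages can be illustrated with a small relational sketch. Python's built-in `sqlite3` module is used here only as a stand-in so that the example runs on its own; the implemented system used MySQL with PHP. The table layout, column names, and sample rows are assumptions made for the sketch and do not reproduce the roughly 30 metadata elements adopted in this research.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE producer (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    url  TEXT
);
CREATE TABLE dataset (
    id          INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    theme       TEXT,
    province    TEXT,
    producer_id INTEGER NOT NULL REFERENCES producer(id)
);
""")

# The producer's details are stored once and shared by all of its datasets.
conn.execute(
    "INSERT INTO producer (id, name, url) VALUES (?, ?, ?)",
    (1, "Example Mapping Agency", "http://example.org"),  # made-up values
)
conn.executemany(
    "INSERT INTO dataset (title, theme, province, producer_id) VALUES (?, ?, ?, ?)",
    [
        ("Road network", "transport", "Tehran", 1),
        ("Land use 2003", "land use", "Fars", 1),
        ("Tehran land use", "land use", "Tehran", 1),
    ],
)


def keyword_search(connection, word):
    """Keyword search: look for the word in every text field."""
    pattern = f"%{word}%"
    rows = connection.execute(
        "SELECT title FROM dataset "
        "WHERE title LIKE ? OR theme LIKE ? OR province LIKE ? "
        "ORDER BY title",
        (pattern, pattern, pattern),
    )
    return [row[0] for row in rows]


def advanced_search(connection, theme, province):
    """Advanced search: match designated fields and join the producer URL."""
    rows = connection.execute(
        "SELECT d.title, p.url FROM dataset d "
        "JOIN producer p ON p.id = d.producer_id "
        "WHERE d.theme = ? AND d.province = ? "
        "ORDER BY d.title",
        (theme, province),
    )
    return rows.fetchall()


by_keyword = keyword_search(conn, "Tehran")
by_fields = advanced_search(conn, "land use", "Tehran")
```

The advanced search touches only two columns, whereas the keyword search must scan every text field, which matches the speed difference described for the two search forms.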
The other database prepared for Iran’s spatial data clearinghouse was in XML format. As described in Section 2-2, XML is a universal, text-based data exchange standard in which tags, rather than position in the file structure, identify the data content.
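For comparison with a relational query, a search over XML-stored metadata can be sketched with Python's standard library. The catalogue layout and the element names below are assumptions for illustration; the actual XML database product and document structure used in this research are not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue: one <metadata> element per dataset.
CATALOG = """<catalog>
  <metadata><title>Road network</title><province>Tehran</province></metadata>
  <metadata><title>Land use 2003</title><province>Fars</province></metadata>
</catalog>"""


def search_xml(catalog_xml, tag, value):
    """Return the titles of all records whose <tag> element equals value."""
    root = ET.fromstring(catalog_xml)
    return [
        record.findtext("title", default="")
        for record in root.iter("metadata")
        if record.findtext(tag) == value
    ]


fars_titles = search_xml(CATALOG, "province", "Fars")
```

Every record repeats its own field tags, and the search walks the whole document, which relates to the storage and speed points raised in the comparison.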
3-1-6- Results of Comparison
Comparing the two databases helps to decide on a suitable database model for clearinghouse activity. The relational database and the XML database used for Iran’s spatial data clearinghouse were compared in several respects. The results of this comparison are given in table 1:
Table 1. results of comparison

XML has great capabilities for storing data of any kind. It is a very good and recommended way of documenting and storing metadata. XML also uses a concept named “schema”, which standardizes the contents of documents according to a predefined structure. So for documenting the metadata of each organization, XML is preferred. However, XML databases occupy more memory than relational databases. Moreover, the primary purpose of XML, data transformation and transfer, is not a central concern of a clearinghouse system. Using XML for such systems needs more study.
For activities like a spatial data clearinghouse, which needs high speed and handles a great deal of data over a network, relational database models seem to be better. Free software like MySQL can handle about 8 terabytes of information, and commercial SQL products can handle more. In addition, databases of this kind support more complicated sets of queries easily and seem more appropriate for searching data through the Internet.
4- Conclusions and Recommendations
This section presents the results of the research undertaken to implement a clearinghouse for national purposes.
- Having a spatial metadata standard is mandatory for GIS users. The adopted standard must also be tailored to clearinghouse activity.
- To have a national clearinghouse, all concerned organizations must participate. It is the duty of each organization to develop an appropriate website through which users can request the spatial datasets that belong to it.
- In the implemented clearinghouse, technologies like XML and SVG were used. Using these new and operable technologies is novel. SVG has a great capacity for web GIS activities. Developers of web-based GIS applications should consider combining the graphics of SVG with the extensibility of XML or other databases.
5- References
- Alesheikh, A.A, A.K. Oskouei, F. Atabi, and H. Helali (2005). "Providing Interoperability for Air Quality in-Situ Sensors Observations Using GML Technology" International Journal of Environmental Science and Technology, Vol. 2, No 2, Pp 133-140.
- Alesheikh, A.A, H. Helali (2002). "Web GIS Development Strategy" GIM International, Nov. 2002, Vol. 16, No 11, Pp 12-15
- Alesheikh Ali A., Ali Aien-Saeid, and M. Kalantari (2004) “Towards an Iranian Geospatial Data Transfer standard” Proceedings of Geomatics 83 Conference, Tehran, Iran.
- Kim, L (2003) “The official xmlspy handbook”, John Wiley Press
- [HREF1] “What is Metadata?” http://www.wsdot.wa.gov/TA/T2Center/Mgt.Systems/InfrastructureTechnology/Toolbox/Metadata.PDF
- [HREF2] “Content Standard for Digital Geospatial Metadata (CSDGM)” http://www.fgdc.gov/metadata/contstan.html
- [HREF3] Elaine Wong, “Evaluating Geospatial Metadata Standard for Data Sharing in the Regional Municipality of Waterloo”, 2002 http://www.fes.uwaterloo.ca/crs/student-archive/gp555/winter2002/eymwong/assignment2part2.pdf
- [HREF4] Clint Steel, “USGS CMG ‘Formal Metadata’ Definition”, 2004 http://walrus.wr.usgs.gov/infobank/programs/html/definition/fmeta.html