Research on Metadata Management for the Resource and Environment Spatial Database

Yanrong Cao, Jiantao Bi, Hongqiao Wu, Jianbang He
Institute of Geographical Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, P.R. China

Yuxia Huang
Institute of Remote Sensing Application, CAS, Beijing 100101, P.R. China


The Resource and Environment Spatial Databases (RESD) hold a large and rapidly growing volume of data, but sharing these data faces many problems. Metadata is becoming an important tool for managing and sharing data.

This paper discusses how to establish a metadata management system for the Resource and Environment Spatial Database.

First, it develops the architecture of a metadata standard for the RESD and presents a method for designing the metadata framework as a Unified Modeling Language (UML) static diagram.

Second, it provides a data dictionary with detailed element definitions, through which the metadata architecture is expressed.

Third, it uses Extensible Markup Language (XML) to describe metadata. The paper gives a metadata DTD (Document Type Definition) and discusses how to store XML files in a database and how to query records from it. Finally, it addresses metadata management and metadata extensibility.

The resulting system helps users query the RESD effectively and find resources quickly.

1 Foreword
The RESD comprises databases covering nature, society, economy, population and related fields, together with environment databases. Not only is the data volume huge, but the data are spread across different kinds of hardware platforms, and their structure and content differ according to the supporting software. How to handle the production, management, organization and sharing of these data is a problem in urgent need of a solution. Data producers urgently need an effective method for managing and maintaining data. Data users likewise need prompt, secure, effective and comprehensive service from producers, so that they can quickly and accurately find, access, obtain and use the data they need among the massive volume of resource and environment data. Sharing this information over the network, however, still faces several problems:
  1. Once the data are shared on the network, copyright becomes an issue;
  2. Before using a data set, users first need to understand the spatial data they require: its content, coverage, quality, management, producer, method of provision and other related information;
  3. Because of limited network speed, a user who downloads data directly without first consulting the metadata may waste time on the download.
Spatial metadata offer a feasible solution to this overall problem of information service.

It is generally held that research on the role of metadata in sharing spatial data warehouses must start from three aspects: 1) the metadata standard (Profile); 2) the metadata management system (Spatial Metadata Management System, SMMS); 3) the information services built on metadata.

This paper focuses on the first two, namely the metadata management aspect.

Drawing on and following the final version of the geographic information metadata standard submitted by ISO/TC211, and taking into account the actual requirements of this project, the Research Group on Geographic Information Sharing (RGIS) developed the project's spatial metadata standard. First, the structure of the metadata was designed with the Unified Modeling Language (UML) and supplemented with a corresponding metadata data dictionary, forming a standard consistent with ISO/TC211. Second, on the software side, we used the industry-standard XML and Java development tools to build a metadata management system that operates over the Internet.

2 Design of the metadata standard
Geographic information metadata standards have long been a research focus of the international geographic information community. Organizations including the U.S. Federal Geographic Data Committee (FGDC), the European committee for geographic information standardization (CEN/TC287) and the ISO technical committee for geographic information (ISO/TC211) have all worked continuously on such standards. At present, the ISO/TC211 standard has gone through nearly ten revisions and has entered the committee draft stage. China has also put considerable effort into standards research, and the national center for fundamental geographic information has completed a national geographic information standard. The metadata standard for the national resource and environment database was therefore developed on the premise that the international and national standards would soon be issued.

Considering the completeness, accuracy, structure and compatibility of the standard, we consulted and followed the most recent international standards, national standards and functional standards compatible with the international standard. Examples are the final draft put forward by ISO/TC211 in August 2001, Final Text of CD 19115 Geographic Information - Metadata, and the metadata standard for fundamental geographic information digital products put forward in 2001 by the national center for fundamental geographic information. Moreover, the data of the RESD consist mainly of spatial information (vector, image, raster, etc.) and of attribute data that carry spatial location information, together with non-spatial data collections such as bibliographic and archive catalogues and databases of laws and regulations. We worked out a suitable standard according to these concrete requirements.

2.1 Structure design of metadata
The standard comprises seven main classes and three classes of public (shared) data.
The seven main classes are:
Identification, basic information about the data set;
Data Quality, an overall assessment of the quality of the data set;
Spatial Data Organization, the way the spatial information of the data set is organized;
Spatial Reference, a description of the coordinate reference frame and coding system of the data set;
Entity and Attribute, detailed information about the content of the data set;
Distribution, information on how the data set is published and obtained;
Metadata Reference, information on the current status of the metadata and the responsible department.

The three public data classes are:
Citation, concise information for citing and referencing the data set;
Date, the date and time of relevant events;
Contact, the individuals or organizations referred to in the major subsets.
Public data classes are never used on their own; they are objects referenced by other elements.
The standard conforms to the relevant international and national standards. Owing to space limitations, the concrete content of the metadata is not described in detail here.
Complex logical organization and relationships exist among the major metadata classes. Analyzed with an object-oriented approach, these include inheritance (single and multiple), Composition, Aggregation and Association. The public data classes act as referenced objects and are frequently referenced by other classes. Such metadata are very hard to express clearly with ordinary two-dimensional tables alone, so a diagram is needed.


Figure 1 The class MD_metadata’s definition and containment relationships with the other metadata classes

The Unified Modeling Language (UML) is a graphical notation for object-oriented design and has been adopted as a standard by the Object Management Group (OMG) and other organizations. The series of geographic information standards being developed by ISO/TC211, including the current final drafts, also use UML as the modeling language. We therefore used UML static structure diagrams to express the logical organization and relationships of the metadata classes, and a data dictionary to describe the classes and their attributes. Together these form the complete metadata standard, which is also consistent with the international standard.

The relationships between classes in the metadata standard include Generalization (corresponding to object-oriented inheritance), Aggregation, Composition and Association.

In Figure 1, MD_metadata is an aggregate of MD_quality, MD_reference, MD_content, MD_spatialreference and MD_distribution; this relationship is a one-way aggregation association. The numbers in the figure denote multiplicity. For example, 1..* between MD_metadata and MD_identification indicates that a metadata record has one or more pieces of identification information. In addition, and importantly, the metadata can be extended by using UML stereotypes to construct new model elements on top of those already defined.
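To make the aggregation and multiplicity notation concrete, the following minimal sketch shows how a one-way aggregation with multiplicity 1..* might be carried into code. The class and field names (MDMetadata, MDIdentification, MDQuality, cn_name, statement) are hypothetical simplifications chosen for illustration, and the 0..1 multiplicity on the quality block is an assumption; this is not the class model of the standard itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MDIdentification:
    """Basic information about the data set (hypothetical subset of fields)."""
    cn_name: str
    en_name: Optional[str] = None


@dataclass
class MDQuality:
    """Overall data quality information (free text in this sketch)."""
    statement: str


@dataclass
class MDMetadata:
    """Root class that aggregates the other metadata classes."""
    # Multiplicity 1..*: at least one identification block is required.
    identification: List[MDIdentification]
    # Multiplicity 0..1 (assumed here): the quality block is optional.
    quality: Optional[MDQuality] = None

    def __post_init__(self) -> None:
        if len(self.identification) < 1:
            raise ValueError("MD_metadata needs one or more MD_identification")


record = MDMetadata(
    identification=[MDIdentification("(Chinese name)", "Example Land Use Database")]
)
print(len(record.identification))
```

The aggregation is one-way because MDMetadata holds references to its parts while the parts know nothing about their container. Constructing MDMetadata with an empty identification list raises ValueError, which is how the sketch enforces the lower bound of 1..*.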

2.2 Data dictionary
The data dictionary describes the characteristics of the metadata designed with UML. It takes the subset, entity and element as its units and describes the structural relationships and attributes of entities and elements within this architecture. Each entry has the following attributes: name, short name, definition, obligation/condition, maximum occurrence, data type and domain.

Together, the data dictionary and the UML static structure diagram form the complete metadata, with a clear logical organization. It is easy to understand and easy to implement in a program.

Table 1 Example of the metadata data dictionary for the Resource and Environment Database
No. | Name | Definition | Obligation/Condition | Maximum occurrence | Data type | Domain
1 | Metadata_Identification | Basic information required to uniquely identify a resource or resources | M | 1 | Aggregated class (MD_Metadata) | Lines 1-55
2 | Dataset Chinese Name | Chinese name of dataset | M | 1 | CharacterString | Free text
3 | Dataset English Name | English name of dataset | O | 1 | CharacterString | Free text
4 | Date | Date of publication or update | M | 1 | Integer | YYYYMMDD
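As an illustration of how data dictionary entries can drive a program, the sketch below encodes rows 2 to 4 of Table 1 and checks a record against obligation, maximum occurrence and data type. The names DictEntry and check_record, the record layout (element name mapped to a list of values) and the sample values are assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DictEntry:
    name: str
    definition: str
    obligation: str       # "M" mandatory, "O" optional
    max_occurrence: int
    data_type: type       # CharacterString -> str, Integer -> int
    domain: str


DICTIONARY = [
    DictEntry("Dataset Chinese Name", "Chinese name of dataset", "M", 1, str, "Free text"),
    DictEntry("Dataset English Name", "English name of dataset", "O", 1, str, "Free text"),
    DictEntry("Date", "Date of publication or update", "M", 1, int, "YYYYMMDD"),
]


def check_record(record: Dict[str, list]) -> List[str]:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for entry in DICTIONARY:
        values = record.get(entry.name, [])
        if entry.obligation == "M" and not values:
            problems.append(f"{entry.name}: mandatory element missing")
        if len(values) > entry.max_occurrence:
            problems.append(f"{entry.name}: occurs more than {entry.max_occurrence} time(s)")
        for value in values:
            if not isinstance(value, entry.data_type):
                problems.append(f"{entry.name}: expected {entry.data_type.__name__}")
    return problems


good = {"Dataset Chinese Name": ["(Chinese name)"], "Date": [20020615]}
bad = {"Dataset English Name": ["Land Use", "Land Cover"], "Date": ["2002"]}
print(check_record(good))
for problem in check_record(bad):
    print(problem)
```

The sketch does not check the domain column (for example, that a Date is a valid YYYYMMDD value); a fuller implementation would add one validator per domain.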

3 Metadata Management

3.1 What’s the problem?

The metadata structure described in UML, together with the data dictionary, tells users clearly how to describe a database with metadata. Every data set can be described with metadata. But managing the metadata database, and helping users obtain data more effectively, requires a good metadata management system.

The objective of metadata management is to acquire, validate, store, process and apply geographic metadata. Geographic information sharing is built on the network, so the management problem is also a network problem: it involves the web browser, the web server and the metadata server, and relies on a series of request and response exchanges between these software components.

Many metadata systems have been built both internationally and in China. For example, FGDC recommends the I-Site freeware bundle as the software package for building a spatial information Clearinghouse. The National Geographic Information Exchange Center (NGIEC) runs such a web site (www.nsii.gov.cn), where users can query the spatial information metadata of each node through a browser. Other well-known systems include the commercial MetaStar series developed by Blue Angel Technologies and the metadata documentation facilities of ARC/INFO. Analysis of these metadata management systems shows that their main functions are provided by the following modules:

The metadata browser: responsible for browsing spatial data and navigating the database; it provides the query interface and a data preview function.

The metadata editor: implements the editing functions for metadata, such as creating, inserting, deleting and updating.

The metadata server: manages the metadata database and publishes metadata on the network.

In view of the actual application, besides the functions above, the following issues must also be considered carefully when implementing the RESD management system:
  1. Every relationship among classes in the UML static diagram of the metadata should be mapped clearly onto the organization of the metadata, with a good query strategy.
  2. Because the project covers the domains of nature, society, economy and population, the metadata standard cannot cover every aspect. It therefore includes only the important and common metadata entities; otherwise the entities in the standard would become numerous and cluttered and would produce a great deal of redundancy in real applications. Some domains, however, will need to extend the metadata themselves (following fixed rules and passing conformance testing), or they will be unable to describe their data sets effectively. The extensibility of the metadata is therefore very important.
  3. For convenience, the system should be able to import and export metadata files in various standard formats. Metadata can be stored by combining file system storage with a DBMS.
  4. The user interface should be convenient to operate.
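The requirement that domain extensions follow fixed rules and pass conformance testing can be illustrated with a small check. The paper does not list its extension rules, so the two rules below are assumptions made for this sketch: an extension may add new elements, and it may redefine a core element only if the data type is unchanged and the obligation is not weakened. The obligation code "C" (conditional), the names CORE and check_extension, and the element "Soil Type" are likewise hypothetical.

```python
from typing import Dict, List, Tuple

# Obligation codes ordered from least to most strict (assumed ordering).
STRICTNESS = {"O": 0, "C": 1, "M": 2}

# Core elements as name -> (obligation, data type); values follow Table 1.
CORE: Dict[str, Tuple[str, str]] = {
    "Dataset Chinese Name": ("M", "CharacterString"),
    "Dataset English Name": ("O", "CharacterString"),
    "Date": ("M", "Integer"),
}


def check_extension(extension: Dict[str, Tuple[str, str]]) -> List[str]:
    """Return the rule violations found in a proposed domain extension."""
    problems = []
    for name, (obligation, data_type) in extension.items():
        if obligation not in STRICTNESS:
            problems.append(f"{name}: unknown obligation {obligation!r}")
            continue
        if name not in CORE:
            continue  # a brand-new element is always allowed in this sketch
        core_obligation, core_type = CORE[name]
        if data_type != core_type:
            problems.append(f"{name}: data type of a core element cannot change")
        if STRICTNESS[obligation] < STRICTNESS[core_obligation]:
            problems.append(f"{name}: obligation of a core element cannot be weakened")
    return problems


accepted = {"Soil Type": ("O", "CharacterString"),
            "Dataset English Name": ("M", "CharacterString")}
rejected = {"Date": ("O", "CharacterString")}
print(check_extension(accepted))
print(check_extension(rejected))
```

Keeping the core fixed while letting domains add to or tighten it is one way to keep extended records readable by software that only understands the core standard.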
3.2 XML Review

3.2.1 Mapping the UML static diagram to XML

Given the above requirements, the currently popular XML technology is a natural choice. Extensible Markup Language (XML) is a Web markup language that followed HTML. It gives users a flexible extension mechanism, so that resources with different content can be presented with well-formed, self-defined markup elements. In essence, XML is a meta-language, that is, a language used to describe other languages. It has the following characteristics: it is self-describing; tags can be self-defined, and developers can define a Document Type Definition (DTD) according to their own requirements; its semi-structured form suits hierarchical data; it is highly extensible; and it is platform-independent and well suited to transmission over the network. For developing the metadata management system, therefore, XML can fully map the relationships of the metadata defined in UML, can extend the metadata, and can satisfy the requirements of network operation.

3.2.2 Describing metadata with XML
Using XML, we can describe the RESD metadata. The concrete work is to define a DTD for the metadata standard. Part of the DTD follows (mainly the identification information, as an example; other parts are omitted with "...").

<!-- RESD metadata DTD -->

<!ELEMENT metadata (idinfo, dataqual?, continfo?, distrib?, spatrep?, refsystem?, metainfo)>
<!-- Identification, Data Quality, Content, Distribution, Spatial Data Organization, Spatial Reference, Metadata Reference -->
<!-- Identification -->
<!ELEMENT idinfo (cn_name, en_name, date, version, purpose, status, geobox … …)>
<!ELEMENT cn_name (#PCDATA)>
<!ELEMENT en_name (#PCDATA)>
<!-- date is described in timeinfo -->
<!ELEMENT version (#PCDATA)>
<!ELEMENT purpose (#PCDATA)>
<!ELEMENT status EMPTY>
<!-- Enumerated values must be name tokens, so spaces are written as underscores -->
<!ATTLIST status
  progress (Complete | In_work | Planned) "Planned"
  update (Continually | Daily | Weekly | Monthly | Annually | Unknown | As_needed | Irregular | None_planned) "Unknown">
<!ELEMENT geobox EMPTY>
<!ATTLIST geobox
  westbc CDATA #REQUIRED
  eastbc CDATA #REQUIRED
  northbc CDATA #REQUIRED
  southbc CDATA #REQUIRED>

Once the DTD is defined, XML can express accurately the metadata described in UML. If users need to extend the metadata, they follow the extension rules in the metadata standard and define their own DTD through the interface the system provides, thereby meeting the requirement for metadata extension.
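A short sketch may help show what a metadata record written against the identification part of the DTD could look like, and how a program might read it. The element names cn_name and en_name follow the idinfo content model; all element values and bounding coordinates are made-up placeholders. Python's standard xml.etree.ElementTree parser does not validate against a DTD, so the checks below are hand-written stand-ins for real DTD validation and cover only a subset of the declarations.

```python
import xml.etree.ElementTree as ET

RECORD = """<metadata>
  <idinfo>
    <cn_name>(Chinese name of the data set)</cn_name>
    <en_name>Example Land Use Database</en_name>
    <version>1.0</version>
    <purpose>Illustration only</purpose>
    <status progress="Complete" update="Annually"/>
    <geobox westbc="100.0" eastbc="110.0" northbc="40.0" southbc="30.0"/>
  </idinfo>
  <metainfo/>
</metadata>"""

GEOBOX_ATTRS = ("westbc", "eastbc", "northbc", "southbc")


def check_idinfo(xml_text):
    """Return a list of problems found in the identification section."""
    root = ET.fromstring(xml_text)
    idinfo = root.find("idinfo")
    if root.tag != "metadata" or idinfo is None:
        return ["metadata/idinfo is missing"]
    problems = []
    for tag in ("cn_name", "en_name", "version", "purpose"):
        if idinfo.find(tag) is None:
            problems.append(f"{tag} is missing")
    geobox = idinfo.find("geobox")
    if geobox is None:
        problems.append("geobox is missing")
    else:
        # The DTD declares all four bounding attributes as #REQUIRED.
        for attr in GEOBOX_ATTRS:
            if attr not in geobox.attrib:
                problems.append(f"geobox lacks required attribute {attr}")
    return problems


def progress_of(xml_text):
    """Read status/@progress, applying the DTD default "Planned" by hand."""
    status = ET.fromstring(xml_text).find("idinfo/status")
    return "Planned" if status is None else status.get("progress", "Planned")


print(check_idinfo(RECORD))
print(progress_of(RECORD))
```

A validating parser would apply attribute defaults and enforce the content model automatically; the manual default in progress_of only imitates that behaviour for one attribute.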

3.2.3 Storage and query of XML metadata
Metadata expressed in XML are text files; if they are kept only in the file system, queries will be inefficient. True XML databases, however, are not yet available. The following tactic can be used instead: store the XML files in binary large object (BLOB) fields of a database. Storage is then very simple, but some query efficiency is lost, since only keyword-based full-text search can be used. The mainstream DBMSs, SQL Server 2000 and Oracle 9i, both support full-text search (full-text indexes) and can build an index on a BLOB field.
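The storage tactic can be sketched with an in-memory SQLite database standing in for SQL Server 2000 or Oracle 9i. This is an assumption-laden illustration rather than the system's implementation: the table names, the store and search functions, and the hand-built keyword table (used here in place of a DBMS full-text index) are all hypothetical.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata (id INTEGER PRIMARY KEY, doc BLOB NOT NULL)")
conn.execute("CREATE TABLE keyword (word TEXT NOT NULL, doc_id INTEGER NOT NULL)")
conn.execute("CREATE INDEX keyword_word ON keyword (word)")


def store(xml_text: str) -> int:
    """Save one XML record as a BLOB and index the words in its text."""
    cur = conn.execute("INSERT INTO metadata (doc) VALUES (?)",
                       (xml_text.encode("utf-8"),))
    doc_id = cur.lastrowid
    # Strip tags (and with them attribute values), then index each
    # distinct lower-cased word of the remaining element text.
    text = re.sub(r"<[^>]*>", " ", xml_text)
    words = set(re.findall(r"\w+", text.lower()))
    conn.executemany("INSERT INTO keyword VALUES (?, ?)",
                     [(word, doc_id) for word in words])
    return doc_id


def search(word: str) -> list:
    """Return the XML text of every record whose text contains the keyword."""
    rows = conn.execute(
        "SELECT m.doc FROM metadata m JOIN keyword k ON k.doc_id = m.id "
        "WHERE k.word = ? ORDER BY m.id",
        (word.lower(),),
    )
    return [bytes(doc).decode("utf-8") for (doc,) in rows]


store("<metadata><idinfo><en_name>Land Use Database</en_name></idinfo></metadata>")
store("<metadata><idinfo><en_name>Soil Survey</en_name></idinfo></metadata>")
print(len(search("soil")))
```

The sketch also shows the limitation noted in the text: a keyword lookup finds records that mention a word but cannot ask structural questions, such as which element the word appears in, without parsing each returned document.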

3.3 Metadata management
The RESD requires the metadata server to be centralized at the Institute of Geographical Sciences and Natural Resources Research, Chinese Academy of Sciences, where the data are also concentrated. The metadata management system therefore does not adopt the spatial data Clearinghouse approach that is common internationally, but a centralized management model.


Figure 2 Metadata management system of Resource and Environment Spatial Database

3.4 Conclusion
The metadata management system was built to enable sharing of the RESD. It uses UML to design the metadata standard, keeping it consistent with ISO/TC211, and uses extensible, self-definable XML for storage and transfer. Together these provide strong support for information sharing. Some work nevertheless remains:
  1. Metadata XML files are currently stored in BLOB fields of a database; query efficiency is low and the query method is inflexible. Achieving efficient storage and query requires further study of the relevant literature and some experiments in software implementation.
  2. Metadata management is currently centralized; in the future it should develop toward a distributed model. This requires drawing on the successful experience of spatial data clearinghouses and studying advanced software techniques such as Peer-to-Peer and Agent, to make metadata management more flexible, convenient and prompt.
© GISdevelopment.net. All rights reserved.