Building Ontology of Place Name for Spatial Information Retrieval

Dongpo Deng
dongpo@iis.sinica.edu.tw
Institute of Information Science, Academia Sinica

A frequently repeated factoid is that 80% of all digital data generated today includes a geospatial reference. Digital maps and other cartographic products are directly geo-referenced with geographic coordinates, but such data are usually usable only by experts or trained people. However, a large volume of the digital data that people are familiar with does not use coordinates; it is geo-referenced with place names and other plain descriptors of geographic objects and features, such as addresses and postal codes. Place names are an intuitive geospatial concept for people, yet the places they denote are complicated, diverse, ambiguous, and multi-scaled geospatial objects. Therefore, place names need to be specified as canonical and interchangeable geospatial knowledge. An ontology is a shared, formal conceptualization of a domain. A geospatial ontology is considered a formal model of the geospatial world as it is experienced and conceptualized by non-experts. In this talk, the gazetteer of Taiwan from the Computer Center of Academia Sinica is encoded in GML (Geography Markup Language). The study builds two ontologies of place names: one based on the hyponymy of the place names' feature classes, and the other based on spatial relationships. The ontologies are represented in RDF (Resource Description Framework) and used for information retrieval and reasoning. With the GML document and the RDF of place names, a query engine based on the Jena API can provide users with suitable geospatial information through reasoning. The results show why ontologies of place names are important for next-generation Web/Internet GIS.

1. Introduction

It is a commonly quoted estimate that up to 80% of all digital data generated today includes a geospatial reference.
Some of these digital data, like digital maps and other cartographic products, are directly geo-referenced with geographic coordinates. A large volume of the available data, however, does not use coordinates but is indirectly geo-referenced with place names and other plain-text descriptors of geographic objects and features (Vögele and Schlieder, 2002; Hill, 2000). For example, Flickr is a well-known online photograph management and sharing application. According to its most popular tags, 31.7% of the tags are place names and 12.4% are related to space. Clearly, place names are important in people's everyday practice. Although people instinctively wish to use place names as part of their queries, the imprecision and vagueness of place names is a big challenge for data access. Often there is only an imprecise match between the query name and the names associated with candidate sources of information (Arampatzis, 2006; Jones et al., 2001). There is therefore a need to specify place names in an explicit semantic representation for data access.

The geospatial domain is characterized by vagueness, especially in the semantic disambiguation of concepts in the domain, which makes defining a universally accepted geo-ontology an onerous task (Agarwal, 2005). There are several reasons why it is difficult to extract a geospatial ontology. First, geographic objects are typically complex, and they will in every case have parts. An ontology of geographic objects must therefore contain a theory of part and whole, or mereology (Smith and Mark, 1998). The same geographic object can be defined differently in different events, scales, and situations. Thus, the geographic domain has specific ontological issues, primarily because of its lack of structure. Second, a standard terminology is not prevalent within the geographic domain; terminology depends on the context of use and the user.
This causes confusion in the specification of universally accepted entities, concepts, rules, relations, and semantics as the basis of a consensual ontology (Agarwal, 2005). Since ontologies have been promoted as a means to improve access to and sharing of existing geographical information resources (Smith and Mark, 1998; Fonseca et al., 2000; Kuhn, 2001), building a geospatial ontology is a significant research task for geographic information retrieval.

Digital gazetteers play many new roles in the information architecture for geo-locational access to information and data. For example, a place name in everyday discourse can be used to identify a location, to get driving directions from a computer system, or to find information about an area (Hill, 2000). Gazetteers can be seen as specialized GISs tailored to handle a number of specific tasks: (1) indirect geo-referencing; (2) vertical data integration; and (3) handling large data sets (Schlieder et al., 2001). Moreover, digital gazetteers can be defined as geospatial dictionaries of geographic names with three core components: (1) a name (possibly with variant names); (2) a location (coordinates representing a point, line, or areal location); and (3) a type (selected from a type scheme of categories for places/features). In analogy to the thesauri used to define thematic concepts, a controlled and structured vocabulary of place names is needed as a basis for spatial information retrieval. A gazetteer links the names of spatial objects to thematic concepts and to their spatial representations, the “geographic footprints” (Schlieder et al., 2001). A place can have multiple footprint representations (Hill, 2000): (1) of different types, for example a point, a bounding box, and a polygon; (2) from different sources; (3) for different time periods (for example, the extents of cities change through time); and (4) suitable for varying resolutions (e.g., more detailed vs. more generalized).
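The gazetteer components and footprint variants listed above can be pictured as a small data structure. The following Python sketch is only an illustration: the class names `Footprint` and `GazetteerEntry`, the field names, the coordinate values, and the rule that prefers a polygon over a bounding box over a point are all assumptions made here, not part of any gazetteer standard or of this study's implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumed preference order: the most detailed geometry wins.
_PREFERENCE = {"polygon": 0, "bbox": 1, "point": 2}


@dataclass
class Footprint:
    """One spatial representation of a place (hypothetical structure)."""
    kind: str                          # "point", "bbox", or "polygon"
    coords: List[Tuple[float, float]]  # (longitude, latitude) pairs
    source: str = "unknown"            # where the geometry came from
    period: str = ""                   # time period the geometry is valid for
    resolution: str = ""               # e.g. "detailed" or "generalized"


@dataclass
class GazetteerEntry:
    """Core gazetteer components: name, location (footprints), and type."""
    name: str
    feature_type: str
    variant_names: List[str] = field(default_factory=list)
    footprints: List[Footprint] = field(default_factory=list)

    def best_footprint(self) -> Optional[Footprint]:
        """Return the most detailed footprint available, or None."""
        if not self.footprints:
            return None
        return min(
            self.footprints,
            key=lambda f: _PREFERENCE.get(f.kind, len(_PREFERENCE)),
        )


# Illustrative values only; the coordinates are rough and not from the study.
entry = GazetteerEntry(
    name="Taipei",
    feature_type="populated place",
    variant_names=["Taipei City"],
    footprints=[
        Footprint("point", [(121.56, 25.03)]),
        Footprint("bbox", [(121.45, 24.96), (121.67, 25.21)]),
    ],
)

best = entry.best_footprint()
print(best.kind)  # prints "bbox", since no polygon is stored for this entry
```

Keeping several footprints per entry lets a retrieval system fall back to a coarser geometry when no polygon is available.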
For accurate information retrieval, a polygon-based footprint representation would probably be the optimal solution. Using exact polygon footprints, full-scale GIS functionality could be applied to select all geographic objects within a given region, to determine neighboring polygons, or to perform other complex spatial queries (Schlieder et al., 2001).

In this study, an ontology of place names is worked out from the geographic feature types of place names. We created RDF through the Jena API for RDF to implement the ontology. With the RDF of place names, SQL-like queries can be run on the Web together with the GML document of place names.

2. Related Work

To facilitate linking a place name to related information in a geo-referenced digital library, Smith and Crane (2001) developed a toponym-disambiguation method to identify place names automatically. With the popularization of geographical information on the Web, place names have become important geographical identifiers for tagging web resources. Volz et al. (2007) established an ontology model of place names from WordNet and the place names' geographic features; the ontology was then used to disambiguate place names in text. Fu et al. (2005) considered that ontologies play a key role in Semantic Web research and reported how to develop ontologies of place names to support retrieval of documents that are spatially relevant to users' queries. Vögele et al. (2003) presented an approach to the intuitive and user-friendly creation and application of spatial metadata used for spatial relevance reasoning.

3. Ontology of Place Name

3.1 Geographic Ontology

Ontology, as a philosophical tradition, studies the nature of being and existence, which can be described via taxonomies for hierarchical classification (Guarino and Giaretta, 1995; Agarwal, 2005). From a domain-specific and user-dependent view, an ontology is a formal specification of a shared conceptualization of a domain of interest (Gruber, 1993).
Furthermore, “conceptualization” refers to an abstract model of some phenomenon in the world, obtained by identifying the relevant concepts of that phenomenon; “formal” refers to the fact that the ontology should be machine-readable; and “shared” refers to the notion that an ontology captures consensual knowledge. Because geographic objects are often complicated, hierarchical, and diverse, geographic information science (GIScience) has to adopt ontology to specify geographic objects and phenomena as canonical descriptions of geographic knowledge domains. Moreover, a geographic ontology should be a formal model of the geospatial world as it is experienced and conceptualized by non-experts. There are two distinct approaches to applying ontology in GIScience (Kuhn, 2001):
Geography Markup Language (GML) is an XML-based encoding standard for geographic information and has already been used to store, exchange, and model geospatial data. GML is designed for the Internet and directly embraces the ideas of interconnected and distributed information elements (Raper, 1999; Lake, 1996, 2001, 2005). GML has three main roles with respect to geographic information: first, as an encoding for the transport of geographic information from one system to another; second, as a modeling language for describing geographic information types; and third, as a storage format for geographic information (Lake, 2005). GML is therefore designed for the Geo-Web, with the ability to display and transport geographic information as well as to mash it up with non-geographic information. Furthermore, Kolas et al. (2005) considered the goal of a common geospatial ontology to be similar to that of the creation of GML. In this study, place names are encoded in GML.
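To make the encoding concrete, here is a minimal sketch of how one place-name entry carrying a bounding box, a footprint geometry, a feature type, and a description might be serialized as GML-style XML with Python's standard library. It is not the study's application schema: the application namespace `http://example.org/placename`, the element name `PlaceName`, the helper `encode_place_name`, and the coordinate values are hypothetical. Only the `http://www.opengis.net/gml` namespace and the `gml:` element names come from GML itself.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"
APP_NS = "http://example.org/placename"  # hypothetical application namespace

ET.register_namespace("gml", GML_NS)
ET.register_namespace("pn", APP_NS)


def gml(tag):
    """Qualified name in the GML namespace."""
    return f"{{{GML_NS}}}{tag}"


def app(tag):
    """Qualified name in the assumed application namespace."""
    return f"{{{APP_NS}}}{tag}"


def encode_place_name(name, feature_type, lon, lat, bbox, description=""):
    """Build one place-name feature; bbox is (min_lon, min_lat, max_lon, max_lat)."""
    feature = ET.Element(app("PlaceName"))
    ET.SubElement(feature, gml("description")).text = description
    ET.SubElement(feature, gml("name")).text = name

    # Bounding box of the place, as a GML envelope.
    bounded_by = ET.SubElement(feature, gml("boundedBy"))
    envelope = ET.SubElement(bounded_by, gml("Envelope"))
    ET.SubElement(envelope, gml("lowerCorner")).text = f"{bbox[0]} {bbox[1]}"
    ET.SubElement(envelope, gml("upperCorner")).text = f"{bbox[2]} {bbox[3]}"

    # Geometry property; a point here, though a polygon could be used instead.
    footprint = ET.SubElement(feature, app("footprint"))
    point = ET.SubElement(footprint, gml("Point"))
    ET.SubElement(point, gml("pos")).text = f"{lon} {lat}"

    ET.SubElement(feature, app("featureType")).text = feature_type
    return feature


# Illustrative values only; the coordinates are rough and not from the study.
feature = encode_place_name(
    name="Taipei",
    feature_type="populated place",
    lon=121.56,
    lat=25.03,
    bbox=(121.45, 24.96, 121.67, 25.21),
    description="Example gazetteer entry",
)
xml_text = ET.tostring(feature, encoding="unicode")
print(xml_text)
```

A real application schema would also declare these elements in an XML Schema document so that instances can be validated.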
In the GML application schema, a place name is treated as a Feature Collection with four properties: “boundedBy”, recording the bounding box of the place name; “footprint”, a geometry property; “featureType”, describing the category of the place name; and “description”, as shown in Figure 1. Figure 2 shows the GML instance of a place name in this study.

![]()

![]()

3.3 The Ontology of Place Name

The XML representation of GML is a standard means of communication between geospatial applications, but a base geospatial ontology extends its power with the significantly greater expressiveness of OWL and the ability to link the data to knowledge outside the geospatial realm. This expands the overall usefulness of the geospatial data while enriching it with complementary information. In this study, RDF (Resource Description Framework) is used to represent place-name information on the Web with machine-understandable syntax and semantics. The Jena API for RDF, a popular tool developed by Brian McBride of HP, can parse, create, and search RDF models. The ontology is built from the place name's type, which is a category.

![]()

4. The Ontology of Place Name for Information Retrieval

The Jena API for RDF not only helps users create RDF but can also query existing RDF using RDQL (RDF Data Query Language). The following example is an RDQL query against this study's place-name RDF. The query process is shown in Figure 4. When we query the fourth-level city name Houng-Tou (??), the result shows that Houng-Tou is a populated place as well as an administrative area.

5. Conclusion

In this study, we reported our experience of creating an ontology of place names that serves as a specification of domain knowledge, and of using it for information retrieval. The results show that a geographic ontology can remove ambiguity from geospatial data. It is common for a place name to refer to different places and for a place to have different names.
An ontology of place names might be a useful way to provide exact results in Web applications. However, while an ontology built on feature types might solve the terminology problem of place names, it does not capture their spatial nature. In future work, we will expand the ontology of place names to include spatial relationships.

![]()

Reference