The integration of spatial datasets for network analysis operations
Sanphet Chunithipaisan1, Philip James2, David Parker3
Department of Geomatics, University of Newcastle upon Tyne, Newcastle, UK, NE1 7RU
1 PhD Student, Sanphet.Chunithipaisan@ncl.ac.uk; 2 Lecturer, Philip.James@ncl.ac.uk; 3 Professor, David.Parker@ncl.ac.uk
One of the most important current challenges for Geographic Information Systems (GIS) is the generation of corporate geo-spatial resources whose full potential can only be realised by making them accessible to a large number of applications and end-users [25]. In the field of facilities management, such as gas, electricity, water, telecommunications and transportation companies, spatial network GIS could provide a useful graphical interface and geographical database for the management of network assets and flows [11]. Utility networks typically impact on many people over vast areas and are generally managed by government departments, large organizations or companies. There is often little collaboration between the organisations despite similarities of interest and in some cases new legal requirements to share data with other utilities to minimise the impact of repairs and new build on both the public and the environment [24]. Thus, there is a growing need both to share the basic network information and in some cases to integrate data sets to carry out more complex network analysis operations.
In the real world, objects are connected to each other: thus an optical cable is connected to a multiplexer that in turn is connected to copper cables connecting into our homes to provide cable TV, telephony and internet access. Using GIS in support of network utility management typically involves many types of features that may have connectivity to each other. Several GIS vendors have developed GIS software whose potential functions can provide for network management and analyses [7], but each system has a proprietary format to deal with the connectivity between geometry or features. Topology in GIS is generally defined as the spatial relationship between such connecting or adjacent features [1,5], and is an essential prerequisite for many spatial operations such as network analysis [23]. There are, in general, three advantages of incorporating topology in GIS databases: data management, data correction and spatial analysis [13]. Topology structures provide an automated way to handle digitising and editing errors, and enable advanced spatial analyses such as adjacency, connectivity and containment [2]. In some systems this relationship is assumed (by the user) whereas in others it makes up part of the structure of the geometry [1,22]. In some systems topology can be built where all arcs intersect or touch [4] and in others rules defining connectivity between feature types can be used to build topology [6]. The latter approach is typically used in the major GIS especially for network utility applications. For example, ArcInfo has a “rule base” to specify the connectivity of the features whereas GE Smallworld has a “manifold” to describe how features connect. In addition, the network may be a directional network depending on the application in question. Each GIS designed for network utility applications has different alternatives to manage the issue of a directional network. 
Some systems have a feature that is part of the data structure, such as a “Turn”, to deal with directional links in the network [4], whereas in other systems directional links can be specified in the application through code [7].

The ability to reuse existing data is a benefit that new applications should be able to take advantage of [21]. This is often not possible because of problems with data integration due to proprietary data formats. Attempts have been made to integrate formats using standards such as GML [16] and tools such as FME [18]. There is, however, a particular problem in network GIS in that topology is not exchanged in general import/export (other than that assumed by the geometry). Some systems do not support topology or network analysis functions at all, and yet these non-topological datasets still contain valuable data. For network analysis to be carried out, the current option is to import data into the tool of choice, coercing it into the required format. If further changes are made to the original dataset then the process needs to be repeated. However, data conversion across systems is not straightforward and is time consuming. Whilst systems exist that handle network topological issues in a structured and efficient manner, these tend to be high-cost systems, and it is not usually possible or desirable to upgrade from an existing system to one that manages topology. Likewise, a one-off import of all data into such a system to carry out basic network analysis functions is not a practical solution. Furthermore, data concerning the same feature type may be maintained in different systems. Distinguishing the duplicated features when converting to the new database or importing to the new network analysis application is a difficult process. Even at the semantic level, inconsistencies in definition cause problems: for example, a feature, such as a road, may be labelled differently (e.g.
as a street) in a different system.

This paper reports on an on-going research project entitled “The development of generic, topology-aware spatial datasets and models”. This research has been undertaken to address the problems mentioned above. The research framework comprises three main parts. The first stage is to design a model to incorporate data from various systems and to model attributes, geometry and topology so as to be able to carry out network analysis. Several models were developed to describe real world features and the connectivity of features, including the rules that define connectivity between features. The second stage of the research is to design analytical tools and other tools to manage the data. Several tools are being developed to test the conceptual model designed in the first stage and to support network topology and network analyses. This stage also includes investigation into mechanisms for data integration and dealing with data redundancy. The final stage is to develop an application that can be served across public and private networks to carry out network analysis “on-the-fly”. This paper focuses on the first stage, where the concepts underpinning the conceptual data model are introduced. A limited implementation of the application for the purpose of testing the data model is also introduced.

Paper Structure

Firstly this paper identifies the research overview and motivation. The overarching concept of the research is then introduced, followed by the methodology used. The data model and structures are then discussed. There is an analysis of the implementation thus far, and finally some conclusions are drawn on the suitability of the current data model and avenues for further development are presented.

Vision

Most major GIS support relational databases in some form, and often a relational database underpins the data structure.
Data import/export from and to a relational database is relatively straightforward using built-in functionality, or using macros or scripts to connect to a relational database. Furthermore, ODBC [15] or similar tools for connecting to external data sources are available for most platforms. The widespread support of SQL within relational databases also provides a structured and common interface for addition, update and deletion. Distribution of applications and data via the World Wide Web and associated technologies is clearly becoming a major trend [12]. Many web GIS applications have been developed over the last few years [8,9], but these applications typically provide only basic GIS functions. The vision of the research is to enable a “web” based application that allows the transfer of data, where appropriate, from proprietary datasets, whether topological or not, into a generic relational database. Topology is then created and rules defined to allow network analyses to take place over all the required data. Where appropriate, data can remain in an existing format, with topology added by setting a semantic schema for the names of features. Figure 1 gives an overview of the process.

Figure 1. The vision of the research

Research Methodology

This research first investigated real-world features and their spatial and aspatial properties. The geometry and topology were treated as properties of the feature. The network connectivity model was designed, including the network connectivity rule. A relational database structure was designed to fit the conceptual model and implemented to hold the data drawn from various datasets. The application was implemented using the Java programming language, and JDBC [20] was used for connecting to databases. Several tools were created to support the application, especially for network analysis. A tool for building topology based on the connectivity rules was created and tested.
The network analysis functions (network tracing and shortest path) were developed to incorporate analysis of directed networks.

Data Model

To understand the real world, the characteristics of real world features must be studied and modelled. In the real world, features are described by descriptive terms, by physical location and by their ability to connect both physically and logically to other features. Thus a “road feature” could be described by aspatial attributes such as name and length, by a geometry representing its physical location, by topology to represent how it is physically connected to other features and by join relationships to represent logical connections with other features. The modelling of attributes, geometry and relationships is standard fare for most GIS and many database systems, so the following section concentrates on the modelling of rule-based topological structures to facilitate network analysis functions.

The real world feature

Phenomena in the real world can be modelled as real world features. The real world is composed of many kinds of real-world features. The characteristics of a feature can be represented by its properties. There are four kinds of feature properties: physical, relational, geometrical, and topological. A physical property is usually alphanumeric data stored as a number or text. Relationship properties are used for representing the logical relationship between features. The shape of a feature is represented by a geometry. For a 2D coordinate system there are three basic types of geometry: point (x, y pair), chain (series of connected x, y pairs) and area (series of connecting x, y pairs making up one or more complete rings). Topology is commonly used to describe the physical connectivity between features.

Figure 2. Real world phenomena

Network Connectivity
Based on the above, a conceptual data model was developed. An object-oriented design concept was followed as it allows one to develop models that more closely resemble the real world [19]. The ISO Spatial Schema [10] and the OGC Feature Geometry Specifications [17] were adopted and adapted where necessary. Features have attribute data, which is described by alphanumeric data types and treated as a physical property. Geometry and topology are used to describe the shape and connectivity of features respectively and are treated as objects relating to a feature. A feature can also have a logical relationship to another feature, and this property is the relationship property. The “Family” contains a collection of features that may connect. Features can be associated with one or more families. Figure 7 gives an overview of the conceptual data model.

Figure 7. Conceptual Data Model

Implementation

Database Model

Once the conceptual data model had been finalised, a physical model was implemented. A simple relational database, Microsoft Access, was chosen to prove the conceptual model. There are six main tables, created for storing spatial data that link to the spatial object. The Family table is used to store the network connectivity relationships of each family.

Figure 8. Logical Relational Database Model

Application development

Application development took place in several phases. The first phase was the development of mechanisms for the export of data from a variety of sources and formats. Thus a tool was developed to export data from ArcView via ODBC using SQL. The GE Smallworld Magik application language was used to develop a similar system using COM (Component Object Model, a Microsoft software architecture) [14] and SQL for export from GE Smallworld VMDS (Version Managed Data Store). The application to display, query and modify the relational database was developed in Java.
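To make the “family” concept concrete, the following is a minimal Java sketch of how a family might hold member features and connectivity rules, and how topology could be derived from them. It is an illustration under stated assumptions, not the project's implementation: the class names `Feature` and `Family`, the methods `allowConnection` and `buildTopology`, the restriction to chain end points, and the snapping tolerance are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical feature: an id, a feature type, and the two end points of a chain. */
class Feature {
    final String id;
    final String type;
    final double[] start; // {x, y}
    final double[] end;   // {x, y}

    Feature(String id, String type, double x1, double y1, double x2, double y2) {
        this.id = id;
        this.type = type;
        this.start = new double[] {x1, y1};
        this.end = new double[] {x2, y2};
    }
}

/** Hypothetical family: member features plus rules naming which types may connect. */
class Family {
    final String name;
    private final List<Feature> members = new ArrayList<>();
    private final Set<String> rules = new HashSet<>();

    Family(String name) {
        this.name = name;
    }

    void addFeature(Feature feature) {
        members.add(feature);
    }

    /** Declares that features of the two types may connect (stored symmetrically). */
    void allowConnection(String typeA, String typeB) {
        rules.add(typeA + "|" + typeB);
        rules.add(typeB + "|" + typeA);
    }

    /** Returns "idA-idB" for each pair that touches at an end point AND is allowed by a rule. */
    List<String> buildTopology(double tolerance) {
        List<String> connections = new ArrayList<>();
        for (int i = 0; i < members.size(); i++) {
            for (int j = i + 1; j < members.size(); j++) {
                Feature a = members.get(i);
                Feature b = members.get(j);
                if (rules.contains(a.type + "|" + b.type) && touches(a, b, tolerance)) {
                    connections.add(a.id + "-" + b.id);
                }
            }
        }
        return connections;
    }

    private static boolean touches(Feature a, Feature b, double tol) {
        return near(a.start, b.start, tol) || near(a.start, b.end, tol)
            || near(a.end, b.start, tol) || near(a.end, b.end, tol);
    }

    private static boolean near(double[] p, double[] q, double tol) {
        return Math.hypot(p[0] - q[0], p[1] - q[1]) <= tol;
    }
}

class FamilyTopologyDemo {
    public static void main(String[] args) {
        Family roads = new Family("road");
        roads.allowConnection("road", "road");
        roads.allowConnection("road", "footpath");
        roads.addFeature(new Feature("r1", "road", 0, 0, 10, 0));
        roads.addFeature(new Feature("r2", "road", 10, 0, 10, 10));
        roads.addFeature(new Feature("p1", "footpath", 10, 10, 20, 10));
        roads.addFeature(new Feature("w1", "pipe", 0, 0, 0, 5)); // touches r1, but no rule
        System.out.println(roads.buildTopology(0.001)); // prints [r1-r2, r2-p1]
    }
}
```

In this sketch the pipe `w1` shares an end point with `r1` but is not connected, because no rule in the family permits it. That is the distinction between rule-based topology and topology built wherever geometries touch.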
Several tools were created for supporting the application: data query, manipulation, display and analysis functions.

Figure 9. Application Model

In the main application, JDBC was chosen to facilitate database connection and to extract data into the application. The JDBC-ODBC bridge with an ODBC driver (a Type 1 JDBC driver [20]) was used for the database connection. Many basic functions were developed in the application tools: selection, intersection, snapping, transformation, and so on. The GUI was developed using the AWT and Swing packages provided as part of the Java platform. A tool was developed to build the topology based on the connectivity rules defined in the family and stored in the database. The network analysis functions were also created. There are two basic functions for network analysis: network trace and shortest path. Dijkstra's algorithm [3] was used for both network analysis functions. The GUI of the application is shown in Figure 10.
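As an illustration of how one algorithm can serve both analysis functions, the sketch below runs Dijkstra's algorithm over a small directed network. The set of nodes it reaches gives the network trace, and the recorded predecessors give the shortest path. The class `DirectedNetwork`, its method names, the node labels and the link costs are all assumptions made for this example and are not taken from the application itself.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical directed network; node ids and costs are illustrative only. */
class DirectedNetwork {
    /** A (node, cost) pair: used both as an outgoing link and as a queue entry. */
    private static final class Link {
        final String to;
        final double cost;

        Link(String to, double cost) {
            this.to = to;
            this.cost = cost;
        }
    }

    private final Map<String, List<Link>> outgoing = new HashMap<>();

    /** Adds a one-way link; call twice with swapped ends for a two-way link. */
    void addLink(String from, String to, double cost) {
        if (cost < 0) {
            throw new IllegalArgumentException("Dijkstra requires non-negative costs");
        }
        outgoing.computeIfAbsent(from, k -> new ArrayList<>()).add(new Link(to, cost));
    }

    /** Dijkstra from source: returns the cost to each reachable node, filling previous. */
    private Map<String, Double> dijkstra(String source, Map<String, String> previous) {
        Map<String, Double> dist = new HashMap<>();
        PriorityQueue<Link> queue =
            new PriorityQueue<>((a, b) -> Double.compare(a.cost, b.cost));
        dist.put(source, 0.0);
        queue.add(new Link(source, 0.0));
        while (!queue.isEmpty()) {
            Link current = queue.poll();
            if (current.cost > dist.get(current.to)) {
                continue; // stale queue entry: a cheaper route was already found
            }
            for (Link link : outgoing.getOrDefault(current.to, Collections.emptyList())) {
                double candidate = current.cost + link.cost;
                Double known = dist.get(link.to);
                if (known == null || candidate < known) {
                    dist.put(link.to, candidate);
                    previous.put(link.to, current.to);
                    queue.add(new Link(link.to, candidate));
                }
            }
        }
        return dist;
    }

    /** Network trace: every node reachable from source when link direction is respected. */
    Set<String> trace(String source) {
        return new TreeSet<>(dijkstra(source, new HashMap<>()).keySet());
    }

    /** Shortest path as a list of nodes, or an empty list if target is unreachable. */
    List<String> shortestPath(String source, String target) {
        Map<String, String> previous = new HashMap<>();
        Map<String, Double> dist = dijkstra(source, previous);
        if (!dist.containsKey(target)) {
            return Collections.emptyList();
        }
        LinkedList<String> path = new LinkedList<>();
        for (String node = target; node != null; node = previous.get(node)) {
            path.addFirst(node);
        }
        return path;
    }
}

class NetworkAnalysisDemo {
    public static void main(String[] args) {
        DirectedNetwork net = new DirectedNetwork();
        net.addLink("A", "B", 4);
        net.addLink("A", "C", 1);
        net.addLink("C", "B", 2);
        net.addLink("B", "D", 5);
        net.addLink("E", "A", 3); // one-way into A, so E cannot be reached from A
        System.out.println(net.shortestPath("A", "D")); // prints [A, C, B, D]
        System.out.println(net.trace("A"));             // prints [A, B, C, D]
    }
}
```

Because links are stored one way only, a directional network needs no special handling in this sketch; a non-directional link is simply added in both directions.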
The implementation has been tested both with simulated data and with a transport network extracted from a sample GE Smallworld dataset of the Cambridge area. The network connectivity rule in the “road” manifold was transferred to the network “road” family. The topology of the network family was built, and the network analysis was then tested.

Conclusion

This paper describes a conceptual generic data model for the handling of spatial and topological feature data. The limited implementation that has been completed has demonstrated the validity of the model using a variety of test data sets from several proprietary formats. The network analysis functions implemented so far have proved the validity of the data model with respect to the handling of topological analysis for directional and non-directional networks. The concept of the network “family” of features presents a user-friendly interface to the assignment of network connectivity and overcomes several issues that face the integration of topological and non-topological datasets. Future work will focus on data integration and on an implementation with extended data sets. The mechanisms for checking data duplication (where the same feature is stored in more than one original dataset) will be investigated. Several scenarios of network applications will be tested. Finally, more work is required to serve the final application in an “application server” environment running over a private or public network using web technologies.

References