Utilization of Graph Theory and GIS on Web-based Application for Optimal Route Determination

Deni Suwardhi, M. Lulu Lukman S. Hakim, Ahmad Riqqi, Dudi Sarifudin, and Oki Gumilar.
Spatial Information Science & Technology (SISTECH) Workshop
Remote Sensing and Spatial Information Science Research Group
Geodetic Department, Institute of Technology Bandung


Abstract
A real-world application of graph theory and GIS for analysis of an optimization problem is presented in this paper. The problem entailed determining the optimal route for trip in Institute Technology of Bandung campus as a case study. The trip could be started from one place to another place or from one place to some places and returns back to the initial place.

To determine the optimal path, geographic information systems (GIS) techniques were utilized for spatial data management, measurement of distances from place to place, and analysis of the routes. In order to illustrate the approach, we have developed the Web-based application by integrating our developed MapRoute application, Minnesota MapServer, Internet server and postgreSQL/postgis database into a Web based client/server environment. After the route is resulted from the MapRoute application then Mapserver is used for visualization.

The problem domain was modeled as a graph. In Graph Theory, the problems are so-called shortest path and traveling salesman. Djikstra algorithm was used for shortest path problem solving. For traveling salesman problem two algorithms were implemented. One is the heuristic and the other is the exhaustive. The heuristic algorithm does not always give the best solution but is very quick and low costing. Contrary to the heuristic algorithm, the exhaustive always gives the best solution but is very slow and resource consuming.

1. Introduction
There has been a huge growth in the research, development, and utilization of geographic information systems (GIS) recently. However, while GIS have been designed to handle most types of common spatial analysis problems, many of the more complex spatial problems are still beyond their current capabilities to solve. These types of problems often involve extremely large search spaces with correspondingly large numbers of potential solutions. One such problem within the realm of spatial analysis is that of the network analysis.

A network is a set of linear features that are interconnected in GIS. Common examples of networks include highways, railways, destination streets, rivers, transportation routes (e.g., transit, school buses, garbage collection, and mail delivery), and utility distribution systems (e.g., electridestination, telephone, water supply, and sewage). Network analysis is useful for organizations that manage or use networked facilities. Utilities employ network models to model and analyze their distribution systems and meter-reading routes. Municipal public works departments use networks to analyze bus and trash routes, whereas businesses use them to plan and optimize the delivery of goods and services. Network analysis can also be applied to retail store planning. For instance, solving of the driving times can aid in the determination of retail store trade areas. Three principal types of network analysis are network tracing, network routing and network allocation (Turk, T., and Gumusay, M. U., 2004).

Geographical data used in network analysis have to be vector structure and also based on line. A network refers to a system of lines topologically structured. Networks may be reduced to topological graphs, which are arrays of points connected or not connected to one another by lines. This simplification facilitates the revelation of common topological structures of the networks. The following elements may be identified: nodes (vertices, v1, v7), links (e1, e9), and regions (r1, r4). Connectivity matrices for these elements in binary form may be produced (Figure 1). The number of edges (links) in the network (e), the number of vertices (nodes) in the network (v), and the number of isolated (i.e., no connecting) networks (sub graphs) (g) are employed to develop a series of topological measures to characterize the network structure (Haggett and Chorley, 1969). It should be noted that an edge is defined by two nodes.


Figure 1. Connectivity matrices derived rom the Network
Graph and Reduction o a Network to a graph.
(Source: Haggett and Choreley, 1969)


For this paper, a problem was selected. The problem is to determine optimal route for distribution system like mails, logistics, or guest’s guidance in Institute Technology of Bandung (ITB) campus. The route may exist with starting and destination points are building, internal building (room to room), or any place in the campus such as main gate, parking area, plaza, etc. Persons who will be subjects in the route use car, motorcycle, bicycle or walking. Road, sidewalk, corridor and any path that connecting anyplace in the campus are linear features to form a network in this problem.

To determine the optimal closed path, geographic information systems (GIS) techniques were utilized for analysis of the routes and measurement of distances from building to building. The problem domain was modeled as a complete graph and translated into Shortest Path and Traveling Salesman Problem (TSP). An implementation of the TSP heuristic and TSP exhaustive method developed in Object Pascal code was utilized to solve the problem.

2. Spatial Network Data Model To understand the real world, the characteristic of real world features must be studied and modeled. In the real world features are described by some descriptive terms, by physical location and by their ability to connect both physically and logically to other features. Thus a “road feature” could be described by attributes should as name and length, by a geometry representing its physical location, by topology to represent how it is physically connected to other features and by join relationships to represent logical connections with other features. The modeling of attributes, geometry and relationships are standard fare for most GIS and many database systems (Chunithipaisan, et. al., 2002).

A network contains a collection of real-world features that may have connectivity to each other. In the real world there are campus road, sidewalk, corridor between buildings, and path inside a building such as corridor, stair and lift path. The shape of that network features are represented by geometry. For a 2D coordinate system there are three basic types of geometry: point (x, y pair), curve (series of connected x, y pairs) and polygon (series of connecting x, y pairs making up one or more complete rings). In our case, curve (line string) with 3D coordinate system is used for making up spatial network then the segments inside a building with different floor cannot intersect each other (see figure 4.b).

Based on topological graphs, there are three kinds of node: junction, terminal and isolated node. Junction node is a node connected to two or more nodes by paths. Terminal node is a node connected to only one node and isolated node is a node not connected to one another (Suwardhi, 1998). Intersections between network features and centered of parking area are junction node. Terminal node might be a door as an entry point to a room.

Based on the above a conceptual data model was developed. The OGC Feature Geometry Specifications (Open GIS Consortium, 1999) was adopted and adapted where necessary. Features have attribute data which is described by alphanumeric data types and treated as a physical property. Geometry and topology are used to describe the shape and continuity of features respectively and are treated as objects relating to a feature. A feature can also have a logical relationship to another feature, and this property is the relationship property. Figure 2 gives an overview of the conceptual data model.


Figure 2. Conceptual Data Model


Once the conceptual data model had been finalized, a physical model was implemented. PostgreSQL/ PostGIS object relational database was chosen to prove the conceptual model. There are three main tables (links, nodes, and turntable) created for storing spatial network data.

The links table contains the topological details of each of the line segments. The actual geometry of the segment is in the geometry item. Each links table contains line string geometry object representing the spatial location. Each spatially distinct segment end node is stored in node table as an identifier and a single point. Every link segment will have its from and to nodes mapped to an intersection in this table.

Finally, TurnTable is relations for storing data on expanded intersection representations. The turn table contains a tuple corresponding to each direction of travel through an intersection. An additional field maintains the travel cost associated with that direction of travel (or perhaps a pointer to a flow cost function). A reserved character (such as a negative number) can indicate a turn restriction.

The following is a Data Definition Language (DDL) to create the links, nodes, and turntable tables:



3. Graph Construction in Application
In order to store, analyze, and visualize the spatial network data, we make use of a diverse set of tools, components and products that are both Open Source and our Developed Software packages. The following integrated toolsets are used:
  • Windows/PostgreSQL 8.0.2 with the PostGIS spatial extension is used as the backend geospatial database for the spatial network data.
  • The Minnesota MapServer 4.0.2 is used as an application for visualization the result of network spatial analysis and map.
  • Developed CGI Maproute application. Tool for development this application is Borland Delphi.
  • Apache Web Server version 2.0.50
  • PHP version 4.3.7
  • Finally, end user queries and data extraction operations are performed with Web browser such as Firefox, Netscape, or Internet Explorer depending on user requirements

Figure 3 shows the configuration and connection between the applications.

The core of the system is a Common Gateway Interface (CGI) application that allows developers to rapidly build and deploy web applications based on GIS databases using simple text-based configuration and presentation templates. A CGI stand-alone application is a console application that receives client request information on standard input and passes the results back to the server on standard output. This data is evaluated by the CGI application, which creates appropriate request and response objects. Each request message is handled by a separate instance of the application.

Figure 3. System Configuration


The following are some of the main tasks of MapRoute CGI application:
  • Receiving request from webserver as well as all related information (starting coordinate, destinations coordinate, vehicle, etc)
  • Sending queries to the database and receiving query result. Reading the spatial network data from PostgreSQL/Postgis database.
  • Constructing internal Graph classes.
  • Performing network analysis, such as shortest path distance and traveling salesman problem.
  • Sending analysis result to Minnesota Mapserver for map visualization on the web browsers.


To support graph construction in the application, there are some mappings from class in Postgresql to classes in the Object Pascal programming language. The mapping enables us to store an Object Pascal class as an object directly into the database, as well as retrieving an object directly from the database. The mapping from Pascal classed to some of PostgreSQL Data Type is also defined to facilitate spatial based queries and analysis. The following are classes in Object Pascal code for mapping links, nodes and turntable tables in Database: Web-Browser




Figure 4. (a) Road network in ITB campus
(b) Network in Labtek IXC Building with 3D visualization
(c) Web Route application on web browser

4. Network Analysis

Shortest Path Analysis
The basic problem: Find the "best" way of getting from s to t where s and t are vertices/nodes in a graph. Dijkstra's Algorithm, introduced in 1959 provides one the most efficient algorithms for solving the shortest-path problem. In a weighted graph (otherwise known as a network), it is frequently desired to find the shortest path between two nodes. The weights attached to the edges can be used to represent quantities such as distances, costs or times. In general, if we wish to find the minimum distance from one given node of a network, called the source node, to all the nodes of the network, Dijkstra's algorithm is one of the most efficient techniques to implement. In general, the distance along a path is the sum of the weights of that path. The minimum distance from node a to b is the minimum of the distance of any path from node a to b (Suwardhi, D. 1998).
Complexity

The most worrisome part of Dijkstra's algorithm, in view of complexity, is the process of finding the cheapest node. The outer loop of the relaxation step takes ) (n O time to complete, since each node is extracted once. Each time a node is extracted, each outgoing edge must be considered as adjacent nodes are examined. Although we do not know how many edges will be considered during a single iteration of the loop, we do know that each edge will be visited once total, which gives complexity ) ( n m O +, where m is the number of edges and n is the number of nodes. The extraction of the cheapest node has still not been taken into account, however, and its cost depends largely on the data structure used to store the nodes. With linear storage, it would take time ) (n O to find the cheapest node, which results in a total cost of time ) ( 2 n O . Improvements can be made on this time by using an r-tree data structure to index the nodes, which allow extraction of the cheapest node in time )) (log(n O . In this way, the best complexity attainable for Dijkstra's algorithm is logarithmic )) log( ( n n m O +(Munir, R., 2001).

Traveling Salesman Problem
The TSP (Traveling Salesman Problem) is considered as one of the most difficult to solve problems. The situation it deals with is as follows: A campus postman has to visit n destinations, each one exactly once. He starts from campus post offices (base) and after visiting all the destinations, he returns back to his base. The transport cost (like distance) between any pair of destinations is known. Notice that it is not necessary that every two destinations are connected. We want to find the optimal path for his visits. Optimal path is the path with the smallest possible cost. This is a famous problem of the graph theory where the nodes of the graph represent the post office and destinations, and the edges represent the paths between them. When there is no edge connecting two nodes, it means that there is no way to reach one destination from the other. Here, it is assumed that only one path can connect two cities and no more (the graph is simple). The path is determined by Dijkstra’s shortest path algorithm which described above. For this problem we have been implemented two algorithms for TSP. One is the heuristic and the other is the exhaustive. The heuristic algorithm does not always give the best solution but is very quick and low costing. Contrary to the heuristic algorithm, the exhaustive always gives the best solution but is very slow and resource consuming. For example, for more than 20 destinations the exhaustive algorithm run on the modern computing system, will spend some centuries until it gives back the solution! In practice, for input n more than 15, the algorithm is very slow.

Heuristic Algorithm
This algorithm is based on the ‘greedy method’. As it has been mentioned earlier, this algorithm manages to give a fast solution rather than the optimal solution. In some cases, it is possible that it will give us a solution that is much worse than the optimal one. Its basic advantage is that it is very quick. The basic idea of the algorithm is as follows: it starts from the base and after calculating its distance with all its neighboring destinations, it selects the destination with the smallest distance. It then continues the same way with the next destination and all the other until they have been visited the entire destination. It should be emphasized here that because it has been assumed that a destination might not be connected with another, this algorithm may come across a dead end. In such a case, it finishes unsuccessfully. Let’s suppose that the postman has to visit n-1 destinations (n including base). For the first destination the algorithm makes (n-1) comparisons (one for each other destination). After selecting the second destination, it makes other (n-2) comparisons (as the already selected destination cannot be selected again) and so goes on until it is reached the (n-1) destination and the algorithm finishes. The complexity of the algorithm is:

The complexity is quadratic O(n2) (Munir, R., 2001).

Exhaustive Algorithm
Here the algorithm starts from the base and tries all the possible connection paths that exist. Their costs are calculated and the path with the smallest cost is the output. It is obvious that the number of the possible paths is the number of the permutations of the n destinations that is (n-1)! Also, for each of the (n-1)! comparisons 2n arithmetic operations are approximately made. The complexity of the algorithm is:

The complexity is factorial O(n!) (Munir, R., 2001).


Practical Results
Here are some practical results taken by the MapRoute program:



For Shortest Path computation, there are 3699 Nodes and 4432 Edges in the Network.

It is clear that for the heuristic algorithm, the loops and time values remain in low levels. In contrast, for the exhaustive algorithm they increase very quickly as the input n value increases. For values of N over 8 the exhaustive algorithm is extremely slow.

5. Conclusion
This paper provided a brief presentation of spatial data model specifically oriented to network analysis applications. We reviewed the foundations of network representations and discussed the node-arc‘s model and its implementation using the object relational data model. We discussed one algorithm for shortest path and two algorithms for Traveling Salesman Problem with their complexity. The test shows that the heuristic method is effective in terms of computation time and complexity.

Further efforts will be made on the following items:
  • Enhancing the data model into real transportation data model, such as linear referencing systems (Miller and Shaw, 2002).
  • Adding others shortest path and TSP algorithms, such as genetic algorithm.
  • Expanding the applications of the algorithm into broader GIS topics.


References:

  • Chunithipaisan, S., James, P., and Parker,D. (2002). The integration of spatial datasets for network analysis operation. In Proceedings of MapAsia 2002 Conference. Bangkok, Thailand.
  • Haggett, P., and Chorley, R. J. (1969). Network analysis in geography. London, Edward Arnold.
  • Miller, H., and Shaw, S.-L. (2001). Ch 3 GIS-T Data Models. In H. Miller, & S.-L. Shaw (Eds.),
  • Geographic Information Systems for Transportation. Principles and Applications (pp. 53-62). New York: Oxford University Press.
  • Munir, R. (2001). Matematika Diskrit. Bandung. CV.Informatika.
  • Open GIS Consortium. (1999). The OpenGIS Abstract Specification- Feature Geometry. http://www.opengis.org/.
  • Suwardhi, D. (1998). Topology and Graph Theory for Spatial Network Modeling. Master Thesis. Informatics Department, Institute Technology Bandung.
  • Turk, T., and Gumusay,M.U. (2004). GIS Design and Application for Tourism. Proceeding In XXth ISPRS Congress, Istanbul, Turkey.

Page 1 of 1