Bus Travel Time Prediction Using GPS Data

Mr. Y. Ramakrishna
Student, Department of Civil Engineering
IIT Madras
Chennai
India.
yrkemc2@yahoo.com
Mr. P. Ramakrishna
Student, Department of Civil Engineering
IIT Madras
Chennai
India.
prk82@yahoo.co.uk
Mr. V.Lakshmanan
Project Associate
Department of Civil Engineering
IIT Madras
Chennai
India.
iamlaksh1@gmail.com
Dr. R. Sivanandan
Associate Professor
Department of Civil Engineering
IIT Madras
Chennai
India.
rsiva@iitm.ac.in
ABSTRACT
Intelligent Transportation Systems (ITS) are gaining popularity in developing countries like India. One area of ITS application is Advanced Public Transportation Systems (APTS), which employ advanced technologies to enhance the efficiency and safety of public transport systems. A key technology central to APTS is the Global Positioning System (GPS). Using GPS data, accurate travel time predictions can be made, which facilitate the estimation of bus arrival times. Providing real-time bus arrival information enhances the credibility of the transit system, leading to higher patronage and making it more competitive among the various transportation modes. In this paper, a Multiple Linear Regression (MLR) model and an Artificial Neural Network (ANN) model for prediction of bus travel times using GPS-based data are developed. These models are applied to a case study bus route in Chennai city, and the results are presented.
INTRODUCTION
Bus arrival times at bus stops in an urban traffic environment are highly stochastic and time-dependent. This is due to random fluctuations in travel demand and interruptions caused by traffic control devices, incidents, and weather conditions. Providing real-time bus arrival information would enhance the credibility of the public transit system and thus render it more competitive among the various transportation modes. One important input to a travel time prediction model is travel time data. Such data can be obtained through various traffic surveillance devices such as loop detectors, microwave detectors, radars, etc. With the emergence of Global Positioning System (GPS) technologies, traffic data collection can be performed more efficiently and safely.
In the context of ITS, Advanced Public Transportation Systems (APTS) and Advanced Traveler Information Systems (ATIS) are designed to collect, process, and disseminate real-time information to transit users and motorists. One of the key elements in APTS/ATIS is the prediction of vehicle arrival times. A technology central to several functions of APTS is GPS. Advances in mobile computing have made it possible to transmit GPS data in real time via the Global System for Mobile communication (GSM). Combining GPS with a Geographic Information System (GIS) helps in visualizing vehicular movements.
Research on short-term travel time prediction has been carried out for many years using different types of data, such as detector data, probe vehicle data, magnetic ticket data and others. Different methodologies have been proposed for predicting travel times, ranging from simple statistical approaches to more sophisticated algorithms based on artificial intelligence and machine learning. This paper presents two models, namely a Multiple Linear Regression (MLR) model and an Artificial Neural Network (ANN) model, to predict the arrival times of buses at target bus stops from the current bus stop. A brief review of relevant literature, data collection, case study, model development and evaluation, and conclusions are presented in the following sections.
LITERATURE REVIEW
In this section, a brief review of some of the previous work by different researchers in the area of travel time prediction is presented.
Lin and Zeng (1999) developed a mathematical algorithm to provide real-time bus arrival information for Blacksburg, Virginia, USA. In this study, a GPS unit in the probe vehicle sent positional data to a control station, where it was stored in a database. Bus location data, schedule information, the difference between scheduled and actual arrival times, and waiting time at time-check stops were used as the main inputs in developing the model. The algorithm was primarily developed for a rural traveler information system, and the test bed was in a rural area where congestion was minimal or did not exist.
Jeong and Rilett (2004) considered traffic congestion, schedule adherence and dwell times at stops as inputs for bus travel time prediction. The test bed used was a bus route located in downtown Houston, Texas, USA. A historical data based model, regression models, and ANN models were used to predict bus arrival times. In this study, the historical data based model gave better results than the multiple linear regression model. However, the ANN model was the best among the models.
Chien and Kuchipudi (2003) developed a travel time prediction model with real-time data and historic data. The Kalman filtering algorithm was employed here because of its ability to continuously update the state variable with changing observations. Results reveal that during peak hours, the historic path-based data used for travel-time prediction are better than link-based data due to smaller travel-time variance and larger sample size.
Chien et al. (2002) developed an Artificial Neural Network model to predict dynamic bus arrival time. The back-propagation algorithm was used for training. However, due to its long learning process, it is hard to apply on-line. Consequently, an adjustment factor was developed to modify the travel time prediction as new real-time data arrive. In this study, dwell time and scheduled data were not considered.
The above review is intended only to highlight a cross-section of efforts to predict travel times and is by no means exhaustive. The study reported in this paper predicts the bus arrival times at bus stops using GPS data through MLR and ANN. A case study of a busy bus route in Chennai city has been considered. The following sections describe the data collection system, case study, model development, evaluation and conclusion.
DATA COLLECTION
Figure 1 shows the data collection system adopted in this study. A GPS receiver is interfaced with a GSM modem and placed in the probe vehicle (bus). The GPS receiver and GSM modem are powered by a 12 V battery pack. The GSM modem uses a standard SIM card, as in mobile phones. The GPS receiver onboard the probe vehicle records point locations as latitude-longitude pairs, along with the speed of the probe vehicle, altitude, date, time and the validity of the fix, at a pre-specified time interval. The GPS receiver is designed in such a way that it automatically sends this information as text in a predefined format to the modem connected to it. The GSM modem, in turn, sends this text as an SMS to the remote computer in the control room, where it is stored in a database.
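The step of turning an incoming text message into a database record can be sketched as follows. The paper does not give the message format, so the comma-separated layout, the field order, the date and time formats, the validity flag 'A', the names `ProbeRecord` and `parse_probe_sms`, and the sample message are all hypothetical and serve only to illustrate the fields listed above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ProbeRecord:
    """One GPS fix received from the probe vehicle."""
    latitude: float       # decimal degrees
    longitude: float      # decimal degrees
    speed_kmh: float
    altitude_m: float
    timestamp: datetime
    valid: bool


def parse_probe_sms(text: str) -> ProbeRecord:
    """Parse one message in the ASSUMED layout
    'lat,lon,speed,altitude,DD-MM-YYYY,HH:MM:SS,validity'.
    The validity flag 'A' is assumed to mark a valid fix."""
    lat, lon, speed, alt, date, time, validity = [f.strip() for f in text.split(",")]
    return ProbeRecord(
        latitude=float(lat),
        longitude=float(lon),
        speed_kmh=float(speed),
        altitude_m=float(alt),
        timestamp=datetime.strptime(f"{date} {time}", "%d-%m-%Y %H:%M:%S"),
        valid=(validity == "A"),
    )


# Made-up sample message, not taken from the study data.
record = parse_probe_sms("12.9249,80.1000,27.5,16.0,14-03-2006,08:15:30,A")
```

In such a scheme, records with an invalid fix would typically be filtered out before any analysis.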

Fig. 1. Data Collection System
CASE STUDY
Chennai Metropolitan Area (CMA) extends over 1177 sq. km. and had a population of over 7 million in 2001. Chennai City, which is at the core, covers 172 sq. km. and had a population of over 4 million in 2001. The road network in CMA is of radial and circular/orbital pattern. The development of the city is mainly oriented along these radial and circular roads. The economic base of Chennai city is a mixed one with small-scale industries and commercial activities distributed over the city. The city has now emerged as one of the important hubs of India due to establishment of major businesses and concentration of information technology activities.
There are three public transit systems currently operating in Chennai: bus, suburban rail and mass rapid transit systems. The bus system is operated by the Metropolitan Transport Corporation (MTC). This system has a fleet of around 2700 buses and operates on over 480 routes. It caters to over 3.2 million trips per day. One of the busy bus routes of the system is the route from Tambaram to Parrys (Route 21G), shown in Fig. 2. It spans a distance of 32 km and traverses different land uses along its length.
The GPS-GSM data collection system described above was placed in a bus on Route 21G to serve as a probe. Probe data were collected for twenty-five trips. The position, date, time and speed of the probe vehicle were sent every thirty seconds in the form of an SMS to the remote control room. These data were stored in a database for analysis. The data for all the trips, which were made at different times, were segregated according to date and time and were then used for developing the models.
DEVELOPMENT OF MODELS
In this study, two models, a Multiple Linear Regression (MLR) model and an Artificial Neural Network (ANN) model, were developed for predicting bus arrival times at bus stops. The following were chosen as input variables for the model development:
- Remaining distance from the current bus stop to the target bus stop in km (Dij)
- Remaining number of bus stops from the current bus stop to the target bus stop (BSij)
- Average speed from the origin of the route to the current bus stop in km/h (Soi)
- Bus stop dwell times from the origin of the route to the current bus stop in minutes (DWoi)
- Remaining number of intersections from the current bus stop to the target bus stop (Iij)
- Intersection delays from the origin of the route to the current bus stop in minutes (IDoi)
The output variable was the travel time from the current bus stop to the target bus stop in minutes (TTij).
For the prediction of bus arrival times, runs from Parrys to Tambaram on the case study route were analyzed. There are 19 bus stops (including both the origin and destination bus stops) in this direction. All the bus stops and intersections were identified by plotting the entire set of data points on a digitized route map. The time stamps of the data points on or near the bus stops (i.e., just before the bus stop in the traveling direction) were taken as the arrival times at the bus stops.
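The arrival-time rule described above can be sketched as follows. The sketch assumes that each GPS fix has already been matched to the route and reduced to a pair (time in seconds, distance along the route in km); this map-matching step, the function name `arrival_time` and the tolerance `max_gap_km` are assumptions made for illustration and are not taken from the study.

```python
from bisect import bisect_left, bisect_right


def arrival_time(fixes, stop_chainage_km, max_gap_km=0.5):
    """Return the time stamp of the fix taken as the arrival at a stop.

    fixes: list of (time_s, chainage_km) sorted by time, with
           non-decreasing chainage (distance along the route).
    The chosen fix is the one on, or nearest before, the stop in the
    travelling direction. If the bus stays at that position for several
    fixes, the earliest of them is used. Returns None when no fix lies
    within max_gap_km before the stop.
    """
    chainages = [chainage for _, chainage in fixes]
    idx = bisect_right(chainages, stop_chainage_km) - 1
    if idx < 0 or stop_chainage_km - chainages[idx] > max_gap_km:
        return None
    first = bisect_left(chainages, chainages[idx])
    return fixes[first][0]


# Made-up fixes at 30 s intervals, for illustration only.
fixes = [(0, 0.00), (30, 0.20), (60, 0.45), (90, 0.70), (120, 1.05)]
t_i = arrival_time(fixes, 0.20)           # current stop i
t_j = arrival_time(fixes, 1.00)           # target stop j
travel_time_min = (t_j - t_i) / 60.0      # observed TTij in minutes
```

With fixes every thirty seconds, the arrival time obtained this way can be early by up to one reporting interval, which is one reason a finer time resolution would help.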

Multiple Linear Regression Model (MLR)
The MLR model was developed using the Statistical Product and Service Solutions (SPSS, version 13.0) software (SPSS Inc., 2004).
Various MLR models were tested with different combinations of variables using 1056 data points. The variables Dij, BSij and Iij have a high correlation with the dependent variable, TTij. The variables BSij and Iij were combined to form a new variable, 'Remaining number of bus stops and intersections' (BSIij). The correlation of the variables DWoi and IDoi with the dependent variable was negligible, and hence they were dropped.
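The screening of candidate variables by their correlation with the travel time can be illustrated with a short sketch. The study used SPSS for this; the function `pearson` below is a plain implementation of the Pearson correlation coefficient, and the sample values are made up for illustration and are not the study data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up values: remaining distance (km) and travel time (min).
d_ij = [2.0, 5.0, 8.0, 12.0, 20.0]
tt_ij = [6.0, 13.0, 19.0, 30.0, 47.0]
r_distance = pearson(d_ij, tt_ij)
```

A variable whose coefficient is close to zero, as reported for DWoi and IDoi, would be a candidate for removal.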
The best model developed is as follows:
$$TT_{ij} = 1.632 + 0.892\,D_{ij} - 0.022\,S_{oi} + 1.030\,BSI_{ij}$$
Where,
$TT_{ij}$ = Travel time from the current bus stop ‘i’ to the target bus stop ‘j’ (minutes)
$D_{ij}$ = Remaining distance from the current bus stop ‘i’ to the target bus stop ‘j’ (km)
$S_{oi}$ = Average speed from the trip origin to the current bus stop ‘i’ (km/h)
$BSI_{ij}$ = Remaining number of bus stops and intersections from the current bus stop ‘i’ to the target bus stop ‘j’
The R-Square value of the above model is 0.946, which indicates a very good model fit.
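As a worked illustration, the fitted equation can be evaluated directly. The coefficients below are those reported above; the function name and the input values in the example are hypothetical.

```python
def predict_travel_time_mlr(d_ij_km, s_oi_kmh, bsi_ij):
    """Travel time (minutes) from stop i to stop j using the reported MLR model.

    d_ij_km : remaining distance to the target stop (km)
    s_oi_kmh: average speed from the trip origin to the current stop (km/h)
    bsi_ij  : remaining number of bus stops and intersections
    """
    return 1.632 + 0.892 * d_ij_km - 0.022 * s_oi_kmh + 1.030 * bsi_ij


# Hypothetical inputs: 5 km remaining, 20 km/h so far, 6 stops and intersections.
tt = predict_travel_time_mlr(5.0, 20.0, 6)
print(f"Predicted travel time: {tt:.2f} min")  # about 11.83 min
```

For these inputs the terms are 1.632 + 4.460 - 0.440 + 6.180, giving about 11.83 minutes. The predicted arrival time at the target stop is then the arrival time at the current stop plus this value.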
Artificial Neural Network (ANN) Model
The ANN architecture used in this paper had three layers: an input layer, a hidden layer and an output layer. The weights and parameters associated with the hidden layer were identified during the training process. The ANN model was implemented as a computer program in the MATLAB software package (The MathWorks Inc., Ver. 6.1).
As in the MLR model development, various combinations of input variables were tried. Partial data sets were used for training the network and for selecting the network architecture (number of inputs n, number of hidden neurons q, number of outputs p). It was found that a single hidden layer with four hidden neurons gave the best model (n=3, q=4, p=1). The model was validated using a separate set of data points.
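The forward computation of a network with this shape (n=3, q=4, p=1) can be sketched as follows. The paper does not report the activation functions, the trained weights, or any input scaling, so the sketch assumes a tanh hidden layer with a linear output neuron and uses placeholder weights; all numeric values below are hypothetical and do not reproduce the model of the study.

```python
import math

# HYPOTHETICAL placeholder parameters (not the trained values of the study).
W_HIDDEN = [               # 4 hidden neurons x 3 inputs
    [0.4, -0.2, 0.7],
    [-0.5, 0.1, 0.3],
    [0.2, 0.6, -0.4],
    [0.1, -0.3, 0.5],
]
B_HIDDEN = [0.0, 0.1, -0.1, 0.05]   # one bias per hidden neuron
W_OUT = [0.8, -0.6, 0.5, 0.3]       # 1 output neuron x 4 hidden neurons
B_OUT = 0.2


def ann_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a single-hidden-layer network.

    Hidden layer: tanh activation (assumed). Output layer: linear (assumed).
    x is the vector of (suitably scaled) input values.
    """
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Three scaled inputs (made-up values).
y = ann_forward([0.5, 0.4, 0.6], W_HIDDEN, B_HIDDEN, W_OUT, B_OUT)
```

Training, for example by back-propagation, would adjust the weights and biases so that the output approximates the observed travel time.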
MODEL EVALUATION
In order to evaluate the performance of the two models, the Mean Absolute Percentage Error (MAPE), given by the equation below, was used as a measure of closeness between predicted and observed values.
$$\mathrm{MAPE} = \frac{1}{n} \sum \frac{\left| y_{p} - y_{ob} \right|}{y_{ob}} \times 100$$
Where,
$y_{p}$ = Predicted travel time from current bus stop to target bus stop
$y_{ob}$ = Observed travel time from current bus stop to target bus stop
n = Number of validation data points
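A minimal sketch of this measure, assuming the usual definition of MAPE as the mean of the absolute errors expressed as a percentage of the observed values, is given here. The function name and the sample values are hypothetical.

```python
def mape(observed, predicted):
    """Mean Absolute Percentage Error (in percent).

    observed, predicted: equal-length sequences of travel times;
    observed values must be non-zero.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(p - o) / o for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)


# Made-up travel times in minutes: errors of 20%, 10% and 0%.
error = mape([10.0, 20.0, 40.0], [12.0, 18.0, 40.0])
print(f"MAPE = {error:.1f}%")  # about 10.0%
```

Because each error is divided by the observed value, short trips between neighbouring stops weigh as heavily as long ones, and a small absolute error on a short trip can produce a large percentage error.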
Essentially, MAPE represents the average percentage difference between the observed value (here, observed travel time from a current bus stop to a target bus stop) and the predicted value. The MAPE values for the two models are shown below in Table 3.
Table 3: MAPE of Two Models
Model    MAPE (%)
MLR      22.7
ANN      16.0

It is found that the Artificial Neural Network model performs better than the Multiple Linear Regression model. This is believed to be due to the ANN model's ability to capture the complex non-linear relationship between travel time and the independent variables. It is to be noted that the models presented above are preliminary efforts. Further refinements and more case studies are planned to improve the models' predictions.
It is also believed that the accuracy of both models can be improved by using GPS data of higher time resolution (say, 5-second intervals). This would improve the estimation of arrival times at the bus stops. The accuracy of GPS is another factor. Use of Differential GPS (DGPS) would provide higher locational accuracy and could lead to better performance of the models.
CONCLUSIONS
Advanced GPS-GSM technologies provide a good means to collect real-time traffic data. The integrated data collection system requires minimal manpower. Moreover, the results obtained are more reliable, as less human intervention reduces the chances of errors in data collection and analysis. This paper presents an application of GPS data to predict bus travel times using two different models: a Multiple Linear Regression (MLR) model and an Artificial Neural Network (ANN) model. Several relevant input variables were tested and the best model of each type is reported.
For the case study bus route in Chennai considered in the study, the ANN model produced better results than the MLR model. The closeness statistic (MAPE) worked out to be 22.7% and 16.0% for the MLR and ANN models respectively. Further refinements and more case studies are planned to improve the models' predictions.
REFERENCES
- Chien, S.I.J., Ding, Y. and Wei, C. (2002). “Dynamic Bus Arrival Time Prediction with Artificial Neural Networks.” Journal of Transportation Engineering, Volume 128, Number 5, pp. 429-438.
- Chien, S.I.J. and Kuchipudi, C.M. (2003). “Dynamic Travel Time Prediction with Real-Time and Historic Data.” Journal of Transportation Engineering, Volume 129, Number 6, pp. 608-616.
- Jeong, R. and Rilett, L. (2004). “The Prediction of Bus Arrival Time using AVL data.” 83rd Annual General Meeting, Transportation Research Board, National Research Council, Washington D.C., USA.
- Lin, W.H. and Zeng, J. (1999). “Experimental Study of Real-Time Bus Arrival Time Prediction with GPS Data.” Transportation Research Record, 1666, pp. 101-109.
- SPSS Inc. (2004). SPSS 13.0, Chicago, Illinois, USA.
- The MathWorks Inc. MATLAB Ver. 6.1, Massachusetts, USA.