Online Rail Track Detection Using GIS and Remote Sensing Methods with Data Mining
Manoj Kumar Jain
Tata Consultancy Services
New Delhi
System Engineer
E-mail: statistician_jain@yahoo.co.in
Abstract:
Throughout the developed world, substantial investment has been made in building road and railway infrastructure. There is now immense pressure from rail users and the public for better management of the existing rail network and its associated information. Nowadays, however, some antisocial elements are disrupting rail track networks. As a result, responsible agencies at the local, district, state and national levels are looking to adopt sophisticated IT systems. GIS offers a unique ability to process data in their geographical context and allows data to be summarized efficiently and effectively for vertical integration within an organization. GIS and remote sensing are among the fastest-growing technologies of the present time. They have emerged as powerful and sophisticated means of managing vast amounts of geographic data, providing a mechanism by which information on the location, spatial interaction and geographic relationships of various facilities can be assessed and viewed in moments. They make it possible to view and access geographic data effectively and thus improve the decision-making process. This paper discusses how GIS and remote sensing methods can be applied to online rail track detection.
1. Introduction
A Geographic Information System (GIS) is a tool that uses the power of the computer to pose and answer geographic questions. The user guides the program to arrange and display data about places on the planet in a variety of ways, including maps, charts and tables. The hardware and software allow users to see and interact with data in new ways by blending electronic maps and databases to generate color-coded displays. Users can zoom in and out of maps freely, add layers of new data, and study details and relationships.
Remote sensing is the science and art of obtaining information about a phenomenon without being in contact with it. Remote sensing deals with the detection and measurement of phenomena with devices sensitive to electromagnetic energy, such as light (cameras and scanners), heat (thermal scanners) and radio waves (radar).
GIS and remote sensing are closely related, because much of the data used in a GIS comes from remote sensing. In a GIS, remote sensing data are used in two ways: as classified data and as image data.
Use of classified data: Rail track cover maps classified from remote sensing data can be overlaid onto other geographic data, which enables environmental monitoring and the analysis of change.


Fig 1: Estimation of the spatial distribution of population using Landsat TM data
The example above shows a case study in which statistical data at a lower spatial resolution are reallocated to a higher spatial resolution, exploiting the fact that the remotely sensed data have a higher resolution than the statistical data.
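The reallocation idea can be sketched as follows. This is a minimal, hypothetical example that assumes a classified land-cover grid and a zone grid of the same shape: each zone's statistical total (for instance, a census population count) is spread over the zone's pixels in proportion to per-class weights. The function name `reallocate` and the weight values are illustrative assumptions, not the method behind Fig 1.

```python
from collections import defaultdict


def reallocate(zone_totals, zone_grid, class_grid, weights):
    """Spread each zone's total over its pixels in proportion to land-cover weights.

    zone_totals: {zone_id: total value for the zone (e.g. population)}
    zone_grid:   2-D list of zone ids, one per pixel
    class_grid:  2-D list of land-cover classes, same shape as zone_grid
    weights:     {class: relative weight}; unlisted classes get weight 0
    """
    # First pass: total weight of all pixels belonging to each zone.
    zone_weight = defaultdict(float)
    for zrow, crow in zip(zone_grid, class_grid):
        for zone, cls in zip(zrow, crow):
            zone_weight[zone] += weights.get(cls, 0.0)

    # Second pass: each pixel receives its weighted share of the zone total.
    result = []
    for zrow, crow in zip(zone_grid, class_grid):
        row = []
        for zone, cls in zip(zrow, crow):
            total_weight = zone_weight[zone]
            share = weights.get(cls, 0.0) / total_weight if total_weight > 0 else 0.0
            row.append(zone_totals[zone] * share)
        result.append(row)
    return result


# Hypothetical example: two districts covering six pixels.
zones = [["A", "A", "B"], ["A", "A", "B"]]
classes = [["urban", "forest", "urban"], ["urban", "water", "farm"]]
weights = {"urban": 1.0, "farm": 0.2, "forest": 0.05, "water": 0.0}
grid = reallocate({"A": 1000.0, "B": 300.0}, zones, classes, weights)
```

Pixels with zero weight (water here) receive nothing, while each zone total is preserved, which is what allows coarse statistics to be mapped at the pixel resolution of the imagery.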
Use of image data: Remote sensing data can be classified or analyzed together with other geographic data to obtain higher classification accuracy.
Fig 2 compares the results of classification without and with the use of map data. If ground height and slope gradient are given as map data, rice fields, for example, can be checked and located only in flat, low-lying areas. Forest and mangrove areas are also classified with fewer errors if map data are combined with remote sensing data.
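The terrain check for rice fields can be expressed as a simple post-classification rule. The sketch below assumes per-pixel elevation (m) and slope (degrees) grids derived from map data; the thresholds and the fallback label `"other"` are hypothetical, and a real system would more likely reassign such pixels to their next most likely class than to a fixed label.

```python
def apply_terrain_rules(class_grid, elevation_m, slope_deg,
                        max_slope_deg=2.0, max_elevation_m=200.0,
                        fallback="other"):
    """Relabel 'rice' pixels that fall on steep or high ground.

    All three grids are 2-D lists of the same shape. The thresholds are
    illustrative assumptions, not values from the paper.
    """
    corrected = []
    for crow, erow, srow in zip(class_grid, elevation_m, slope_deg):
        row = []
        for cls, elev, slope in zip(crow, erow, srow):
            # Rice is only plausible on flat, low-lying land.
            if cls == "rice" and (slope > max_slope_deg or elev > max_elevation_m):
                row.append(fallback)
            else:
                row.append(cls)
        corrected.append(row)
    return corrected


classes = [["rice", "rice"], ["forest", "rice"]]
elevation = [[50, 50], [300, 450]]
slope = [[0.5, 8.0], [10.0, 1.0]]
result = apply_terrain_rules(classes, elevation, slope)
# [['rice', 'other'], ['forest', 'other']]
```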
Image data are sometimes also used as image maps, with an overlay of political boundaries, roads, railways etc. Such an image map can be successfully used for visual interpretation.
If a digital elevation model (DEM) is used with remote sensing data, shading corrections in mountainous areas can be made by dividing by $\cos\theta$, where $\theta$ is the angle between the sunlight and the normal to the sloping surface.
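A minimal sketch of this correction, assuming the DEM has already been converted to per-pixel slope and aspect in degrees. The cosine of the incidence angle is computed with the standard relation $\cos\theta = \cos z \cos s + \sin z \sin s \cos(\phi_{sun} - a)$, where $z$ is the solar zenith angle, $s$ the slope, $\phi_{sun}$ the solar azimuth and $a$ the aspect. The function names and the `min_cos` cut-off for shadowed pixels are hypothetical.

```python
import math


def cos_incidence(sun_zenith_deg, sun_azimuth_deg, slope_deg, aspect_deg):
    """Cosine of the angle between the sun direction and the surface normal."""
    z = math.radians(sun_zenith_deg)
    s = math.radians(slope_deg)
    relative_azimuth = math.radians(sun_azimuth_deg - aspect_deg)
    return (math.cos(z) * math.cos(s)
            + math.sin(z) * math.sin(s) * math.cos(relative_azimuth))


def cosine_correction(value, sun_zenith_deg, sun_azimuth_deg, slope_deg,
                      aspect_deg, min_cos=0.05):
    """Divide an observed pixel value by cos(theta), as described in the text.

    Returns None where cos(theta) is too small (shadowed or grazing light),
    because the division would become unstable there.
    """
    c = cos_incidence(sun_zenith_deg, sun_azimuth_deg, slope_deg, aspect_deg)
    if c < min_cos:
        return None
    return value / c
```

The plain division follows the text; the commonly used cosine correction additionally multiplies by $\cos z$ so that values are normalized to a horizontal surface rather than to a surface facing the sun.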
1.1. Concept of Remote Sensing
Remote sensing is defined as the science and technology by which the characteristics of objects of interest can be identified, measured or analyzed without direct contact.
Electro-magnetic radiation reflected or emitted from an object is the usual source of remote sensing data. However, other media, such as gravity or magnetic fields, can also be used in remote sensing. A device that detects the electro-magnetic radiation reflected or emitted from an object is called a "remote sensor" or "sensor"; cameras and scanners are examples of remote sensors. A vehicle that carries the sensor is called a "platform"; aircraft and satellites are typical platforms.
The technical term "remote sensing" was first used in the United States in the 1960s and encompassed photogrammetry, photo-interpretation, photo-geology, etc. Since Landsat-1, the first Earth resources observation satellite, was launched in 1972, remote sensing has become widely used.
The characteristics of an object can be determined from the electro-magnetic radiation it reflects or emits. That is, "each object has unique and different characteristics of reflection or emission if the type of object or the environmental condition is different." Remote sensing is a technology to identify and understand the object or the environmental condition through the uniqueness of this reflection or emission.

Fig 2: Land use classification with auxiliary use of map data: (a) classification without using map data; (b) classification with use of map data
Figure 3 illustrates this concept, while Figure 4 shows the flow of remote sensing, in which three different objects are measured by a sensor in a limited number of bands according to their electro-magnetic characteristics, after various factors have affected the signal. The remote sensing data are processed automatically by computer and/or interpreted manually by humans, and are finally used in agriculture, land use, forestry, geology, hydrology, oceanography, meteorology, environmental studies, etc.
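One simple way a computer can exploit the per-band uniqueness described above is a minimum-distance classifier: each pixel is assigned to the reference signature closest to it in band space. The three-band signatures below (green, red and near-infrared reflectance) are made-up illustrative values, and this classifier is only one of many possible techniques, not necessarily the one used in this paper.

```python
import math


def classify_pixel(pixel, signatures):
    """Return the label whose reference signature is nearest (Euclidean) to pixel."""
    return min(signatures, key=lambda label: math.dist(pixel, signatures[label]))


def classify_image(image, signatures):
    """Apply classify_pixel to every pixel of a 2-D list of band tuples."""
    return [[classify_pixel(px, signatures) for px in row] for row in image]


# Hypothetical mean reflectances per class: (green, red, near-infrared).
SIGNATURES = {
    "vegetation": (0.10, 0.05, 0.45),
    "water": (0.06, 0.04, 0.02),
    "bare_soil": (0.15, 0.20, 0.28),
}

image = [[(0.09, 0.06, 0.40), (0.05, 0.03, 0.03)],
         [(0.16, 0.21, 0.30), (0.11, 0.05, 0.47)]]
labels = classify_image(image, SIGNATURES)
```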

Fig 3: Data collection by remote sensing

Fig 4: Flow of remote sensing
2. Geographic Information System (GIS)
A GIS is a computer system capable of capturing, storing, analyzing, and displaying geographically referenced information; that is, data identified according to location. Practitioners also define a GIS as including the procedures, operating personnel, and spatial data that go into the system.
The power of a GIS comes from the ability to relate different information in a spatial context and to reach a conclusion about this relationship. Most of the information we have about our world contains a location reference, placing that information at some point on the globe. When rainfall information is collected, it is important to know where the rainfall occurred. This is done using a location reference system, such as longitude and latitude, and perhaps elevation. Comparing the rainfall information with other information, such as the location of marshes across the landscape, may show that certain marshes receive little rainfall. This fact may indicate that these marshes are likely to dry up, and this inference can help us make the most appropriate decisions about how humans should interact with the marsh. A GIS, therefore, can reveal important new information that leads to better decision-making.
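The rainfall and marsh comparison can be sketched by relating each marsh to its nearest rain gauge using great-circle distance. The coordinates, rainfall totals, the 400 mm threshold and the function names are hypothetical; a production GIS would more likely interpolate rainfall into a surface and overlay it on marsh polygons.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def dry_marshes(marshes, gauges, min_rain_mm):
    """Return marshes whose nearest gauge recorded less than min_rain_mm.

    marshes: {name: (lat, lon)}
    gauges:  list of (lat, lon, rainfall_mm)
    """
    flagged = []
    for name, (lat, lon) in marshes.items():
        nearest = min(gauges, key=lambda g: haversine_km(lat, lon, g[0], g[1]))
        if nearest[2] < min_rain_mm:
            flagged.append(name)
    return flagged


marshes = {"M1": (28.60, 77.20), "M2": (26.90, 75.80)}
gauges = [(28.70, 77.10, 700.0), (26.90, 75.70, 250.0)]
print(dry_marshes(marshes, gauges, min_rain_mm=400.0))  # ['M2']
```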
A GIS makes it possible to link, or integrate, information that is difficult to associate through any other means. Thus, a GIS can use combinations of mapped variables to build and analyze new variables.

Fig 5: Data integration is the linking of information in different forms through a GIS.
For example, using GIS technology, it is possible to combine agricultural records with hydrography data to determine which streams will carry certain levels of fertilizer runoff. Agricultural records can indicate how much fertilizer has been applied to a parcel of land. By locating these parcels and intersecting them with streams, the GIS can be used to predict the amount of nutrient runoff in each stream. Then, as streams converge, the total loads can be calculated downstream where the stream enters a lake.
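The downstream accumulation step can be sketched as a traversal of the stream network. The sketch assumes each parcel has already been intersected with the stream it drains into, and it estimates each parcel's contribution as applied fertilizer times a constant runoff fraction; that fraction, the stream identifiers and the function names are illustrative assumptions rather than a hydrological model.

```python
from collections import defaultdict


def local_loads(parcels):
    """Sum runoff load per stream from (stream_id, applied_kg, runoff_fraction) parcels."""
    loads = defaultdict(float)
    for stream, applied_kg, runoff_fraction in parcels:
        loads[stream] += applied_kg * runoff_fraction
    return loads


def accumulate(downstream, loads):
    """Total load at each stream = its own load + loads of all upstream streams.

    downstream: {stream_id: next stream_id, or None where it enters the lake}
    The network is assumed to be a tree (no loops).
    """
    upstream = defaultdict(list)
    for stream, nxt in downstream.items():
        if nxt is not None:
            upstream[nxt].append(stream)

    totals = {}

    def total(stream):
        # Memoized recursion: each stream's total is computed once.
        if stream not in totals:
            totals[stream] = loads.get(stream, 0.0) + sum(total(u) for u in upstream[stream])
        return totals[stream]

    for stream in downstream:
        total(stream)
    return totals


network = {"a": "c", "b": "c", "c": None}  # a and b join c, which enters the lake
parcels = [("a", 100.0, 0.1), ("b", 200.0, 0.05), ("c", 50.0, 0.2)]
totals = accumulate(network, local_loads(parcels))
```

For very large networks, an iterative pass in topological order would avoid Python's recursion limit.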
A GIS can also convert existing digital information, which may not yet be in map form, into forms it can recognize and use. For example, digital satellite images can be analyzed to produce a map of digital information about land use and land cover. Likewise, census or hydrologic tabular data can be converted to a map-like form and serve as layers of thematic information in a GIS.
Future of GIS
Environmental studies, geography, geology, planning, business marketing and other disciplines have benefited from GIS tools and methods. Together with cartography, remote sensing, global positioning systems, photogrammetry and geography, GIS has evolved into a discipline with its own research base, known as geographic information science. An active GIS market has resulted in lower costs and continual improvements in GIS hardware, software and data. These developments will lead to a much wider application of the technology throughout government, business and industry.
GIS and related technology will help analyze large datasets, allowing a better understanding of terrestrial processes and human activities to improve economic vitality and environmental quality.
2.1. GIS and Remote Sensing
(A) GIS in remote sensing
For users of remote sensing, it is not sufficient to display only the results obtained from image processing. For example, detecting land cover change in an area is not enough, because the final goal is to analyze the cause of the change or to evaluate its impact. Therefore, the result should be overlaid on maps of transportation facilities and land use zoning, as shown in Figure 6.

Fig 6: An overlay of RS data and map data
In addition, the classification of remote sensing imagery becomes more accurate if the auxiliary data contained in maps are combined with the image data. To promote the integration of remote sensing and geographic data, a geographic information system (GIS) should be established in which both image and graphic data are stored in digital form, retrieved conditionally, overlaid on each other and evaluated using a model.
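A minimal sketch of overlaying a change result on a land use zoning map: two classified maps of the same area are compared pixel by pixel (post-classification comparison), and every change is tallied by the zone it falls in. The class names, zone names and function name are hypothetical, and the grids are assumed to be co-registered at the same resolution.

```python
from collections import Counter


def change_by_zone(before, after, zones):
    """Count (zone, old_class, new_class) transitions between two classified maps."""
    transitions = Counter()
    for brow, arow, zrow in zip(before, after, zones):
        for old, new, zone in zip(brow, arow, zrow):
            if old != new:
                transitions[(zone, old, new)] += 1
    return transitions


before = [["forest", "forest"], ["farm", "urban"]]
after = [["urban", "forest"], ["urban", "urban"]]
zoning = [["residential", "park"], ["residential", "industrial"]]
report = change_by_zone(before, after, zoning)
```

Tallying by zone turns a raw change map into something that can be related to causes (for example, expansion of a residential zone).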

Fig 7: Comparison between computer-assisted GIS and the conventional analog use of maps.
(B) Function of GIS
The following three functions are very important in GIS.
- To store and manage geographic information comprehensively and effectively
- To display geographic information depending on the purpose of use
- To execute query, analysis and evaluation of geographic information effectively
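The storage and query functions can be illustrated with a small in-memory layer. This is a hypothetical sketch: the `Feature` and `Layer` classes do not come from any GIS product, and the linear scan in `query` would normally be replaced by a spatial index such as an R-tree.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    fid: str
    x: float  # e.g. longitude
    y: float  # e.g. latitude
    attrs: dict = field(default_factory=dict)


class Layer:
    """Stores point features and answers bounding-box and attribute queries."""

    def __init__(self, name):
        self.name = name
        self._features = {}

    def add(self, feature):
        self._features[feature.fid] = feature

    def query(self, bbox=None, where=None):
        """Return features inside bbox=(xmin, ymin, xmax, ymax) that satisfy where(attrs)."""
        result = []
        for f in self._features.values():
            if bbox is not None:
                xmin, ymin, xmax, ymax = bbox
                if not (xmin <= f.x <= xmax and ymin <= f.y <= ymax):
                    continue
            if where is not None and not where(f.attrs):
                continue
            result.append(f)
        return result


rail = Layer("rail_assets")
rail.add(Feature("S1", 77.2, 28.6, {"type": "station"}))
rail.add(Feature("B1", 77.3, 28.7, {"type": "bridge"}))
rail.add(Feature("S2", 72.8, 19.0, {"type": "station"}))
nearby = rail.query(bbox=(77.0, 28.0, 78.0, 29.0))
stations = rail.query(bbox=(77.0, 28.0, 78.0, 29.0), where=lambda a: a["type"] == "station")
```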
3. Data Mining
Data mining is an information extraction activity whose goal is to discover hidden facts contained in databases. Using a combination of machine learning, statistical analysis, modeling techniques and database technology, data mining finds patterns and subtle relationships in data and infers rules that allow the prediction of future results. Typical applications include market segmentation, customer profiling, fraud detection, evaluation of retail promotions and credit risk analysis. With an enormous amount of data stored in databases and data warehouses, it is increasingly important to develop powerful tools for analyzing such data and mining interesting knowledge from it. Data mining is the process of inferring knowledge from such huge volumes of data. It has three major components: clustering or classification, association rules, and sequence analysis.
Put simply, in classification/clustering we analyze a set of data and generate a set of grouping rules that can be used to classify future data. For example, one may classify diseases and provide the symptoms that describe each class or subclass. This has much in common with traditional work in statistics and machine learning. However, important new issues arise because of the sheer size of the data. One of the important problems in data mining is classification-rule learning, which involves finding rules that partition given data into predefined classes. In the data mining domain, where millions of records and a large number of attributes are involved, the execution time of existing algorithms can become prohibitive, particularly in interactive applications.
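As a small illustration of classification-rule learning, the sketch below implements the well-known 1R ("one rule") learner: for each attribute it builds one rule per attribute value, predicting the majority class, and keeps the attribute whose rules make the fewest training errors. The track-segment records and attribute names are invented for illustration.

```python
from collections import Counter, defaultdict


def one_r(records, attributes, target):
    """Learn a 1R rule set: the single attribute whose value->class rules err least.

    Returns (attribute, {value: predicted_class}, training_errors).
    """
    best = None
    for attr in attributes:
        # Class frequencies for each value of this attribute.
        counts = defaultdict(Counter)
        for rec in records:
            counts[rec[attr]][rec[target]] += 1
        # One rule per value: predict the most frequent class.
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(sum(c.values()) - c[rules[value]] for value, c in counts.items())
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best


segments = [
    {"gap": "yes", "weather": "rain", "status": "fault"},
    {"gap": "yes", "weather": "dry", "status": "fault"},
    {"gap": "no", "weather": "rain", "status": "ok"},
    {"gap": "no", "weather": "dry", "status": "ok"},
    {"gap": "no", "weather": "rain", "status": "fault"},
]
attr, rules, errors = one_r(segments, ["gap", "weather"], "status")
```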
An association rule is a rule that implies certain association relationships among a set of objects in a database. In this process, we discover a set of association rules at multiple levels of abstraction from the relevant set(s) of data in a database. For example, one may discover a set of symptoms often occurring together with certain kinds of diseases and further study the reasons behind them. Since finding interesting association rules in databases may disclose useful patterns for decision support, selective marketing, financial forecasting, medical diagnosis and many other applications, it has attracted a lot of attention in recent data mining research. Mining association rules may require iterative scanning of large transaction or relational databases, which is costly. Therefore, efficient mining of association rules in transaction and/or relational databases has been studied extensively.
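The classic Apriori algorithm is one common way to mine such rules: it finds all itemsets whose support (the fraction of transactions containing them) meets a threshold, growing candidates one item at a time and pruning any candidate with an infrequent subset, and then derives rules whose confidence meets a second threshold. The sketch below is a compact, unoptimized version; the transactions (observations recorded during hypothetical track inspections) and the thresholds are invented.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(1 for b in baskets if itemset <= b) / n

    items = {item for b in baskets for item in b}
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent, k = {}, 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = {c for c in candidates if support(c) >= min_support}
    return frequent


def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) for rules lhs -> rhs."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent itemset is frequent, so lhs is present.
                confidence = sup / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, sup, confidence))
    return rules


inspections = [
    {"gap", "night", "remote_section"},
    {"gap", "night"},
    {"gap", "remote_section"},
    {"night"},
    {"gap", "night", "remote_section"},
]
frequent = apriori(inspections, min_support=0.6)
rules = association_rules(frequent, min_confidence=0.7)
```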
In sequential analysis, we seek to discover patterns that occur in sequence. This deals with data that appear in separate transactions (as opposed to data that appear in the same transaction, as in association). For example, if a shopper buys item A in the first week of the month, then she/he buys item B in the second week. Many algorithms have been proposed to address these aspects of data mining, and compiling a list of all the algorithms suggested or used for these problems is an arduous task. I have therefore limited the focus of this paper to some of the algorithms that have had more success than others.
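Sequence analysis can be illustrated by measuring the support of a sequential pattern: the fraction of customer sequences in which the pattern's itemsets appear in order, each in a later transaction than the previous one. This hypothetical sketch checks containment greedily, which is sufficient for deciding whether one sequence contains another; a full miner such as GSP or PrefixSpan would also enumerate the candidate patterns.

```python
def contains(sequence, pattern):
    """True if pattern's itemsets occur in order, each in a separate later transaction."""
    i = 0
    for transaction in sequence:
        if i < len(pattern) and pattern[i] <= transaction:
            i += 1
    return i == len(pattern)


def sequence_support(sequences, pattern):
    """Fraction of sequences that contain the sequential pattern."""
    return sum(1 for s in sequences if contains(s, pattern)) / len(sequences)


# Hypothetical weekly baskets for four shoppers.
shoppers = [
    [{"A"}, {"B"}, {"C"}],
    [{"A", "C"}, {"B"}],
    [{"B"}, {"A"}],
    [{"A", "B"}],
]
print(sequence_support(shoppers, [{"A"}, {"B"}]))  # 0.5
```

The last shopper bought A and B in the same week, so that record supports the association {A, B} but not the sequence "A, then B".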
3.1. Data Mining Techniques For Disaster Detection
A number of data mining techniques can be used in disaster detection, each with its own particular advantages. The following categorization lists some of the techniques and the purposes for which they may be employed. In later sections, further discussion is provided for techniques investigated within the DDDM (Disaster Detection for Data Mining) framework.
Characterization/Generalization: Produces a general description of the nature of the data. It can be used for deviation analysis if the data content differs sufficiently.
Classification: Creates a categorization of data records. It could be used to detect individual attacks, but, as previous experiments in the literature have shown, it is prone to producing a high false alarm rate. This problem may be alleviated by applying fine-tuning techniques such as boosting.
Association: Describes relationships within data records. Detection of irregularities may occur when many records exhibit previously unseen relationships.
Frequent Episodes: Describes relationships in the data stream by recognizing records that occur together. For example, a disaster may produce a very typical sequence of records (similar to system call traces used in the Computer Immunology project).
Clustering: Groups records that exhibit similar characteristics according to some pre-defined metrics. It can be used for general analysis similar to classification, or for detecting outliers that may or may not represent disasters.
Incremental Updates: Keeps rule sets up-to-date to better reflect the current regularities in the data.
Meta-Rules: Provides a description of changes within rule sets over time. It can be used for trend analysis. It is probably not very useful for detecting individual disasters, but it can serve as a basis for comparison when new, unusual rules appear. For example, if a trend shows a steady increase in some network activity over time, then a sudden increase in another type of activity may be suspicious, whereas a further increase in the former is not necessarily so.
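Clustering-based outlier detection can be sketched with plain k-means: records are grouped around centroids, and a record far from every centroid is reported as a possible disaster (or simply as unusual). The two-dimensional feature vectors, the fixed initial centroids and the distance threshold are hypothetical choices made to keep the example deterministic.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means (Lloyd's algorithm) starting from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        centroids = [
            tuple(sum(coords) / len(members) for coords in zip(*members)) if members else c
            for c, members in zip(centroids, clusters)
        ]
    return centroids


def outliers(points, centroids, threshold):
    """Points farther than threshold from every centroid."""
    return [p for p in points if min(math.dist(p, c) for c in centroids) > threshold]


records = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11),
           (5, 20)]
centres = kmeans(records, [(0, 0), (10, 10)])
print(outliers(records, centres, threshold=5.0))  # [(5, 20)]
```

Note that the outlier still pulls its cluster's centroid towards itself (here to (9.4, 12.4)); more robust variants exclude flagged points and recompute.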
4. Procedure of Online Detection
Electro-magnetic radiation carries electro-magnetic energy by transmitting the oscillation of the electro-magnetic field through space or matter. The propagation of electro-magnetic radiation is described by Maxwell's equations.
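Wavelength and frequency are related by $c = \lambda f$, where $c = 3 \times 10^{8}\ \mathrm{m\,s^{-1}}$ is the speed of light, $\lambda$ is the wavelength and $f$ is the frequency.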
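As a worked example of the relation $c = \lambda f$ between the speed of light, the wavelength $\lambda$ and the frequency $f$, the sketch below converts between wavelength and frequency using $c = 3 \times 10^{8}$ m/s. The function names are illustrative. Green light at 550 nm has a frequency of about $5.45 \times 10^{14}$ Hz, and a 5.4 GHz radar signal has a wavelength of about 5.6 cm.

```python
SPEED_OF_LIGHT_M_S = 3.0e8  # value of c used in the text


def frequency_hz(wavelength):
    """f = c / lambda, with the wavelength in metres."""
    return SPEED_OF_LIGHT_M_S / wavelength


def wavelength_m(frequency):
    """lambda = c / f, with the frequency in hertz."""
    return SPEED_OF_LIGHT_M_S / frequency


green_light_hz = frequency_hz(550e-9)     # about 5.45e14 Hz
radar_wavelength = wavelength_m(5.4e9)    # about 0.056 m
```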
All matter reflects, absorbs, transmits and emits electro-magnetic radiation in a unique way. For example, a leaf looks green because its chlorophyll absorbs the blue and red parts of the spectrum and reflects the green part. These unique characteristics of matter are called spectral characteristics.
Rail track is made of iron, which has distinctive characteristics of absorption, transmission and emission of electro-magnetic radiation. Using these properties, the sensor records data that show the rail track in the GIS imagery.
Next, we extract the rail track pattern from the GIS imagery for every area and check whether the pattern is continuous or discontinuous. If the track pattern is discontinuous, we identify the location of the discontinuity, which indicates an intrusion on the rail track. Data mining is used only to build association rules and to perform sequential analysis for finding the rail track in the GIS imagery.
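The continuity check can be sketched as a search for gaps in a track profile. The sketch assumes the extracted pattern has been sampled at regular intervals along the known track centerline, giving one boolean per sample (True where track pixels were found); `min_gap` and `spacing_m` are hypothetical parameters that suppress one-sample noise and convert sample indices into distances along the track. A gap is only a candidate: cloud cover, vegetation or shadows can also hide the track in imagery.

```python
def find_gaps(profile, min_gap=1, spacing_m=1.0):
    """Return (start_m, end_m) for each run of >= min_gap samples without track."""
    gaps = []
    start = None
    for i, detected in enumerate(profile):
        if not detected and start is None:
            start = i                      # a gap begins
        elif detected and start is not None:
            if i - start >= min_gap:       # the gap ends; keep it if long enough
                gaps.append((start * spacing_m, i * spacing_m))
            start = None
    # A gap may run to the end of the profile.
    if start is not None and len(profile) - start >= min_gap:
        gaps.append((start * spacing_m, len(profile) * spacing_m))
    return gaps


profile = [True, True, False, True, True, False, False, False, True]
print(find_gaps(profile, min_gap=2, spacing_m=10.0))  # [(50.0, 80.0)]
```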
5. Proposed Model

Fig 8: Architecture model for Online Rail Track Detection
Fig 8 shows a schematic diagram of our proposed model, which is designed to satisfy the requirements of online rail track detection using GIS and remote sensing methods with a data mining approach.
Satellite data arrive over the network and are captured at the SDC, an interface (analogous to a network card on a machine) capable of capturing the information flowing from the satellite.
The raw data storage holds the collected network data. Typically, it is a set of hard drives to which an application writes the information passing through the SDC, usually according to predefined requirements.
The pre-processor converts raw image or connection data into a format that the mining algorithms can use, and may store the result in the knowledge base. It can perform a range of duties, such as additional filtering and noise elimination, and can include third-party detection tools that recognize known disaster patterns on the track.
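Noise elimination in the pre-processor could, for example, use a median filter, which removes isolated spikes while preserving longer changes. This sketch filters a one-dimensional profile of values sampled along the track (such as reflectance); the window size is a hypothetical parameter and must stay smaller than the shortest genuine break the system is meant to detect, otherwise real breaks would be smoothed away.

```python
from statistics import median


def median_filter(values, window=3):
    """Median-filter a 1-D sequence; windows are truncated at the ends."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    filtered = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        filtered.append(median(values[lo:hi]))
    return filtered


reflectance = [0.80, 0.82, 0.10, 0.81, 0.79, 0.80]
smoothed = median_filter(reflectance)  # the isolated 0.10 spike is replaced by 0.81
```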
The knowledge base stores the rules produced by mining and any additional information used in the mining process. It may also hold information for the pre-processor, such as patterns for recognizing attacks and conversion templates.
The profiler is responsible for generating snapshot rule sets to be used for deviation analysis. It can be triggered automatically based on time of day or the amount of pre-processed data available.
The deviation analyzer examines the rule sets in the knowledge base and creates a description of their differences by meta-learning. The results are stored in the knowledge base for further reference, and, if necessary, the analyzer signals the alarm generator. The deviation analyzer could be invoked by periodically querying the knowledge base for new profiles; alternatively, the profiler may signal the analyzer when a new profile is deposited in the knowledge base.
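A deliberately simple stand-in for the deviation analyzer is sketched below. Each profile is treated as a mapping from a rule to its confidence, and the analyzer reports rules that appeared, disappeared or whose confidence moved by more than a tolerance, calling an alarm callback when anything changed. This is an assumption-laden sketch, not the meta-learning step itself; the rule encoding, the tolerance and the function names are hypothetical.

```python
def diff_rule_sets(old, new, tolerance=0.1):
    """Compare two profiles, each mapping rule -> confidence."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(r for r in set(old) & set(new)
                     if abs(new[r] - old[r]) > tolerance)
    return {"added": added, "removed": removed, "changed": changed}


def analyze_profiles(old, new, alarm, tolerance=0.1):
    """Signal the alarm generator if the profiles differ; return the report for storage."""
    report = diff_rule_sets(old, new, tolerance)
    if any(report.values()):
        alarm(report)
    return report


# Rules encoded as (antecedent, consequent) strings; confidences are invented.
old_profile = {("seg12:present", "seg13:present"): 0.95,
               ("seg13:present", "seg14:present"): 0.90}
new_profile = {("seg12:present", "seg13:present"): 0.60,
               ("seg14:absent", "seg15:present"): 0.70}
alarms = []
report = analyze_profiles(old_profile, new_profile, alarms.append)
```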
The alarm generator is responsible for notifying the administrator when the deviation analyzer reports unusual patterns in the current imagery.
5.1. Algorithm
1. Input: vector or raster image data.
2. Output: deviations (disasters) found on the rail track.
3. Read the GIS imagery.
4. Search for the rail track pattern in the imagery for Area [I].
5. Compare the current rail pattern with the older rail pattern for Area [I].
6. If a deviation is found, an intrusion has occurred.
7. Find the particular location on the rail track where the intrusion occurred.
8. Generate an alarm and share the information with the agent.
9. Repeat steps 4-8 for another Area [J], where J > I.
Note: The current rail track pattern is the rail pattern found in the current GIS imagery.
The older rail track pattern is the rail pattern found in older GIS imagery.
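The algorithm above can be sketched as follows, under several assumptions: the older and current rail track patterns for each area are already available as equal-length boolean profiles sampled along the track, a deviation is any run of samples where the older pattern showed track but the current one does not, and `notify` stands in for sharing the information with the agent. All names and parameters are hypothetical.

```python
def missing_runs(old, current, min_gap=1):
    """Index ranges (start, end) where old shows track but current does not."""
    if len(old) != len(current):
        raise ValueError("patterns must have the same length")
    runs, start = [], None
    for i, (was, now) in enumerate(zip(old, current)):
        missing = was and not now
        if missing and start is None:
            start = i
        elif not missing and start is not None:
            if i - start >= min_gap:
                runs.append((start, i))
            start = None
    if start is not None and len(old) - start >= min_gap:
        runs.append((start, len(old)))
    return runs


def detect_intrusions(old_patterns, current_patterns, notify, min_gap=1, spacing_m=10.0):
    """Compare patterns area by area and raise an alarm for each deviation."""
    alarms = []
    for area in sorted(current_patterns):           # repeat for every area
        old = old_patterns[area]                      # older rail track pattern
        current = current_patterns[area]              # current rail track pattern
        for start, end in missing_runs(old, current, min_gap):        # deviation found
            location = (area, start * spacing_m, end * spacing_m)     # where it occurred
            notify(location)                          # generate alarm, inform the agent
            alarms.append(location)
    return alarms


old = {"A": [True] * 6, "B": [True, False, True]}  # B already had a gap in older imagery
current = {"A": [True, True, False, False, True, True], "B": [True, False, True]}
sent = []
result = detect_intrusions(old, current, sent.append)
```

Comparing against the older pattern, rather than only testing the current pattern for continuity, keeps gaps that were already present (area B) from raising repeated alarms.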
6. Conclusion
In this paper we proposed a systematic framework and network architecture for online rail track detection using GIS and remote sensing methods with a data mining approach. The framework consists of classification, association rules and frequent episodes programs, which can be used to construct detection models automatically. The accuracy of the detection models depends on a sufficient training data set and the right feature set. We proposed that the association rules and frequent episodes algorithms can be used to compute consistent patterns from audit data. These frequent patterns form an abstract summary of an audit trail and can therefore be used to guide the audit data gathering process, help with feature selection, and discover disaster patterns on the rail track.