
SPATIAL AND COLLATERAL DATA MINING FOR CRIME DETECTION AND ANALYSIS

KIRANMAI CHERUKURI
VNRVJ INSTITUTE OF ENGG. AND TECHNOLOGY,
HYDERABAD, INDIA.

MURALIKRISHNA IV
CENTRE FOR SPATIAL INFORMATION TECHNOLOGY,
JNTU, HYDERABAD, INDIA.

VENUGOPALA REDDY A
OSMANIA UNIVERSITY COLLEGE OF ENGINEERING,
HYDERABAD, INDIA



INTRODUCTION
Data mining techniques support the discovery and exploitation of knowledge, which aids many aspects of knowledge management. Knowledge falls into three categories:
  1. Knowledge about the past, which is stable, voluminous, and relatively accurate
  2. Knowledge about the present, which is unstable, compact, and relatively inaccurate, and
  3. Knowledge about the future, which is hypothetical.
Data mining, or Knowledge Discovery in Databases (KDD), is in simple terms the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. It deals with the discovery of hidden knowledge, unexpected patterns and new rules in large databases, with the aim of identifying valid, potentially useful and ultimately understandable structure in the data. Spatial data mining methods are applied to extract interesting and regular knowledge from large spatial databases.

In practice, data mining has two components: discovery and exploitation. During the discovery component, facts are discovered and represented as information-bearing data. During the exploitation component, these facts are applied to the solution of a specific problem. First, we discover; second, we act. The steps in the process are formulation of the problem, data evaluation, feature extraction and enhancement, prototyping, and model evaluation. A simple taxonomy of knowledge discovery techniques is as follows:
  • Manual search
  • OLAP
  • Knowledge Engineering
  • Visualization
  • Automated search
  • Auto-clustering
  • Link analysis
  • Regression (white box)
  • Rule Induction
Crime detection is an area of vital importance to a police department. The Police Department of Hyderabad, a metropolitan city in the state of Andhra Pradesh, India, holds a large collection of information recorded by officers at the time of each incident. Crime rates change rapidly, and improved analysis makes it possible to discern hidden patterns of crime, if any, without explicit prior knowledge of those patterns. Against this background, a study was planned with the following objectives:
  • Extraction of crime patterns by analysis of available GIS-based spatial data and attribute statistics.
  • Prediction of crime based on the spatial distribution of existing data, and anticipation of the crime rate.
  • Detection of crime.
  • Identification of trends, giving police officers an analytical solution that can be used routinely to associate the type, location, time and descriptive detail of incidents.
These approaches preprocess data to generate relevant results quickly, analyze patterns and co-occurrences of identified concepts, and develop an automated solution for finding crime patterns. At this stage it is useful to evaluate the scope of data mining for the objectives listed above. Data mining relies on concepts such as association and aggregation rules, which in turn depend on grouping similar records together.

DATA AGGREGATION

The term “aggregation” is used here as a neutral word for the mathematical term “set”. The mathematical methods developed for analyzing these coherent aggregations are referred to under the general term of “cluster analysis.” Most modern techniques are extensions of methods developed as part of mathematical statistics, though some have arisen out of data management techniques developed by computer scientists. Parametric statistics, such as the mean and standard deviation, can also be used to describe clusters. Nonparametric statistics use more esoteric terms and methods, and are particularly useful when the number of data points in the population is small. For both types of techniques, the analyst builds a model of the data. The model is usually mathematical, and might be represented as a graph, table, or drawing. A model is an explanation of the data. If the model is good, it might be capable of supporting predictions about future data points.

Cluster analysis can provide a foundation for predictive modeling. For example, to develop a model that identifies persons likely to commit major crimes, common indicators or information patterns would be sought within separate clusters of serious and petty criminals. Once these aggregations are located and formally delineated, data on prospective criminals can be examined to identify the cluster in which they are likely to hold “membership.”
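The idea of assigning a new record to the cluster it most resembles can be sketched as follows. This is only a minimal illustration under stated assumptions, not the method used in the study: the two numeric features (age and number of prior offences), the cluster names and all sample values are hypothetical, and membership is decided by the nearest cluster centroid.

```python
from math import dist
from statistics import mean

# Hypothetical, already-delineated clusters.
# Each record is (age, prior_offences); values are invented for illustration.
clusters = {
    "petty": [(19, 1), (22, 2), (24, 1)],
    "serious": [(35, 8), (41, 10), (38, 9)],
}


def centroid(points):
    """Return the coordinate-wise mean of a list of points."""
    return tuple(mean(axis) for axis in zip(*points))


def nearest_cluster(record, clusters):
    """Assign a record to the cluster whose centroid is closest (Euclidean distance)."""
    centroids = {name: centroid(pts) for name, pts in clusters.items()}
    return min(centroids, key=lambda name: dist(record, centroids[name]))


print(nearest_cluster((36, 7), clusters))
```

A real application would first have to discover the clusters (for example with an auto-clustering tool) and scale the features so that no single attribute dominates the distance.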

The Hyderabad Police Department data consist of investigation reports filed by police officers on criminal offences, containing descriptive information on crime type, time of incident, type of weapon and details of the incident. The key terms extracted include those referring to the age, gender and physical description of the suspect, the time and location of the incident, and the type of injury or death resulting from it. PolyAnalyst software provides a user-defined interface for building the application. The data mining techniques are used to generate reports, graphs and charts of historical data.

Spatial data:
  • Digital map of Hyderabad metro area with Municipal ward and Police station limit boundaries.
  • City map showing vital installations, VIP resident / movement areas / corridors
  • Choropleth map of number of crimes within each police station limits
Non-spatial data / attribute information from police records:
  • CrimeID: unique ID assigned to each individual crime.
  • CrimeName: disguised name of the crime.
  • Gender: gender of the individual criminal.
  • Age: age of the individual criminal.
  • Height: height of the individual criminal.
  • Location: location of the individual criminal.
  • CrimeType: the type of crime to which the criminal is linked.
  • WeaponUsed: the type of weapon the criminal used.
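As a rough sketch, the attribute record listed above could be held in a small typed structure, and the per-station crime counts behind a choropleth map could then be obtained by simple counting. The field types, the height unit, the station names and the sample records are assumptions made for illustration; the source does not specify them.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CrimeRecord:
    """One row of attribute data; field types are assumed, not given in the source."""
    crime_id: str
    crime_name: str
    gender: str
    age: int
    height_cm: float  # unit is an assumption
    location: str     # here taken to be the police station limit
    crime_type: str
    weapon_used: str


def crimes_per_location(records):
    """Count crimes per location, e.g. as input to a choropleth map."""
    return Counter(r.location for r in records)


# Hypothetical sample records.
records = [
    CrimeRecord("C001", "X1", "M", 27, 170.0, "Station A", "Robbery", "Knife"),
    CrimeRecord("C002", "X2", "M", 34, 165.0, "Station B", "Fraud", "Computer"),
    CrimeRecord("C003", "X3", "F", 41, 158.0, "Station A", "Robbery", "Gun"),
]

print(crimes_per_location(records))
```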
The project components are:
  • Dataset
  • Rules
  • Graphs and Charts
  • Reports
These four components are discussed below.

DataSet: The World dataset gives the structure of the data. On import, procedure codes should be designated as categories rather than numbers.

RULES: The following rules have been formulated:
Location of crime:
This rule acts as an attribute in the World dataset for identifying the spatial pattern of crime (location or name of the police station limit).
Duration_of_Crime: 2003 - year( firstCrime )
This rule acts as an attribute in the World dataset for calculating the number of years an individual has been active in crime.
Age_of_firstcrime: Age - Duration_of_Crime
This rule acts as an attribute in the World dataset for calculating the age of the person when the first crime was committed.
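The two arithmetic rules can be written out as small functions. This is a minimal sketch in Python rather than PolyAnalyst rule syntax; the function names and the assumption that firstCrime is stored as a date are illustrative, while the reference year 2003 is taken from the rule itself.

```python
from datetime import date

REFERENCE_YEAR = 2003  # constant used in the Duration_of_Crime rule


def duration_of_crime(first_crime: date, reference_year: int = REFERENCE_YEAR) -> int:
    """Duration_of_Crime: reference year minus the year of the first crime."""
    return reference_year - first_crime.year


def age_of_first_crime(age: int, first_crime: date,
                       reference_year: int = REFERENCE_YEAR) -> int:
    """Age_of_firstcrime: current age minus Duration_of_Crime."""
    return age - duration_of_crime(first_crime, reference_year)


# Hypothetical individual: aged 30 in 2003, first crime recorded in 1998.
print(duration_of_crime(date(1998, 5, 1)))       # 5
print(age_of_first_crime(30, date(1998, 5, 1)))  # 25
```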

Graphs and Charts:

The following charts are obtained:
  • Crime link to weapon.
  • Location link to crime.

LINK ANALYSIS

Link analysis is a powerful and important technique for discovering information in large, complex data sets. It is a data mining strategy for identifying “events” that occur together. An “event” in this sense can be any crime. The goal of link analysis is usually to find common indicators of an event so that the corresponding opportunity can be exploited. Three general types of link analysis arise frequently in the majority of applications: associations, sequential patterns and stratification.

Associations are groups of “events” that regularly occur together. For example, if the goal is to detect and limit burglaries, it might be important to discover the types of houses that are targeted and their location in the city, whether downtown or on the outskirts. Associations are the simplest link relationships and the easiest to discover. They are often used to suggest hypotheses for data mining.

Sequential patterns that occur reliably can be used to formulate heuristics (rules of thumb), such as “persons involved in crimes A and B are candidates for crime C.”

Stratification divides crime locations, which are a spatial attribute, and criminals, which are a non-spatial attribute, into “strata” for some analytic purpose, in this case to answer a question. Stratification can be used to perform link analysis by retrieving records from the spatial data store using the hypothesized links as the retrieval keys. For example, suppose that an analyst suspects that certain high-level crimes are more likely to happen in a particular locality under a given police station limit. This hypothesis may be tested by retrieving the following lists from the data store:
  1. All high-level crimes in the city
  2. All high-level crimes in a particular location under a given police station limit
  3. All thefts in the given police station limit
  4. All crimes in the given police station limit
The number of crimes in list 2 divided by the number of crimes in list 1 is an empirical prior probability that a high-level crime occurs in the given area. The number of crimes in list 3 divided by the number of crimes in list 4 is an empirical prior probability that a crime in the given police station limit is of a particular type, such as theft. Comparison of these two probabilities gives insight into the relative likelihood of crimes with respect to their spatial distribution. This technique has the obvious disadvantages of requiring what might be an expensive retrieval, and of requiring the analyst to suspect the presence of the link in advance.

A more sophisticated form of link analysis can be applied when the data are numeric. The correlation coefficient of all possible pairs of data items can be computed. If the values of two factors are highly correlated, an exploitable association may exist. These techniques have the advantages that they quantify the strength of the association (correlation coefficients close to zero show weak association) and do not require the analyst to guess which data items might be linked. The amount of computation required is proportional to the square of the number of data fields. This could be prohibitive if the data are multi-dimensional or consist of many records. Applying this technique to triples, 4-tuples, etc., is problematic because of the amount of computation required. Computing the statistics requires that the data be ranked in some kind of order, a procedure that is often (though not always) meaningful for nominal data. Most elementary statistics texts illustrate the procedure.
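The stratification test can be worked through on a toy data set. The records, station names and the high_level flag below are invented for illustration, and list 3 is assumed to be restricted to the same police station limit as list 4; the sketch only shows how the four lists and the two ratios relate.

```python
# Hypothetical crime records; every value here is invented for illustration.
crimes = [
    {"station": "A", "type": "theft", "high_level": False},
    {"station": "A", "type": "murder", "high_level": True},
    {"station": "A", "type": "theft", "high_level": False},
    {"station": "B", "type": "murder", "high_level": True},
    {"station": "B", "type": "theft", "high_level": False},
    {"station": "A", "type": "robbery", "high_level": True},
    {"station": "B", "type": "kidnapping", "high_level": True},
    {"station": "A", "type": "theft", "high_level": False},
]


def empirical_probability(subset, population):
    """Ratio of list sizes; raises if the population list is empty."""
    if not population:
        raise ValueError("population list is empty")
    return len(subset) / len(population)


station = "A"
list1 = [c for c in crimes if c["high_level"]]                   # high-level, whole city
list2 = [c for c in list1 if c["station"] == station]            # high-level, this station
list4 = [c for c in crimes if c["station"] == station]           # all crimes, this station
list3 = [c for c in list4 if c["type"] == "theft"]               # thefts, this station

p_high_level_in_station = empirical_probability(list2, list1)
p_theft_in_station = empirical_probability(list3, list4)

print(p_high_level_in_station)  # 0.5
print(p_theft_in_station)       # 0.6
```

Each list is a filtered retrieval keyed on the hypothesized link, which is why the approach can become expensive on a large data store.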


Fig. 1. Crime link to weapon


Fig. 2. Spatial data analysis: location link to crime


Reports: Reports are the main asset of this project, and give valuable information towards our goal. Summary Statistics, PolyNet Predictor, nearest neighbor, decision tree and link analysis are the tools used to prepare the reports.

Summary Statistics and regression: The Summary Statistics exploration engine provides basic statistics about the data, including means, standard deviations, and frequencies. In addition, the Summary Statistics report includes frequency charts for each category, string, and yes/no variable.

Fig. 3. Frequencies chart giving statistics of weapons used.
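The kind of output a summary statistics report contains (means, standard deviations and value frequencies) can be approximated with the Python standard library. This is a sketch of the general computation, not of the PolyAnalyst engine; the sample ages and weapons are hypothetical.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical sample columns.
ages = [23, 31, 27, 45, 38]
weapons = ["Knife", "Gun", "Knife", "Rifle", "Knife"]


def summarize_numeric(values):
    """Mean and sample standard deviation of a numeric attribute."""
    return {"mean": mean(values), "stdev": stdev(values)}


def frequencies(values):
    """Map each category to (count, relative frequency), most common first."""
    total = len(values)
    return {value: (n, n / total) for value, n in Counter(values).most_common()}


print(summarize_numeric(ages))
print(frequencies(weapons))
```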

The Predicted and Real vs. Counter graph allows you to see how closely the PolyAnalyst prediction follows the actual value of the attribute over the range of the dataset.

Fig. 4. Decision tree algorithm

The decision tree algorithm helps solve the task of classifying cases into multiple categories. Here the target attribute is CrimeType, and using this target the decision tree algorithm splits the dataset into six sub-datasets. The decision tree found that crime type depends on the weapon used: Gun, Computer, Knife, Rifle or Phone. Link Analysis clearly gives the spatial distribution of the types of crimes that are occurring.
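How a decision tree chooses the attribute on which to split can be illustrated with an information-gain calculation. The source does not say which splitting criterion PolyAnalyst uses, so entropy-based information gain is an assumption here, and the crime types and rows below are hypothetical; only the attribute names and weapon values come from the text.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in target entropy obtained by splitting rows on attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


def best_split(rows, attributes, target):
    """Attribute with the highest information gain for the target."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical rows: WeaponUsed determines CrimeType, Gender does not.
rows = [
    {"WeaponUsed": "Gun", "Gender": "M", "CrimeType": "Robbery"},
    {"WeaponUsed": "Gun", "Gender": "F", "CrimeType": "Robbery"},
    {"WeaponUsed": "Computer", "Gender": "M", "CrimeType": "Fraud"},
    {"WeaponUsed": "Computer", "Gender": "F", "CrimeType": "Fraud"},
    {"WeaponUsed": "Knife", "Gender": "M", "CrimeType": "Assault"},
    {"WeaponUsed": "Knife", "Gender": "F", "CrimeType": "Assault"},
]

print(best_split(rows, ["WeaponUsed", "Gender"], "CrimeType"))
```

A full tree would apply the same choice recursively to each sub-dataset until the groups are pure or too small to split.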

The central purpose of the Link Analysis report is spatial linking and understanding. It displays the positive and negative correlations (links) found between attribute values (nodes) as a cyclical graph with directed links, and the diagram assigns appropriate correlation weights to the links. Red lines indicate positive correlations between values of attributes, while blue lines indicate negative correlations. The color intensity and weight of each line visually represent the strength of the association: thicker and darker lines indicate higher correlations.
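One simple way to obtain a signed link weight between two attribute values is the phi coefficient, which is the Pearson correlation of two yes/no indicators. The source does not state which measure PolyAnalyst uses, so this choice is an assumption, and the records are hypothetical; the red/blue mapping follows the description above.

```python
from math import sqrt


def phi(records, a, b):
    """Phi coefficient between two (attribute, value) pairs over the records."""
    n11 = n10 = n01 = n00 = 0
    for r in records:
        x = r[a[0]] == a[1]
        y = r[b[0]] == b[1]
        if x and y:
            n11 += 1
        elif x:
            n10 += 1
        elif y:
            n01 += 1
        else:
            n00 += 1
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return 0.0  # one of the values never varies, so no link is defined
    return (n11 * n00 - n10 * n01) / denom


def link_colour(weight):
    """Red for positive correlation, blue for negative, as in the report."""
    if weight > 0:
        return "red"
    if weight < 0:
        return "blue"
    return "none"


# Hypothetical records.
records = [
    {"Location": "A", "CrimeType": "Theft"},
    {"Location": "A", "CrimeType": "Theft"},
    {"Location": "A", "CrimeType": "Fraud"},
    {"Location": "B", "CrimeType": "Fraud"},
    {"Location": "B", "CrimeType": "Fraud"},
    {"Location": "B", "CrimeType": "Theft"},
]

w_theft = phi(records, ("Location", "A"), ("CrimeType", "Theft"))
w_fraud = phi(records, ("Location", "A"), ("CrimeType", "Fraud"))
print(w_theft, link_colour(w_theft))
print(w_fraud, link_colour(w_fraud))
```

The absolute value of the weight could drive the thickness and darkness of the line drawn between the two nodes.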

Conclusions: The analysis performed serves as an illustration of some crime detection techniques and of charts that yield valuable results. These results should help a Police Department investigating officer to identify hidden patterns without needing to know the local demographics and behavioral patterns. All that is needed is the spatial distribution of police station limits, for which GIS tools are extremely beneficial. Moreover, this analysis gives a much better overall picture of the incidents, as it deals with both the structured and the textual portions of the database. The tools of data mining, GIS and database management systems, when integrated, lead to spatial data mining. Though the tools are not based on well-defined algorithms at present and are somewhat fuzzy, continued effort and research on spatial data mining should in the course of time lead to improved management by police officers. Integrating spatial data with collateral data, and understanding the hidden patterns of past crimes through data mining, help in achieving the following:
  • Improved crime resolution rate
  • Better crime prediction and prevention of offences
  • Socio-cultural analysis and strategies for prevention of crimes
© GISdevelopment.net. All rights reserved.