SPATIAL AND COLLATERAL DATA MINING FOR CRIME DETECTION AND ANALYSIS
LINK ANALYSIS
Link analysis is a powerful and important technique for discovering information in large, complex data sets. It is a data mining strategy for identifying “events” which occur together. An “event” in this sense can be any crime. The goal of link analysis is usually to find common indicators of an event so that the corresponding opportunity for detection or prevention can be exploited. Three general types of link analysis arise frequently in the majority of applications: associations, sequential patterns, and stratification.

Associations are groups of “events” that regularly occur together. For example, if the goal is to detect and limit crimes related to burglaries, it might be important to discover which types of houses are targeted and where in the city they are located, whether downtown or on the outskirts. Associations are the simplest link relationships and the easiest to discover. They are often used to suggest hypotheses for data mining.

Sequential patterns are groups of “events” that regularly occur in a particular order. Sequential patterns that occur reliably can be used to formulate heuristics (rules of thumb): “we should expect crime C from persons who are involved in A and B.”

Stratification divides the crime locations, which are a spatial attribute, and the criminals, who are a non-spatial attribute, into “strata” for some analytic purpose, in this case to answer a question. Stratification can be used to perform link analysis by retrieving records from the spatial data store using the hypothesized links as the retrieval keys. For example, suppose that an analyst suspects that certain high-level crimes are more likely to happen in a particular locality under a given police station limit. This hypothesis may be tested by retrieving four lists from the data store:
1. All high-level crimes in the city
2. All high-level crimes in a particular location under a given police station limit
3. All thefts in the given police station limit
4. All crimes in the given police station limit
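The four retrievals listed above can be sketched as filters over a table of incident records. This is only a minimal illustration: the `Incident` record, its field names, and the sample data are hypothetical, and it assumes that the list of thefts is restricted to the same police station limit as the list of all crimes, so that their ratio is a proportion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    """Hypothetical incident record; the field names are assumptions."""
    crime_type: str      # e.g. "theft", "robbery"
    high_level: bool     # whether the crime counts as "high-level"
    station_limit: str   # police station limit where it occurred


def stratified_ratios(incidents, station_limit, crime_type="theft"):
    """Build the four lists and return the two empirical proportions."""
    list1 = [i for i in incidents if i.high_level]                     # high-level, whole city
    list2 = [i for i in list1 if i.station_limit == station_limit]     # high-level, this limit
    list4 = [i for i in incidents if i.station_limit == station_limit]  # all crimes, this limit
    list3 = [i for i in list4 if i.crime_type == crime_type]           # thefts, this limit
    p_high = len(list2) / len(list1) if list1 else None
    p_type = len(list3) / len(list4) if list4 else None
    return p_high, p_type


# Invented sample data, for illustration only.
sample_incidents = [
    Incident("robbery", True, "North"),
    Incident("robbery", True, "South"),
    Incident("theft", False, "North"),
    Incident("theft", False, "North"),
    Incident("assault", True, "North"),
    Incident("theft", False, "South"),
]
p_high_north, p_theft_north = stratified_ratios(sample_incidents, "North")
```

Each list is a full pass over the stored records, which is why the retrieval can become expensive on a large data store.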
The number of crimes in list 2 divided by the number of crimes in list 1 is an empirical prior probability that a high-level crime occurs in the given area. The number of crimes in list 3 divided by the number of crimes in list 4 is an empirical prior probability that a particular crime such as theft occurs within the given police station limit. Comparing these two probabilities gives insight into the relative likelihood of crimes with respect to their spatial distribution. This technique has the obvious disadvantages of requiring what might be an expensive retrieval and of requiring the analyst to suspect the presence of the link in advance.

A more sophisticated form of link analysis can be applied when the data are numeric. The correlation coefficient of every possible pair of data items can be computed. If the values of two factors are highly correlated, an exploitable association may exist. This technique has the advantage that it quantifies the strength of the association (correlation coefficients close to zero show weak association) and does not require the analyst to guess which data items might be linked. The amount of computation required is proportional to the square of the number of data fields, which could be prohibitive if the data are high-dimensional or consist of many records. Applying the technique to triples, 4-tuples, etc., is problematic because of the amount of computation required. For the computation of these statistics, the data must be ranked in some kind of order, a procedure that is not always meaningful for nominal data. Most elementary statistics texts illustrate the procedure.
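The pairwise computation described above can be sketched as follows. The text does not name a particular coefficient, so this sketch assumes Pearson's product-moment correlation; the field names and values are invented for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # undefined when a field is constant
    return sxy / sqrt(sxx * syy)


def pairwise_correlations(fields):
    """Correlate every pair of fields; `fields` maps a name to its values."""
    return {
        (a, b): pearson(fields[a], fields[b])
        for a, b in combinations(sorted(fields), 2)
    }


# Invented numeric fields, one value per police station limit.
sample_fields = {
    "burglaries": [1, 2, 3, 4],
    "patrol_hours": [8, 6, 4, 2],
    "population": [5, 5, 6, 5],
}
correlations = pairwise_correlations(sample_fields)
```

With k fields the dictionary holds k(k-1)/2 entries, which is the quadratic growth in computation noted above.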

Fig. 1. Crime link weapon

Fig. 2. Spatial data analysis: location link crime
Reports are the main asset of this project. They give valuable information regarding our goal. Summary Statistics, PolyNet Predictor, nearest neighbor, decision tree, and link analysis are the tools used to prepare the reports.
Summary Statistics and regression: The Summary Statistics exploration engine provides basic statistics about your data, including means, standard deviations, and frequencies. In addition, the Summary Statistics report includes frequencies charts for each category, string, and yes/no variable.
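The kind of summary described above (means, standard deviations, and frequencies) can be approximated with a few lines of standard-library code. This is a sketch only and is not the PolyAnalyst engine itself: the record layout and field names are hypothetical, and the sample standard deviation is an assumption, since the text does not say which form is used.

```python
from collections import Counter
from statistics import mean, stdev


def summarize(records, numeric_fields, categorical_fields):
    """Return mean and sample standard deviation for numeric fields,
    and value frequencies for categorical fields."""
    summary = {}
    for field in numeric_fields:
        values = [r[field] for r in records if r.get(field) is not None]
        if not values:
            continue  # nothing to summarize for this field
        summary[field] = {
            "mean": mean(values),
            "stdev": stdev(values) if len(values) > 1 else 0.0,
        }
    for field in categorical_fields:
        summary[field] = Counter(
            r[field] for r in records if r.get(field) is not None
        )
    return summary


# Invented records, for illustration only.
sample_records = [
    {"weapon": "Gun", "offender_age": 20},
    {"weapon": "Knife", "offender_age": 30},
    {"weapon": "Gun", "offender_age": 40},
]
report = summarize(sample_records, ["offender_age"], ["weapon"])
```

The `Counter` for a categorical field holds the counts that a frequencies chart of weapons used would plot.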

Fig. 3. Frequencies chart giving statistics of weapons used.
The Predicted and Real vs. Counter graph allows you to see how closely the PolyAnalyst prediction follows the actual value of the attribute over the range of the dataset.

Fig. 4. Decision tree algorithm
The decision tree algorithm helps solve the task of classifying cases into multiple categories. Here the target attribute is CrimeType, and using this target attribute the decision tree algorithm categorizes the dataset into six sub-datasets. The decision tree found a dependence on crimes related to Gun, Computer, Knife, Rifle, and Phone. Link Analysis clearly gives the spatial distribution information of the types of crimes that are occurring.
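The way a decision tree chooses the attribute on which to split the dataset can be illustrated with a single split step. The splitting criterion of the tool used here is not described in the text, so this sketch assumes an entropy-based (information gain) criterion; the attribute names other than CrimeType and all the records are invented.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(records, attribute, target):
    """Reduction in target entropy obtained by splitting on `attribute`."""
    base = entropy([r[target] for r in records])
    groups = {}
    for r in records:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(
        len(labels) / len(records) * entropy(labels)
        for labels in groups.values()
    )
    return base - remainder


def best_split(records, attributes, target):
    """Attribute whose split separates the target classes best."""
    return max(attributes, key=lambda a: information_gain(records, a, target))


# Invented cases, for illustration only.
sample_cases = [
    {"Weapon": "Gun", "Area": "North", "CrimeType": "Robbery"},
    {"Weapon": "Gun", "Area": "South", "CrimeType": "Robbery"},
    {"Weapon": "Computer", "Area": "North", "CrimeType": "Fraud"},
    {"Weapon": "Computer", "Area": "South", "CrimeType": "Fraud"},
]
chosen = best_split(sample_cases, ["Weapon", "Area"], "CrimeType")
```

A full tree would repeat this step on each resulting sub-dataset until the cases in a branch share one value of the target attribute or no attributes remain.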
The central object of the Link Analysis report is spatial linking and its understanding. The report displays the positive and negative correlations (links) found between attribute values (nodes) as a cyclical graph with directed links, and the diagram allocates appropriate correlation weights to the links. Red lines indicate positive correlations between values of attributes, while blue lines indicate negative correlations. The color intensity and weight of each line visually represent the strength of the association: thicker and darker lines indicate higher correlations.
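A link diagram of this kind can be approximated by treating each attribute value as a node and measuring the correlation between the presence of two values in the same record. The measure used by the tool is not stated in the text, so this sketch assumes the phi coefficient of two binary indicators; the links here are undirected, and the attribute names, records, and threshold are hypothetical.

```python
from itertools import combinations
from math import sqrt


def phi(records, node_a, node_b):
    """Phi coefficient between two (attribute, value) indicators."""
    (attr_a, val_a), (attr_b, val_b) = node_a, node_b
    n11 = n10 = n01 = n00 = 0
    for r in records:
        a = r[attr_a] == val_a
        b = r[attr_b] == val_b
        if a and b:
            n11 += 1
        elif a:
            n10 += 1
        elif b:
            n01 += 1
        else:
            n00 += 1
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


def build_links(records, attributes, threshold=0.3):
    """Return (node, node, weight, color) for sufficiently strong links."""
    nodes = sorted({(a, r[a]) for r in records for a in attributes})
    links = []
    for node_a, node_b in combinations(nodes, 2):
        if node_a[0] == node_b[0]:
            continue  # values of the same attribute are not linked
        weight = phi(records, node_a, node_b)
        if abs(weight) >= threshold:
            color = "red" if weight > 0 else "blue"  # red positive, blue negative
            links.append((node_a, node_b, weight, color))
    return links


# Invented records, for illustration only.
sample_crimes = [
    {"Location": "Downtown", "Weapon": "Gun"},
    {"Location": "Downtown", "Weapon": "Gun"},
    {"Location": "Outskirts", "Weapon": "Knife"},
    {"Location": "Outskirts", "Weapon": "Knife"},
]
links = build_links(sample_crimes, ["Location", "Weapon"])
```

The absolute value of each weight could then drive the thickness and color intensity of the corresponding line when the graph is drawn.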
Conclusions: The analysis performed serves as an illustration of some crime detection techniques and of some charts that yield valuable results. These results should help the investigating officers of our Police Department identify hidden patterns without needing to know the local demographics and behavioral patterns. Only the spatial distribution of police station limits is needed, and for this the tools of GIS are extremely beneficial. Moreover, this analysis presents a much better overall picture of the incidents, as it deals with both the structured and the textual portions of the database. The tools of data mining, GIS, and database management systems, when integrated, lead to spatial data mining. Though the tools are not based on well-defined algorithms at present and are a bit fuzzy, continued efforts and research on spatial data mining would certainly lead to improved management by police officers in the course of time. Integrating spatial data with collateral data and understanding the hidden patterns of past crimes through data mining help in achieving the following:
- Improved crime resolution rate
- Better crime prediction and prevention of offences
- Socio-cultural analysis and strategies for prevention of crimes