7.6 Press Release - To be submitted September 2010
On track and looking to the future
May of this year saw the Odyssey Consortium members come together in Brussels for the eagerly anticipated Midterm review of the project. The team met with both external experts and the European Commission staff.
The objective of the review was to analyse the work of the project and redirect its focus and intervention where necessary. Partners illustrated the journey of the Odyssey project so far with lively presentations and technical demonstrations.
Partners were delighted to show the reviewers that all the deliverables set for this review period had been completed, and that the project is well on target to fulfil its goal of developing an interoperable platform to share crime data across Europe.
With so much of the Odyssey partnership work undertaken virtually by Skype, phone and email, the review was also a great opportunity for partners to meet each other face to face again and consolidate future plans and strategies.
8. Submitted Papers
8.1 A Pan European Platform for Combating Organized Crime and Terrorism (Odyssey Platform)
Babak Akhgar, Simeon Yates, Fazilatur Rahman, Lukasz Jopek, Sarah Johnson Mitchell and Luca Caldarelli
b.akhgar@shu.ac.uk, s.yates@shu.ac.uk, f.rahman@shu.ac.uk, l.jopek@shu.ac.uk, s.j.mitchell@shu.ac.uk, l.caldarelli@shu.ac.uk
Sheffield Hallam University, S1 1WB Sheffield, UK
Abstract: Combating organized crime requires evidence matching and visualization of criminal networks using advanced data mining capabilities. Current approaches aim only to generate static criminal networks rather than addressing the evolution and prediction of the networks, which are inherently dynamic. In this paper we report on our ongoing research in advanced data mining tools and semantic knowledge extraction techniques. The combination of semantics and data mining outcomes is the foundation for a platform that captures information hidden in the data and produces applied knowledge.
Keywords: data mining; semantics; ballistic intelligence; crime; knowledge extraction.
1. Introduction
Globalisation has been accompanied by a dramatic increase in organised and trans-national crime and terrorism. It takes many forms, including homicide, genocide, honour killings, trafficking in drugs and weapons, smuggling of human beings and the laundering of the proceeds of crime.
The objective of the Odyssey Project is to develop a Strategic Pan-European Ballistics Intelligence Platform for Combating Organised Crime and Terrorism. Odyssey is an EU funded project that will develop a secure, interoperable situation awareness platform for the automated management, processing, sharing, analysis and use of ballistics data and crime information to combat organized crime and terrorism.
The Project will focus on ballistics data and crime information, but the concept and the Platform will be equally applicable to other forensic data sets including DNA and fingerprints.
The Odyssey Project is co-funded by the European Commission. The project partners are Sheffield Hallam University (United Kingdom), Atos Origin (Spain), Forensic Pathways Ltd. (United Kingdom), EUROPOL - European Police Organisation (Netherlands), XLAB (Slovenia), MIP - Consorzio Per L'innovazione Nella Gestione Delle Imprese E Della Pubblica Amministrazione (Italy), West Midlands Police Force Intelligence (United Kingdom), Royal Military School (Belgium), An Garda Siochana Police Forensics Service (Republic of Ireland), SAS Software Ltd. (United Kingdom) and DAC - Servizio Polizia Scientifica (Italy).
This paper discusses the Odyssey platform as a standard-setting tool that attempts to catalogue, process and exploit information on ballistics and crime. The platform is being developed to allow police forces across Europe to better perform national and international investigation activities. In addition, it aims to generate an automated "Red Flag" alerting system that signals potential criminal activity on the basis of the information stored and processed by the platform components. The global system entity is a central European repository (CEUR), which holds the necessary software and hardware infrastructure for data processing and exchange. Local authorities connect to CEUR to query available data and to insert new crime data. Each local police force will maintain a local database of crime and ballistic information. In order to connect to CEUR, a force must have the Query and Input Data Components, which enable it to access the repository using a standardized graphical user interface and standardized data. Furthermore, police forces must have a Security Component, which manages the secure login. An Authorized Sharing Component, which strips data of all sensitive information that should not be shared across national borders, will also be utilised. The platform itself consists of a series of components covering the following aspects: Security (Global Security, Global Authorization), Data Components (Query, Query Storage, Semantic Data, and Database Management), Processing Components (Data Mining, Relationship Discovery), Modelling (Model Management) and Alerting (Alert Generation).
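The role of the Authorized Sharing Component described above can be illustrated with a minimal sketch. The field names and the record layout are hypothetical, since the paper does not define a record schema; the sketch only shows the idea of removing locally held sensitive fields before a record leaves the national database.

```python
# Hypothetical field names: the paper does not specify which fields are sensitive.
SENSITIVE_FIELDS = {"suspect_name", "officer_id", "witness_address"}


def strip_sensitive(record, sensitive_fields=SENSITIVE_FIELDS):
    """Return a copy of the record without the fields that must stay local."""
    return {key: value for key, value in record.items()
            if key not in sensitive_fields}


# Illustrative local record (invented values).
local_record = {
    "case_id": "UK-0001",
    "calibre": "9mm",
    "rifling_twist": "right",
    "suspect_name": "J. Doe",
    "officer_id": "WM-4471",
}

# Only the stripped copy would be sent to the central repository.
shared_record = strip_sensitive(local_record)
```

The local record is left unchanged, so the force keeps its full data while the central repository only ever receives the reduced copy.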
The main contribution of this paper lies in Sections III and IV, where the use of Semantic Modelling and of Data Mining applications for the purposes of the project is described and visually represented.
2. Background
2.1. Semantic Technologies and Data Mining
Data mining technologies are a foundation for data analysis and lead to an understanding of vast amounts of information. In order to analyse large bodies of data and extract relevant knowledge, the Platform requires methods, tools and algorithms that are efficient and can be provided at low cost, where cost includes both experts' time and system resources.
The roots of data mining lie in estimation, classification, clustering and sampling theory. Other methods, such as the construction of decision trees and neural networks, will also be considered within the Odyssey Project. Generally, data mining (sometimes called data or knowledge discovery) is the process of analysing data from different perspectives and summarising it into useful information; in commercial settings this typically means information that can be used to increase revenue, cut costs, or both. Data mining software allows users to analyse data along many different dimensions, to categorise it, and to identify relationships within it. Technically, data mining can be described as the process of finding correlations or patterns among fields in large databases. Existing data mining applications are available for mainframe, client/server and PC platforms, but they are limited by the size of the database and by query complexity: the more queries that must be processed, the greater the burden on the system.
2.2. Comparison of Semantic Modelling and Data Mining
Semantic technologies and data mining techniques both aim at the retrieval of required information. Semantic modelling techniques focus on representing data using a formal structure that enables logical reasoning and the inference of knowledge. Data mining techniques, in contrast, rely on algorithms to retrieve knowledge from the data, as shown in the figure below.
Figure 1: Semantic Technology vs. Data Mining
As a result, semantic technology places most of its complexity in the efficient representation of data, whereas data mining places it in the efficiency of the extraction algorithms when applied to huge volumes of unstructured data. A balance between the two approaches is therefore sought in order to achieve the most promising results.
2.3. Research Issues in Criminal Data Mining
Large scale data mining projects in the law enforcement sector are increasing both inside and outside academia (de Bruin et al., 2006). COPLINK is a police and university collaboration that applies entity extraction and social network analysis to narrative reports, while FLINTS and FINCEN aim to find links between crimes and criminals in money laundering cases. Clustering techniques and self-organizing maps have been widely used for behavioural modelling, and multidimensional clustering algorithms for the analysis of criminals' careers.
The biggest challenge in data mining is how to convert crime information into a data mining problem. Nath (2006) notes that in crime terminology a cluster is a group of crimes in a geographical region, whereas in data mining terminology a cluster is a group of similar data points, and therefore treats crime patterns and clusters as being in one-to-one correspondence. The next challenge is then to find the variables that provide the best clustering.
Crime analysts create knowledge from information daily by analysing and generalising current criminal records. To support this, COPLINK creates an underlying structure called a "concept space": an automatic thesaurus built with a statistics-based, algorithmic technique that identifies relationships between objects of interest. The concept space consists of a network of terms and weighted associations. Co-occurrence analysis is performed using similarity and clustering functions, so that the network-like concept space holds all possible associations between objects (Hauck et al., 2002).
A general framework for crime data mining should enable traditional data mining techniques (association analysis, classification and prediction, cluster analysis, and outlier analysis), which identify patterns in structured data, as well as advanced techniques that identify patterns in both structured and unstructured data, for local law enforcement and for national and international security applications (Chen et al., 2006).
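The idea of a network of terms with weighted associations derived from co-occurrence can be sketched as follows. This is not COPLINK's actual weighting scheme, which the paper does not give; as a stated simplification, the sketch weights each pair of terms with the Jaccard coefficient (records containing both terms divided by records containing either). The sample records are invented.

```python
from collections import Counter
from itertools import combinations


def build_concept_space(records):
    """Map each co-occurring term pair to a weight between 0 and 1."""
    term_counts = Counter()
    pair_counts = Counter()
    for terms in records:
        unique = sorted(set(terms))          # count each term once per record
        term_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    weights = {}
    for (a, b), both in pair_counts.items():
        # Jaccard coefficient: an assumed stand-in for the real weighting.
        either = term_counts[a] + term_counts[b] - both
        weights[(a, b)] = both / either
    return weights


# Illustrative records: each is the list of terms extracted from one report.
records = [
    ["9mm", "robbery", "sheffield"],
    ["9mm", "robbery"],
    ["shotgun", "sheffield"],
]
concept_space = build_concept_space(records)
```

In this toy data "9mm" and "robbery" always appear together and so receive the maximum weight, while pairs that co-occur only occasionally receive lower weights.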
2.4. Research on (Ballistic) Intelligence Systems
Ballistics Intelligence primarily supports crime detection through the creation of semantic knowledge bases modelled on data coming from distributed sources. One such project is GRASP (Global Retrieval, Access and information System for Property items). It addresses the problem of sharing information by demonstrating how descriptions of objects can be captured, stored in a heterogeneous database, and widely distributed across a network environment. The project is specifically dedicated to museums, police forces, insurance companies and art trading institutions, which face the problem of identifying stolen and recovered objects of art and have difficulties in sharing relevant information. The Ballistic Intelligence Information system will modernize the semantic and data processing solutions introduced in GRASP in at least four areas:
- It will define a European standard for semantic-based ballistic investigation, which will later be formalized as an ontology (Smith, Welty & McGuinness, 2004). The GRASP system contained only an already developed Arts and Architecture Thesaurus with one root, 3 sub-concepts and almost no relations; this ontology was used to describe artefacts in terms of their appearance. By contrast, the Ballistic ontology will include not only a rich structural domain knowledge model based on the newly developed standards, but also additional meta-information guiding the knowledge mining;
- The Ballistic system will apply knowledge mining techniques to discover potential links and correlations between various crime-related parameters;
- Multilingualism will be handled in an easier, automated way. In GRASP, languages are expressed unintuitively, through a complex transformation into integer values followed by a later mapping back into actual words depending on the local language settings. In the Ballistic platform this transformation will be handled using the Protégé Ontology Editor, which will allow easy specification of the translations of all concepts used in the user interface, dynamically adjusting the presentation of the defined entities to the selected language (Stanford University, 2009);
- The mining of correlations and associations in the data will be significantly faster, as it will use the Odyssey semantic model.
Technological challenges similar to those of Ballistic were also present in the eJustice project (IST-2002-001567), which deals with European identification and authentication issues, with emphasis on face and fingerprint biometry. The investigations made during the eJustice project can also be examined in order to extract potentially useful information about the security policies used in sharing government databases and protecting citizens' privacy. This will be investigated and potentially applied to the combined data mining (stage-mining) process of the project.
3. Data Mining
3.1. Algorithms
Data mining algorithms have been used in the past to support the information needs involved in detecting criminal networks (Chen et al., 2004).
Entity extraction techniques provide basic information such as personal identification data, addresses, vehicles and personal characteristics from police narrative reports comprising multimedia documents (text, image, audio, video etc.) for further crime analysis, but their performance depends greatly on the availability of extensive amounts of clean input data.
Clustering techniques may be used to identify suspects who conduct crimes in similar ways or to distinguish between groups belonging to different gangs. However, crime analysis using clustering is limited by the high computational intensity typically required.
Association rule mining discovers frequently occurring item sets in a database and presents the patterns as rules. This technique may be applied to network intruders’ profiles to help detect potential future network attacks.
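The two standard measures behind an association rule, support and confidence, can be sketched in a few lines. The incident data and event names below are invented for illustration, and a real miner (for example an Apriori-style algorithm) would also enumerate candidate item sets rather than evaluate a single hand-picked rule.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in the item set."""
    itemset = set(itemset)
    hits = sum(1 for transaction in transactions if itemset <= set(transaction))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    joint = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return joint / base if base else 0.0


# Illustrative intruder profiles: each set lists events seen in one incident.
incidents = [
    {"port_scan", "failed_login", "data_exfiltration"},
    {"port_scan", "failed_login"},
    {"port_scan", "failed_login", "data_exfiltration"},
    {"failed_login"},
]

# Rule under test: {port_scan, failed_login} -> {data_exfiltration}
rule_support = support(
    incidents, {"port_scan", "failed_login", "data_exfiltration"})
rule_confidence = confidence(
    incidents, {"port_scan", "failed_login"}, {"data_exfiltration"})
```

Here the rule holds in half of all incidents (support 0.5) and in two of the three incidents where its antecedent occurs (confidence of about 0.67).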
Sequential pattern mining finds frequently occurring sequences of items over a set of transactions that occurred at different times (time-sampled data). It must work on rich and highly structured data to obtain meaningful results.
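As a minimal sketch of the idea, the function below counts ordered pairs of events, where the first event occurs somewhere before the second in the same time-ordered sequence, and keeps the pairs that reach a minimum support. Restricting patterns to length two is a simplifying assumption; full sequential pattern miners handle longer patterns. The event names are invented.

```python
from collections import Counter


def frequent_ordered_pairs(sequences, min_support):
    """Return ordered pairs (a, b), a before b, seen in >= min_support sequences."""
    counts = Counter()
    for sequence in sequences:
        seen_pairs = set()                   # count each pair once per sequence
        for index, first in enumerate(sequence):
            for later in sequence[index + 1:]:
                seen_pairs.add((first, later))
        counts.update(seen_pairs)
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Illustrative time-ordered event histories, one list per offender group.
sequences = [
    ["vehicle_theft", "firearm_theft", "armed_robbery"],
    ["vehicle_theft", "armed_robbery"],
    ["firearm_theft", "vehicle_theft", "armed_robbery"],
]
patterns = frequent_ordered_pairs(sequences, min_support=3)
```

Only the pattern "vehicle theft followed by armed robbery" appears in all three histories, so it is the only one returned at this support level.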
Deviation detection techniques may be applied to fraud detection, network intrusion detection, and other crime analyses. However, such activities can sometimes appear to be normal, making it difficult to identify unusual/criminal activities.
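One simple form of deviation detection is a z-score test, sketched below: a value is flagged when it lies more than a chosen number of standard deviations from the mean. The threshold of 2.0 and the sample amounts are assumptions for illustration. The sketch also shows the limitation noted above, since an activity whose values stay close to the mean is never flagged.

```python
from statistics import mean, pstdev


def flag_deviations(values, threshold=2.0):
    """Return (index, value) pairs lying more than `threshold` std devs from the mean."""
    centre = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return []                            # all values identical: nothing deviates
    return [(index, value) for index, value in enumerate(values)
            if abs(value - centre) / spread > threshold]


# Illustrative transaction amounts; the last one is deliberately unusual.
amounts = [100, 102, 98, 101, 99, 100, 103, 500]
outliers = flag_deviations(amounts)
```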
Classification techniques find common properties among different crime entities and organize them into predefined classes. This technique has been used to identify and predict crime trends and to reduce the time required to identify crime entities. However, it requires training and testing data to maintain prediction accuracy.
A string comparator approach can be used to analyse textual data, but at the expense of intensive computation.
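A common string comparator is the Levenshtein edit distance, used here as an assumed example since the paper does not name a specific comparator. The sketch also makes the computational cost visible: comparing two strings takes time proportional to the product of their lengths, and every pair of records has to be compared.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits turning a into b."""
    previous = list(range(len(b) + 1))       # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Normalise the distance to a score between 0 (different) and 1 (identical)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - edit_distance(a, b) / longest


# Illustrative comparison of two spellings of the same name.
score = similarity("Jon Smith", "John Smith")
```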
Social network analysis may be used to predict a criminal network illustrating criminals’ roles, the flow of tangible and intangible goods and information, and associations among crime related entities.
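A basic building block of social network analysis is a centrality measure. The sketch below computes degree centrality (the share of other actors each actor is directly linked to) on an invented association network; identifying roles in a real criminal network would draw on further measures such as betweenness or closeness.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Map each node to the fraction of other nodes it is directly linked to."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    node_count = len(neighbours)
    if node_count < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adjacent) / (node_count - 1)
            for node, adjacent in neighbours.items()}


# Illustrative associations between four individuals.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("C", "D")]
centrality = degree_centrality(edges)
most_connected = max(centrality, key=centrality.get)
```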
The Odyssey data mining component will research and test the applicability of the above-mentioned algorithms to the Odyssey use case scenarios, and will pursue further innovation in improving the performance of data mining algorithms that process ballistic and crime information.
3.2. Tools
Over the years, SAS has built a strong track record in data and text mining methodologies and systems and is now classified among the best-performing analytic software developers worldwide (Gartner, 2008). SAS is a partner in the Odyssey project consortium, and a series of SAS products can be implemented in order to maximise the effectiveness of Terrorism and Organised Crime (TAOC) analysis and of the European police forces' fight against TAOC.
The core of the SAS data mining product portfolio is SAS Enterprise Miner™, the SAS solution that streamlines the data mining process to create highly accurate predictive and descriptive models. With the ability to process large amounts of data for business decision purposes, SAS Enterprise Miner™ can perform a series of operations which fit the purposes of the Odyssey project. It supports the entire data mining process with a broad set of tools, and the software can be customised to fully meet the TAOC requirements of the project, including information extraction and elaboration. It enhances the accuracy of predictions and makes reliable information easy to surface: better performing models with new algorithms enhance the stability and accuracy of predictions, which can be verified easily by visual model assessment and validation metrics. Predictive results and assessment statistics from models built with different approaches can be displayed side by side for easy comparison. The diagrams created serve as self-documenting templates that can be updated easily or applied to new contexts without starting over from scratch. Data can be prepared, summarised and explored through a set of analytic tools that allow access to more than 50 file structures. TAOC data can be sampled and partitioned, with the capacity to use merging and appending tools, segment profile plots, univariate/bivariate statistics and plots, and interactively linked plots and tables. The data can also be transformed, with the capability to prepare and analyse time series data, bin variables interactively, create ad-hoc data-driven rules and policies, and replace data.
Structuring advanced descriptive models will help Odyssey users to contextualise the information gathered. The available techniques include clustering and self-organizing maps, basket analysis, sequence and web path analysis, variable clustering and selection, linear and logistic regression, decision trees, gradient boosting, neural networks, partial least squares regression, support vector machines, two-stage modelling, memory-based reasoning and model ensembles.
To ensure the scalability of the system, SAS Enterprise Miner™ fully complies with the proposed Odyssey layer architecture and allows it to be scaled up to process larger amounts of data in the future. The Odyssey knowledge extraction system will also deal with textual information, which will be accessed, extracted and transformed through specific SAS software, namely SAS Text Miner™. The software automatically combines structured data and unstructured information and will support the Odyssey objectives of combining, comparing and correlating TAOC information by clustering and categorising unstructured information. Any type of textual information will be grouped in "virtual dossiers" based on its content. A series of clustering techniques is available, including spatial clustering, downstream clustering and hierarchical clustering.
3.3. Semantic Engineering and Data Mining
The application of Semantic and advanced Data Mining technologies will not only set new, simple but effective standards for data management and knowledge discovery in those databases (Bonino, Corno, Farinetti & Bosca, 2004) (Bonino, Corno, Farinetti, & Inf, 2004) (Colucci et al.), but also make strong collaboration possible inside the EU police community. The main developments in the project will focus on advanced data mining and semantic knowledge extraction based on the notion of knowledge described by Akhgar & Siddiqi (2001). Research combining semantics and data mining provides a platform for capturing the information hidden in the data and producing applied knowledge. It should be noted that the requirements arising from large-scale data mining scenarios like those in Ballistics are extremely challenging. Topics of interest include:
- Architectures for data mining in large scale environments;
- Semantics in the data mining process, identification of resources for data mining, such as data sources, data mining programs, storage and computing capacity to run large-scale mining jobs, provenance tracking mechanisms;
- Data privacy and security issues;
- Data types, formats, and standards for mining data;
- Approaches to mining inherently distributed data, i.e. data that for one reason or another cannot be physically integrated on a single computer;
- Data mining of truly large and high-dimensional data sets, e.g. data sets that do not fully fit into local memory;
- Adaptation of existing and development of new data mining algorithms.
The feasibility of data mining techniques applied to the investigation of serious crime represents a research agenda at a stage that should be seen as the first in a wider research endeavour. For this purpose, Ballistics will examine the benefits of the research undertaken by the UCL Jill Dando Institute of Crime Science, which assessed the feasibility of applying theoretical knowledge discovery techniques to serious crime, informed by the relevant academic literature. This will be further exploited by investigating the use of other technologies for crime prevention and detection, such as SOCIS (scene-of-crime information systems). In SOCIS, for example, knowledge acquisition and information extraction approaches were combined with a visual information system that stores digital photographs taken at the scene of crime (SOC). The system capabilities included pattern and image recognition algorithms, in order to give data mining algorithms more to work with. Crime detection and prevention is an interesting scenario for exploring these problems. Each level of knowledge extraction and mining depends crucially on maintaining and continually updating the inventory of concepts and terms, and the relationships amongst terms. The development of the Ballistic system will incorporate findings from this research in order to integrate heterogeneous crime scene information into common machine-executable representations, a task which in this context is akin to data preparation, a crucial precursor to data mining.
The Odyssey knowledge extraction module used by law enforcement agencies could warn the authorities that a weapon type and/or bullet(s) had been involved in similar situations. Odyssey would quantify the risk and the possible outcomes. The system would base its information on the mining performed by the Data Mining module on the Metadatabase, with data coming from the local data repository and from other data sources within the policy-driven data sources. The mining would be supported by the application of Process Models to the data (the Odyssey knowledge extraction module).
3.4. Odyssey Data Mining Module
Member States across the EU have been responsible for the collection of a vast number of items of information in the form of bullets, cartridge cases and test fires. Data Mining can be described as the process of extracting knowledge from data, with the main goal being to discover hidden and potentially useful information in the data. The process relies on a combination of abundantly available computing power, data storage, processing, transmission, constant flows of data between organizations and data warehouses and the human skills in interpreting the results. The purpose of the Data Mining component for the project is:
- matching the data from the ballistic analysis to the situation by using additional data sources;
- matching two or more bullets as having been fired from the same weapon;
- matching two or more cartridge cases; and
- matching test fired bullets and cartridge cases with recovered samples.
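Once pairwise matches between exhibits are available, grouping them by presumed weapon can be sketched with a union-find structure, as below. This assumes that the pairwise match decisions have already been produced by some comparison step, which the paper does not specify, and that matches may be chained transitively; the exhibit identifiers are invented.

```python
def group_matches(exhibits, matched_pairs):
    """Group exhibits linked, directly or through a chain, by pairwise matches."""
    parent = {exhibit: exhibit for exhibit in exhibits}

    def find(item):
        # Follow parent links to the group representative, shortening the path.
        while parent[item] != item:
            parent[item] = parent[parent[item]]
            item = parent[item]
        return item

    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a          # merge the two groups

    groups = {}
    for exhibit in exhibits:
        groups.setdefault(find(exhibit), []).append(exhibit)
    return sorted(sorted(group) for group in groups.values())


# Illustrative exhibits and assumed pairwise match results.
exhibits = ["bullet_1", "bullet_2", "bullet_3", "test_fire_7", "bullet_4"]
matched_pairs = [("bullet_1", "bullet_2"), ("bullet_2", "test_fire_7")]
groups = group_matches(exhibits, matched_pairs)
```

In this example two recovered bullets end up in the same group as a test fire, which would tie both of them to the weapon used for that test fire, while the unmatched bullets remain in groups of their own.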
Knowledge Extraction
The main goal of the module is hypothesis generation through knowledge extraction. The system focuses on gaps in the crime scenarios and on the presentation of possible semantically enriched propositions. This is done on several levels: Stage Mining, Knowledge Extraction and Metadatabase. During stage mining, the system takes into consideration the ballistic and crime scene databases provided by a particular organisation, and data mining algorithms are applied separately to the different data sources. The results are merged in the second stage, knowledge extraction, where new mining techniques are applied to create a more solid view of the hypothetical scenario. The results of filling the gaps in the network of crime linkage enable the creation of a metadatabase. Such analysis will be based on the output of the data mining techniques, on semantic enrichment, and on a logically inferred knowledge base of hypothetical crime scenarios. The module will provide statistically processed qualitative and quantitative information not only about the DM-derived matches between bullets, cartridges, firearms and crime scenes, but will also, using indirect association, generate hypothetical crime scenarios. After applying its "reasoning" and "decision" activities, ODYSSEY will present a ranked list of possible scenarios. Special attention will be given to the efficiency of the knowledge extraction module; hence statistics (for data partitioning), ABML (for hypothesis space reduction) and qualitative modelling (for trend observation) will be used.
Conceptual model of Ballistic Semantic Structure
In this task only the semantics and the conceptual framework for the proposed standard will be handled, as a precondition for defining the standard itself. The proposed ballistic standard combines two critical aspects, i.e. technological specifications, and effective search and retrieval, comparison and knowledge mining requirements. Regarding the technological specifications, the standard must satisfy two critical requirements, i.e.
- to be generic enough to cater for the various technologies used in ballistic analysis, including technologies used overseas (in order to enable broader international cooperation), and
- to be extensible, in order to cater for future technological innovations in the field of ballistic analysis.
This task is dedicated to the development of a system designed to cross-reference ballistic and crime data in multiple, distributed sources, in order to discover potential links and correlations between various parameters, as they will be defined in the Project. The data mining system must integrate a knowledge base in such a manner that this knowledge can be applied during data mining. It must be capable of utilizing advanced knowledge representation and of generating many different types of knowledge from a given data source.
Emphasis will be placed on the development of a generic methodology, and of a system that implements it, for utilizing prior knowledge in data mining in order to produce new knowledge that is understandable to the user, interpretable within the domain of application, useful in the light of a user's expectations and intentions, and reusable in further knowledge discovery.
There is a need for automated data-mining and knowledge extraction using semantic capability to allow complex conclusions to be generated for fast and responsible decision making.
3.5. Use Cases
Figure 2. Odyssey platform
The Odyssey platform enables users to input data for further analysis, and use of the platform does not require any technical or data mining knowledge. Moreover, users are informed when previously performed searches return new information. Finally, the platform's subsystem generates Red Flag alerts in events such as high risk.
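Informing users when a previously performed search returns new information can be sketched as a comparison between stored and current result sets. The query syntax, identifiers and message format below are hypothetical and only illustrate the mechanism.

```python
def new_results(previous_ids, current_ids):
    """Return result identifiers present now but absent from the last run."""
    return sorted(set(current_ids) - set(previous_ids))


# Hypothetical stored search and the identifiers returned by re-running it.
saved_search = {
    "query": "calibre=9mm AND region=EU",
    "last_result_ids": ["c-101", "c-102"],
}
current_result_ids = ["c-101", "c-102", "c-207"]

fresh = new_results(saved_search["last_result_ids"], current_result_ids)
notification = None
if fresh:
    # Build the message shown to the user who saved the search.
    notification = (f"{len(fresh)} new result(s) for saved search: "
                    f"{', '.join(fresh)}")
```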
Figure 3. Odyssey platform data mining subsystem
The data mining process uses common steps to retrieve information, but also associates the results with the semantic structure, which is further used for knowledge retrieval, search and visualization. One of the major achievements of the platform is the seamless integration of raw data processing and the use of semantics. It should be noted that the two use cases above were created through a collaborative requirements engineering process during an Odyssey user group meeting. All the input/output (I/O) information structures were captured by direct input from user communities (e.g. law enforcement agencies).
4. Conclusions
The Odyssey platform incorporates the use of advanced data mining techniques enriched with semantic technologies. It extracts information from various data sources and indicates how the information will be used next. Moreover, it creates an ontology-driven knowledge repository that enables the analysis of information in a more abstract way, which gives the advantage of being able to illustrate global tendencies or crime patterns. The repository is also used to operate on and investigate real cases using logical reasoning and knowledge inference. Additionally, the platform is able to generate unified graphical results and clearly demonstrate the outcomes of complex analysis. Finally, the platform operates on a very specific domain, which enables it to concentrate on explicit problems, constantly evaluate outcomes, and suggest the most promising solution.
The platform is set to fill a major gap in cross-national investigation and security systems. Once the platform is running, national police forces will be able to increase their investigation potential by accessing the refined data and graphically represented data patterns. Moreover, the Odyssey platform is structured as a framework which could easily be replicated for other forensic data sets as well as applied to different domains, thus re-defining the standards of information exploitation for large data sets.
5. References
Akhgar, B., & Siddiqi, J. (2001). A framework for the delivery of web-centric knowledge management applications. Internet Computing, 1, 47.
Artac, M., Jogan, M., Leonardis, A., & Bakstein, H. (2005). Panoramic volumes for robot localization. 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2005), 2668-2674.
Bonino, D., Corno, F., Farinetti, L., & Bosca, A. (2004). Ontology driven semantic search. WSEAS Transaction on Information Science and Application, 1(6), 1597–1605.
Bonino, D., Corno, F., Farinetti, L., & e Inf, D. A. (2004). Domain specific searches using conceptual spectra. 16th IEEE International Conference on Tools with Artificial Intelligence, 2004. ICTAI 2004, 680-687.
Colucci, S., Di Noia, T., Di Sciascio, E., Donini, F. M., Ragone, A., & Trizio, M. A semantic-based search engine for professional knowledge.
de Bruin, J. S., Cocx, T. K., Kosters, W. A., Laros, J. F. J., & Kok, J. N. (2006). Data mining approaches to criminal career analysis. Proceedings of the Sixth International Conference on Data Mining, 171-177.
Gartner (2008). http://mediaproducts.gartner.com/reprints/sas/vol5/article3/article3.html
Hauck, R. V., Atabakhsh, H., Ongvasith, P., Gupta, H., & Chen, H. (2002). Using COPLINK to analyze criminal-justice data. Computer, 35(3), 30-37.
Chen, H., Chung, W., Xu, J. J., Wang, G., Qin, Y., & Chau, M. (2004). Crime data mining: A general framework and some examples. IEEE Computer, 37, 50-56.
Nath, S. V. (2006). Crime pattern detection using data mining. Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, 41-44.
Smith, M. K., Welty, C., & McGuinness, D. L. (2004). OWL web ontology language guide. W3C recommendation, 10 February 2004. World Wide Web Consortium.
Stanford University (2009). http://www.protege.stanford.edu.