19 FIWARE OpenSpecification Security Optional_Security_Enablers DBAnonymizer
Name | FIWARE.OpenSpecification.Security.Optional Security Enablers.DBAnonymizer
Chapter | Security
Catalogue-Link to Implementation | DB Anonymizer
Owner | SAP, Francesco Di Cerbo
19.1 Preface
Within this document you find a self-contained open specification of a FI-WARE generic enabler. Please also consult the FI-WARE_Product_Vision, the website at http://www.fi-ware.eu, and similar pages to understand the complete context of the FI-WARE project.
19.2 Copyright
19.3 Legal Notice
Please check the following FI-WARE Open Specification Legal Notice (essential patents license) to understand the rights to use this open specification. Like all other FI-WARE members, SAP has chosen one of the two FI-WARE license schemes for open specifications.
To illustrate this open specification license from our SAP perspective:
- SAP makes the specifications of this Generic Enabler available under IPR rules that allow for exploitation and sustainable usage in both Open Source and proprietary, closed source products, to maximize adoption.
- This Open Specification is exploitable for proprietary 3rd party products and for open source 3rd party products, including those under open source licenses that require patent pledges.
- If the owner (SAP) of this GE spec holds a patent that is essential to create a conforming implementation of the GE spec (i.e. it is impossible to write a conforming implementation without violating the patent), then a license to that patent is deemed granted to the implementation.
19.4 Overview
Large organizations hold thousands of terabytes of data about their customers or their activities. They often have to release data files containing private information to third parties for data analysis, application testing or support. To preserve individuals’ privacy and comply with privacy regulations, parts of the released datasets have to be hidden or anonymized using various anonymization techniques.
However, two problems may arise: first, deciding whether a piece of data should be considered private; and second, assessing whether exposed non-private data could be used by correlation algorithms to infer hidden private data. The second task is particularly challenging and cannot be handled manually for large datasets, where the number of possible combinations of fields is extremely large. In fact, disclosure policies are typically written by human users (security experts and others) who cannot predict every combination of data that could make it easier to guess private data contained in the dataset. In other cases, policy authors are not security experts and may expose sensitive data without being aware of the impact of such exposure.
DB Anonymizer is a database re-identification risk evaluation and anonymization service that can be used as a support tool for dataset disclosure operations. It estimates the re-identification risk associated with information disclosures, that is, the risk that an attacker can reconstruct a dataset's content exactly. This estimate underpins a number of functionalities related to dataset anonymization that DB Anonymizer offers its users. For instance, the service exposes a function that calculates a value representing the likelihood (from 0, impossibility, to 1, certainty) that an attacker can exactly reconstruct the content of a dataset anonymized using a given obfuscation policy.
Although privacy risk estimators have already been developed in some specific contexts (statistical databases), they have had limited impact: they are often too specific to a given context and do not give the user the feedback needed to mitigate the risk. In addition, they can be computationally expensive on large datasets. DB Anonymizer is designed to address these issues, exposing a simple RESTful API that can be easily integrated into any application.
DB Anonymizer uses a special algorithm to estimate the re-identification risk. Details on this algorithm can be found in the following article:
- Trabelsi, S.; Salzgeber, V.; Bezzi, M.; Montagnon, G., "Data disclosure risk evaluation," 2009 Fourth International Conference on Risks and Security of Internet and Systems (CRiSIS), pp. 35-72, 19-22 Oct. 2009. DOI: 10.1109/CRISIS.2009.5411979
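To give a feel for what a risk value between 0 and 1 can mean, the following is a minimal sketch of a naive uniqueness-based estimate. It is not the algorithm from the article above and not part of the DB Anonymizer API. The function name `reidentification_risk`, the sample records and the column names are all assumptions made for illustration. The idea: if the disclosed columns split the records into groups with identical values, an attacker who knows those values for a target picks the right record within its group with probability 1 / (group size). Averaged over all records, this equals (number of groups) / (number of records).

```python
from collections import Counter

# Hypothetical sample dataset; the column names are illustrative only.
ROWS = [
    {"name": "Alice", "zip": "06560", "age": 34, "diagnosis": "flu"},
    {"name": "Bob",   "zip": "06560", "age": 34, "diagnosis": "asthma"},
    {"name": "Carol", "zip": "06600", "age": 51, "diagnosis": "flu"},
    {"name": "Dave",  "zip": "06600", "age": 29, "diagnosis": "flu"},
]

def reidentification_risk(rows, disclosed):
    """Naive risk estimate in (0, 1] for disclosing the given columns.

    Records sharing the same values on the disclosed columns form a group.
    The average of 1 / group_size over all records simplifies to
    (number of distinct groups) / (number of records).
    NOTE: illustrative stand-in, not the DB Anonymizer algorithm.
    """
    if not rows:
        return 0.0
    groups = Counter(tuple(row[col] for col in disclosed) for row in rows)
    return len(groups) / len(rows)

risk_zip = reidentification_risk(ROWS, ["zip"])             # 2 groups of 4 records
risk_zip_age = reidentification_risk(ROWS, ["zip", "age"])  # 3 groups of 4 records
risk_name = reidentification_risk(ROWS, ["name"])           # every record is unique
print(risk_zip, risk_zip_age, risk_name)
```

Under this simplified measure, disclosing more columns can only keep the risk equal or raise it, which is why combinations of individually harmless fields deserve attention.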
Target usage
The service can be used by information owners or responsible persons to evaluate the re-identification risk associated with a disclosure of their data, to obtain suggestions for the safest configurations under a specified upper bound on that risk, and finally to anonymize the dataset according to a disclosure policy. More precisely, through the methods of its API, the service provides the user with an estimation of the re-identification risk when disclosing certain information, and proposes safe combinations in order to minimize the risk that an attacker can reconstruct the original dataset. For instance, the service can estimate the re-identification risk associated with each attribute of a dataset (i.e., its columns); this functionality helps users define the anonymization policies that best suit their business needs while minimizing the re-identification risk.
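The workflow described above (per-attribute risk, then safe combinations under an upper bound) can be sketched as follows. This is a hedged illustration only: it does not call the DB Anonymizer API, and it replaces the service's risk estimator with a naive uniqueness ratio. The names `naive_risk`, `per_attribute_risk` and `safe_combinations`, as well as the sample data, are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample dataset; the column names are illustrative only.
ROWS = [
    {"zip": "06560", "age": 34, "diagnosis": "flu"},
    {"zip": "06560", "age": 34, "diagnosis": "asthma"},
    {"zip": "06600", "age": 51, "diagnosis": "flu"},
    {"zip": "06600", "age": 29, "diagnosis": "flu"},
]
COLUMNS = ["zip", "age", "diagnosis"]

def naive_risk(rows, columns):
    """Stand-in estimator: distinct value combinations / number of records."""
    groups = Counter(tuple(row[col] for col in columns) for row in rows)
    return len(groups) / len(rows)

def per_attribute_risk(rows, columns):
    """Risk of disclosing each column on its own."""
    return {col: naive_risk(rows, [col]) for col in columns}

def safe_combinations(rows, columns, upper_bound):
    """All column combinations whose estimated risk stays within the bound.

    Exhaustive enumeration is exponential in the number of columns, so this
    only suits small schemas. Results are ordered with the largest
    combinations first, then by increasing risk.
    """
    safe = []
    for size in range(1, len(columns) + 1):
        for combo in combinations(columns, size):
            risk = naive_risk(rows, combo)
            if risk <= upper_bound:
                safe.append((combo, risk))
    safe.sort(key=lambda item: (-len(item[0]), item[1]))
    return safe

attribute_risks = per_attribute_risk(ROWS, COLUMNS)
suggestions = safe_combinations(ROWS, COLUMNS, upper_bound=0.5)
print(attribute_risks)
print(suggestions)
```

With the sample data and a bound of 0.5, only `zip` alone and `diagnosis` alone qualify; every pair of columns already pushes the naive estimate above the bound.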
At this stage, DB Anonymizer supports DB dumps in basic SQL syntax. It is, however, recommended to use MySQL SQL instructions.
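As an illustration of what a dump in basic SQL syntax might look like, the sketch below defines a small dump containing only `CREATE TABLE` and `INSERT INTO` statements. It loads the dump into an in-memory SQLite database purely to show that it parses as plain SQL. The table name, columns and values are assumptions, SQLite merely stands in for a real database, and the exact subset of SQL that DB Anonymizer accepts is not specified here.

```python
import sqlite3

# Hypothetical dump restricted to basic statements that MySQL also accepts.
DUMP = """
CREATE TABLE patients (
    id INT PRIMARY KEY,
    zip VARCHAR(10),
    age INT,
    diagnosis VARCHAR(50)
);
INSERT INTO patients VALUES (1, '06560', 34, 'flu');
INSERT INTO patients VALUES (2, '06560', 34, 'asthma');
INSERT INTO patients VALUES (3, '06600', 51, 'flu');
"""

# Load the dump into a throwaway in-memory database (illustration only).
connection = sqlite3.connect(":memory:")
connection.executescript(DUMP)

loaded_rows = connection.execute(
    "SELECT zip, age, diagnosis FROM patients ORDER BY id"
).fetchall()
connection.close()

print(loaded_rows)
```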