19.5.1 Relevant Concepts and Ideas
To operate, DB Anonymizer needs as input from users:
- a dump of a MySQL table, containing the full dataset to disclose, together with
- a disclosure policy (also known as obfuscation policy).
Both inputs are mandatory: the service's algorithm needs them to evaluate the effectiveness of the disclosure policy, and to perform any other supported operation. In fact, the disclosure policy defines the structure of the dataset and, in particular, the sensitivity of each dataset element type (i.e., each column of the input table). Once the policy is evaluated, the table is dropped from the DB and the file dump is erased. The application server encapsulation model permits complete isolation of each request's data, and any intermediate result created during the algorithm's execution that is associated with real dataset contents is deleted immediately at the end of the computation.
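The clean-up guarantee described above can be illustrated with a minimal sketch. It is not the service's actual implementation: sqlite3 stands in for the internal MySQL database, and the function name `evaluate_and_clean_up` and the `evaluate` callback are hypothetical names introduced only for this illustration.

```python
import os
import sqlite3
import tempfile


def evaluate_and_clean_up(dump_sql, evaluate):
    """Load a table dump, run `evaluate` on it, then erase all request data.

    Assumptions: sqlite3 replaces the service's MySQL database, and
    `evaluate` is a hypothetical callback standing in for the algorithm.
    """
    # Persist the uploaded dump to a per-request temporary file.
    fd, dump_path = tempfile.mkstemp(suffix=".sql")
    with os.fdopen(fd, "w") as handle:
        handle.write(dump_sql)

    conn = sqlite3.connect(":memory:")  # isolated, per-request database
    try:
        with open(dump_path) as handle:
            conn.executescript(handle.read())
        return evaluate(conn)
    finally:
        # Drop every table created from the dump and erase the dump file,
        # whether or not the evaluation succeeded.
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")]
        for name in tables:
            conn.execute(f'DROP TABLE "{name}"')
        conn.close()
        os.remove(dump_path)
```

Placing the clean-up in a `finally` clause is one way to make sure that dataset contents do not outlive the request even when the evaluation raises an error.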
19.5.2 Input Format
Generally, DB Anonymizer functions have two parameters:
- an SQL table dump (e.g., using MySQL dialect), containing all information to be disclosed: this file shall contain only a table definition and a set of elements to populate it;
- a policy file encoded in XML, that describes which information of the previously specified table is going to be disclosed: the policy file is described by the following XML Schema directives:
A sample policy file is the following:
Gender    identifier    false
Wine      sensitive     false
The “Type” information shall be either “identifier” or “sensitive”, so that the service algorithm can distinguish between the two kinds of column. Please refer to the Glossary (FIWARE.Glossary.Security.Optional Security Enablers.DBAnonymizer) for an explanation of the two terms.
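As an illustration of the constraint on “Type”, the following sketch checks policy entries before they would be submitted. The class `PolicyEntry`, its field names, and the function `validate_policy` are hypothetical; in particular, the field name `flag` is only a placeholder for the third value of each sample entry, and the check against the table's column names is an additional assumption, not a documented rule.

```python
from dataclasses import dataclass

ALLOWED_TYPES = {"identifier", "sensitive"}


@dataclass(frozen=True)
class PolicyEntry:
    column: str  # column name in the dumped table, e.g. "Gender"
    type: str    # must be "identifier" or "sensitive"
    flag: bool   # third value of each sample entry (placeholder name)


def validate_policy(entries, table_columns):
    """Return a list of problems found; an empty list means the policy passes."""
    problems = []
    for entry in entries:
        if entry.type not in ALLOWED_TYPES:
            problems.append(f"{entry.column}: unknown type {entry.type!r}")
        # Assumption: every policy entry should name a column of the dump.
        if entry.column not in table_columns:
            problems.append(f"{entry.column}: no such column in the table dump")
    return problems


# The two entries of the sample policy above.
policy = [
    PolicyEntry("Gender", "identifier", False),
    PolicyEntry("Wine", "sensitive", False),
]
print(validate_policy(policy, {"Gender", "Wine"}))
```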
19.5.3 Use Case
A company holds information about people, structured in dataset records. Each record has many attributes, such as birthday, address, marital status and occupation, that are useful for the company's purposes but are usually not sensitive if considered in isolation. Other attributes related to the connection between an individual and the company, such as customer purchases, debts, and credit rating, may be sensitive. Suppose that one such dataset has to be released to a third party: it has to be modified in order to protect the privacy of the subjects described in the dataset, according to privacy protection regulations. Therefore, certain elements will be omitted, for instance obvious identifiers such as social security number, name and address; other attributes, such as occupation and marital status, can be left intact; and other key and sensitive attributes will be modified to preserve confidentiality. For example, salaries might be truncated, ages grouped more coarsely, and zip codes swapped on pairs of records. Furthermore, some attributes on some records might be missing or intentionally removed.
However, if this anonymization process is not carefully designed, attackers could use reconstruction techniques to recover the original dataset, as a whole or in part, also by cross-comparing it with other datasets (e.g., a similar dataset of a competitor). The DB Anonymizer allows evaluating an anonymization policy, in order to measure its robustness to dataset reconstruction techniques.
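The kinds of modification listed above (omitting identifiers, truncating salaries, grouping ages, swapping zip codes) can be sketched as follows. This is only an illustration of the dataset owner's side, not part of DB Anonymizer; the function name `anonymize`, the record field names, and the bucket sizes (10,000 for salaries, ten years for ages) are assumptions chosen for the example.

```python
import random


def anonymize(records, seed=0):
    """Apply the kinds of modification named above to a list of dict records."""
    out = []
    for rec in records:
        decade = rec["age"] // 10 * 10
        out.append({
            # Obvious identifiers (name, address, ssn) are omitted entirely.
            "occupation": rec["occupation"],             # left intact
            "salary": rec["salary"] // 10_000 * 10_000,  # truncated
            "age_group": f"{decade}-{decade + 9}",       # grouped coarsely
            "zip": rec["zip"],                           # swapped below
        })
    # Swap zip codes on randomly chosen pairs of records.
    rng = random.Random(seed)
    order = list(range(len(out)))
    rng.shuffle(order)
    for i, j in zip(order[0::2], order[1::2]):
        out[i]["zip"], out[j]["zip"] = out[j]["zip"], out[i]["zip"]
    return out
```

Whether such a scheme actually resists reconstruction is exactly the question that a policy evaluation is meant to answer; the sketch makes no claim about its robustness.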
Let us consider the following example.
Use Case for DB Anonymizer
- The IT-Security Expert, on behalf of the Dataset Owner, creates the Disclosure Policy.
- The IT-Security Expert, on behalf of the Dataset Owner, creates the DB Dump.
- The DB Dump and the Disclosure Policy are sent to DB Anonymizer using the evaluatePolicy operation.
- The DB Anonymizer sends back the Result Identifier (GID).
- The Dataset Owner asks DB Anonymizer for the evaluation result, using the GID.
- The DB Anonymizer sends back the evaluation result.
- The Dataset Owner modifies the DB data, according to the accepted policy.
- The modified DB dump is sent to the Consulting Company.
19.6 Main Interactions
19.6.1 DB Anonymizer Architecture
FMC Block Diagram of DB Anonymizer: User System (on the left) and DB Anonymizer service (on the right side)
The previous block diagram shows the different elements that compose the DB Anonymizer service. Starting from the DB Anonymizer block (on the right side of the diagram), the core of DB Anonymizer is the Anonymization Algorithm[1] component, which interacts closely with an internal MySQL database. The Anonymization Algorithm interacts with users through a RESTful interface. More precisely, the RESTful interface component is responsible for invoking the Anonymization Algorithm operations and providing them with user inputs.
In the left part of the block diagram, a user is depicted together with a RESTful client component, used for interacting with the DB Anonymizer RESTful interface. The RESTful client can also be implemented by a traditional web browser.
UML Use Case Diagram of main DB Anonymizer functionalities
The previous use case diagram represents the main functionalities of DB Anonymizer. They can be used for analysing and reviewing a dataset's disclosure policies and finally to perform the anonymization operation on a dataset. More details on each functionality can be found in the FIWARE.OpenSpecification.Security.DBAnonymizer.Open_RESTful_API_Specification page.
UML Sequence Diagram of two main DB Anonymizer operations: evaluatePolicy and getPolicyResult
The previous sequence diagram shows the order in which the main DB Anonymizer operations should be invoked; the entities depicted are the same as in the previous block diagram.
The DB Anonymizer API comprises a number of methods; users have to invoke the core functionalities in the following order:
- evaluate;
- getResult.
Example:
- evaluatePolicy;
- getPolicyResult.
The first method starts the analysis of an anonymization policy together with the associated dataset. The RESTful interface component exposes this method, and any incoming request is routed to and served by the Anonymization Algorithm component, which creates a new computing process. The Anonymization Algorithm component immediately returns a request identifier (GID) to the RESTful component and thus to the user; the GID can be used to retrieve the analysis result. Each computation process performs its analysis on the received policy and dataset, and then writes a result to the DB. At that point, the process terminates, deleting any data it used. The second method can be invoked by users to retrieve the result of a computation, identified by a GID. The result of getPolicyResult is either the analysis result when available (a value from 0, impossibility, to 1, certainty), or an error code (result not ready, error in receiving parameters, and so on; please refer to the RESTful API documentation for a detailed error code list and explanation).
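The asynchronous pattern described above (submit, receive a GID immediately, then ask for the result) can be sketched with an in-memory stand-in. This is not the real RESTful API: the class `AnonymizerStub`, the helper `wait_for_result`, the `analyse` callback, and the error code strings are all hypothetical, and a thread replaces the service's computing process. The actual endpoints and error codes are defined in the RESTful API documentation.

```python
import threading
import time
import uuid

NOT_READY = "RESULT_NOT_READY"  # hypothetical error code
UNKNOWN_GID = "UNKNOWN_GID"     # hypothetical error code


class AnonymizerStub:
    """In-memory stand-in for the evaluatePolicy / getPolicyResult pair."""

    def __init__(self, analyse):
        self._analyse = analyse  # placeholder for the real algorithm
        self._results = {}       # stands in for results written to the DB
        self._pending = set()    # GIDs whose computation is still running
        self._lock = threading.Lock()

    def evaluate_policy(self, dump, policy):
        """Start the analysis in the background and return a GID at once."""
        gid = uuid.uuid4().hex
        with self._lock:
            self._pending.add(gid)
        threading.Thread(target=self._run, args=(gid, dump, policy)).start()
        return gid

    def _run(self, gid, dump, policy):
        score = self._analyse(dump, policy)  # expected to lie between 0 and 1
        with self._lock:
            self._results[gid] = score       # only the result is kept
            self._pending.discard(gid)

    def get_policy_result(self, gid):
        """Return the score if available, otherwise an error code."""
        with self._lock:
            if gid in self._results:
                return self._results[gid]
            return NOT_READY if gid in self._pending else UNKNOWN_GID


def wait_for_result(service, gid, attempts=50, delay=0.1):
    """Client-side polling loop: retry until the result is ready."""
    for _ in range(attempts):
        result = service.get_policy_result(gid)
        if result != NOT_READY:
            return result
        time.sleep(delay)
    raise TimeoutError(f"no result for {gid} after {attempts} attempts")
```

A client built this way never blocks on the submission itself; it only needs to keep the GID and poll at whatever interval suits it.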
Other DB Anonymizer operations follow the same structure. Please refer to FIWARE.OpenSpecification.Security.DBAnonymizer.Open RESTful API Specification for a complete list of supported operations.