Data Compliance in the GDPR – How anonymization allows you to stay compliant in your data analysis

July 23, 2018 by Nicolas Sartor

Data protection is becoming increasingly relevant in the public debate. Numerous recent examples have shown that companies face serious consequences in the event of data breaches. Ensuring proper data protection is therefore more important than ever. To strengthen users' rights and protect privacy, we must fundamentally change how we handle data. The European General Data Protection Regulation (GDPR), which has applied throughout the EU since May 25, 2018, is a major step forward. User rights have been considerably strengthened, and the level of data protection has been harmonized to a large extent across European countries. While some companies may perceive the regulation as too strict and innovation-inhibiting compared to other jurisdictions, we believe that more responsible handling of data can become a decisive success factor and competitive advantage. End users who entrust their personal data to a European company can be sure that the most modern data protection regulations in the world are applied.

“The protection of personal data is a fundamental right for all Europeans, but citizens do not always feel in full control of their personal data. […] The reform will accomplish this while making life easier and less costly for businesses. A strong, clear and uniform legal framework at EU level will help to unleash the potential of the Digital Single Market and foster economic growth, innovation and job creation.”

– Viviane Reding, then Vice-President of the European Commission, January 2012

Some companies, such as Microsoft, recognized this competitive advantage and issued a statement before the new regulation came into effect: Microsoft will apply the GDPR's protections to all of its customers worldwide, not only those in Europe.

Data protection and data use – how are they compatible?

Data protection is a critical issue, especially when analyzing sensitive information such as personal data. Big data and new analytical concepts, such as those used to calculate risk assessments when granting loans, open up new perspectives on information. Seemingly harmless data analyses can quickly become misuse if there is no legal or contractual basis for processing the data for the intended purpose. In the past, there were many ambiguities around the analysis of sensitive data sets, but the GDPR offers guidance by recommending technological approaches that can ensure compliant processing of data.

Article 32 “Security of processing” states that “taking into account the state of the art, the costs of implementation and the nature, scope, context and purposes of processing as well as the risk of varying likelihood and severity for the rights and freedoms of natural persons, the controller and the processor shall implement appropriate technical and organisational measures to ensure a level of security appropriate to the risk.” The measures it mentions include the pseudonymization and encryption of personal data, the ability to ensure the ongoing confidentiality and integrity of processing systems, and the ability to restore data in the event of a physical or technical incident. When it comes to analyzing sensitive datasets, the measures discussed in more detail are pseudonymization and anonymization.

Pseudonymization vs. Anonymization

GDPR Article 25 “Data protection by design and by default” states that pseudonymization can help to implement the data protection principle of “data minimisation” and thus protect the data of the people involved. However, a pseudonymised data record still allows the identification of individual persons if one has access to other data sources that make this conclusion possible.

Pseudonymization involves replacing the data in personally identifying fields with a seemingly random number or text. For instance, fields like name, address, credit card number, and so on are all replaced with a single random value.

Simply replacing the data in these fields, however, does not make it impossible to re-identify individuals in a pseudonymized data set. With a little additional knowledge about a user in the data set (for instance, the dates of several doctor visits in a medical data set), it can be easy to re-identify pseudonymized users.
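
To make this risk concrete, here is a minimal sketch in Python. The records, names, and the helper functions `pseudonymize` and `reidentify` are invented for illustration and do not describe any particular product. Names are replaced by random tokens, yet an attacker who knows two visit dates can still single out one person and read off the diagnosis.

```python
import secrets

# Hypothetical medical records; all names, dates, and diagnoses are invented.
records = [
    {"name": "Alice Example", "visit_date": "2018-03-02", "diagnosis": "asthma"},
    {"name": "Alice Example", "visit_date": "2018-04-17", "diagnosis": "asthma"},
    {"name": "Bob Sample", "visit_date": "2018-03-02", "diagnosis": "diabetes"},
    {"name": "Bob Sample", "visit_date": "2018-05-09", "diagnosis": "diabetes"},
]


def pseudonymize(rows):
    """Replace each name with a random token that is consistent per person."""
    tokens = {}
    result = []
    for row in rows:
        token = tokens.setdefault(row["name"], secrets.token_hex(8))
        result.append({
            "pseudonym": token,
            "visit_date": row["visit_date"],
            "diagnosis": row["diagnosis"],
        })
    return result


def reidentify(pseudonymized_rows, known_visit_dates):
    """Return the pseudonyms whose visits include every date the attacker knows."""
    visits = {}
    for row in pseudonymized_rows:
        visits.setdefault(row["pseudonym"], set()).add(row["visit_date"])
    known = set(known_visit_dates)
    return [p for p, dates in visits.items() if known <= dates]


pseudonymized = pseudonymize(records)

# The attacker only knows that Alice saw a doctor on these two dates.
matches = reidentify(pseudonymized, ["2018-03-02", "2018-04-17"])
leaked = {r["diagnosis"] for r in pseudonymized if r["pseudonym"] in matches}
print(matches, leaked)
```

Because exactly one pseudonym matches both dates, the attacker learns that person's diagnosis without ever seeing a name.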

The GDPR Recital 26 therefore states: “The principles of data protection should apply to any information concerning an identified or identifiable natural person. Personal data which have undergone pseudonymization, which could be attributed to a natural person by the use of additional information should be considered to be information on an identifiable natural person. […] The principles of data protection should therefore not apply to anonymous information, including for statistical or research purposes.”

In its detailed “Opinion 05/2014 on Anonymization Techniques”, the European Article 29 Data Protection Working Party goes into more detail on how anonymization works under European data protection law: “Accordingly, the Working Party considers that anonymization as an instance of further processing of personal data can be considered to be compatible with the original purposes of the processing but only on condition the anonymization process is such as to reliably produce anonymized information in the sense described in this paper.” That means that anonymizing data does not require the user’s consent if there was a justified reason for collecting the data in the first place.

Different approaches to anonymization – Which one is the best for my project?

In contrast to pseudonymized data, anonymized data are no longer covered by the GDPR and are not subject to any further restrictions under data protection law. But how can data be anonymized and analyzed in a way that ensures data protection compliance? The opinion of the Article 29 Data Protection Working Party contains a detailed evaluation of anonymization techniques, explaining a number of anonymization methods along with their respective strengths and weaknesses.

Depending on the use case, different approaches to anonymization can be considered. At the moment, many business intelligence tools still use methods such as data masking and pseudonymization. In these kinds of processes, analysts must take into account any additional knowledge that a potential attacker might have and that could lead to the re-identification of sensitive records. With these and other approaches such as k-anonymity, l-diversity or t-closeness, the analyst must also decide in advance which fields contain sensitive data. These fields must be either removed or modified before an analysis can be performed, which further reduces the quality of the data set. If values within the dataset change, the anonymization has to be done again.
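
As a rough illustration of what this means for k-anonymity, the following Python sketch uses invented records and assumes that the analyst has chosen `zip` and `age` as the identifying fields in advance. The helper names `k_of` and `generalize` are hypothetical. The sketch coarsens those fields, measures the resulting k, and shows that a single new record forces the check to be repeated.

```python
from collections import Counter

# Hypothetical records; "zip" and "age" are the fields chosen in advance.
rows = [
    {"zip": "67655", "age": 34, "diagnosis": "flu"},
    {"zip": "67657", "age": 36, "diagnosis": "asthma"},
    {"zip": "67659", "age": 38, "diagnosis": "flu"},
    {"zip": "10115", "age": 52, "diagnosis": "diabetes"},
    {"zip": "10117", "age": 57, "diagnosis": "flu"},
]
QUASI_IDENTIFIERS = ("zip", "age")


def k_of(table, quasi_identifiers):
    """Size of the smallest group of rows sharing the same identifying values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in table)
    return min(groups.values())


def generalize(row):
    """Coarsen the identifying fields: keep 3 zip digits, bucket age by decade."""
    decade = row["age"] // 10 * 10
    return {
        "zip": row["zip"][:3] + "**",
        "age": f"{decade}-{decade + 9}",
        "diagnosis": row["diagnosis"],
    }


generalized = [generalize(r) for r in rows]
k_raw = k_of(rows, QUASI_IDENTIFIERS)
k_generalized = k_of(generalized, QUASI_IDENTIFIERS)

# One new record that fits no existing group breaks the guarantee again.
updated = generalized + [generalize({"zip": "80331", "age": 41, "diagnosis": "flu"})]
k_updated = k_of(updated, QUASI_IDENTIFIERS)
print(k_raw, k_generalized, k_updated)
```

The generalized table reaches k = 2 only at the cost of precision in both fields, which is the loss of data quality described above.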

Win-Win Situation: Gaining deep insights while maintaining full GDPR compliance

One of the biggest advantages of the dynamic anonymization method Differential Privacy is that the degree of anonymization can be mathematically proven. It is already being used by the tech giants Apple and Google to analyze the usage habits of their users anonymously. However, to avoid exhausting the “privacy budget” and to enable continuous dynamic monitoring of users, Apple and Google place strict limits on what can be analyzed and add substantial noise.
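
The interplay of noise and privacy budget can be sketched with the textbook Laplace mechanism for counting queries. This is a simplified illustration, not the mechanism Apple or Google actually deploy. The class name `PrivateCounter`, the epsilon values, and the data are assumptions made for the example.

```python
import random


class PrivateCounter:
    """Answers counting queries with Laplace noise until the budget is spent."""

    def __init__(self, values, total_epsilon, seed=None):
        self._values = list(values)
        self._remaining = total_epsilon
        self._rng = random.Random(seed)

    def count(self, predicate, epsilon):
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self._remaining:
            raise RuntimeError("privacy budget exhausted")
        self._remaining -= epsilon
        true_count = sum(1 for v in self._values if predicate(v))
        # Adding or removing one person changes a count by at most 1
        # (sensitivity 1), so the Laplace scale is 1 / epsilon.
        scale = 1.0 / epsilon
        # The difference of two exponential samples is Laplace-distributed.
        noise = self._rng.expovariate(1 / scale) - self._rng.expovariate(1 / scale)
        return true_count + noise


# Hypothetical data: ages of eight users, with a total budget of epsilon = 1.0.
ages = [23, 35, 41, 29, 62, 57, 38, 44]
counter = PrivateCounter(ages, total_epsilon=1.0, seed=0)
first = counter.count(lambda a: a >= 40, epsilon=0.5)
second = counter.count(lambda a: a >= 40, epsilon=0.5)
print(first, second)
```

After these two queries the budget is used up, and a third call raises an error. A smaller epsilon per query would allow more queries, but each answer would carry more noise.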

Aircloak Insights uses some principles similar to Differential Privacy, but adds new techniques such as sticky layered noise that eliminate the need for a privacy budget, allow for rich analytics over a wide variety of use cases, add minimal noise, and place no limit on the number of queries. You can read more about how Aircloak Insights anonymization works in detail here.
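
The idea behind “sticky” noise can be sketched in a few lines of Python. This is a heavily simplified illustration of the general idea and not Aircloak's actual implementation: it leaves out the layering entirely, and the salt, function names, and data are assumptions. The noise is derived deterministically from the query condition, so repeating the same query returns the same noisy answer and averaging many answers reveals nothing new.

```python
import hashlib
import random

SECRET_SALT = b"hypothetical-secret"  # assumption: known only to the system


def sticky_noise(condition, sd=1.0):
    """Gaussian noise seeded by the query condition, so it repeats exactly."""
    digest = hashlib.sha256(SECRET_SALT + condition.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return rng.gauss(0.0, sd)


def noisy_count(values, condition, predicate):
    """True count plus noise that depends only on the condition text."""
    true_count = sum(1 for v in values if predicate(v))
    return true_count + sticky_noise(condition)


# Hypothetical data: ages of eight users.
ages = [23, 35, 41, 29, 62, 57, 38, 44]
first = noisy_count(ages, "age >= 40", lambda a: a >= 40)
repeat = noisy_count(ages, "age >= 40", lambda a: a >= 40)
other = noisy_count(ages, "age >= 30", lambda a: a >= 30)
print(first == repeat, first, other)
```

With independent random noise, an attacker could repeat a query many times and average the noise away, which is why such systems need a budget. With noise that sticks to the condition, repetition does not help.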

Certified data protection for data analysis

There is as yet no certification for anonymization tools and methods in data analysis comparable, for instance, to the German “Trusted Cloud Label” for cloud solutions. It is difficult to estimate when a general certification can be expected. In the case of the Trusted Cloud, it took about six years from the federal funding program to the finished platform before users and service providers could use it.

The great challenge and difficulty in the evaluation of anonymization is that no known method can guarantee 100% data protection. If data is made anonymous, its information content is inevitably reduced and distorted. To ensure that raw data retain their significance in an analysis, the data can only be changed to a certain extent.

The conflict between usability and security in data anonymization means that so far no European data protection authority has extensively evaluated or even certified technologies and methods for data anonymization outside specific use cases. To date, only the Laboratoire d’Innovation Numérique de la CNIL (LINC), a working group of the French data protection authority, is investigating ways of measuring the value of anonymized data records and quantifying the loss of information using interactive data visualisation.

The Max Planck Institute for Software Systems in Germany is also currently working on an evaluation system that will measure anonymization in terms of data security and quality in the future. To reliably evaluate the performance of Aircloak Insights, Aircloak has launched a bounty program, the so-called Aircloak Attack Challenge. It is the first and only bug bounty program for an anonymization solution worldwide. The first phase ran from December 2017 to May 2018, and we are looking to launch another phase later this year.

The total prize pool was $15,000, and a maximum of $5,000 could be won per attack.
A total of 33 attack accounts were created, including by researchers from renowned universities and institutions such as EPFL, Georgetown University, INRIA, MIT, University College London and many others.
In total, more than 30 million attack queries were run against the system.

Only two attack teams formulated successful attacks. They were rewarded with $5,000 each for their achievements and were inducted into the Aircloak Attack Challenge Hall of Fame. Fixes for both attacks have been implemented. Many thanks to all participants who helped to make Aircloak Insights even safer!

If you would like to know more about Aircloak, we are happy to show you the details in our demo!
