Data Analysis and Compliance under GDPR

No one can doubt the importance of data protection. Data breaches have proved to have serious consequences for companies, and rarely a day passes without news of another major data leak.

According to the UK Information Commissioner’s Office (ICO), the number of incidents has quadrupled since the strict GDPR rules took effect in May 2018. Fines in Europe can now reach up to €20 million or 4% of a company’s global revenue (whichever is higher).

Apart from other factors such as reputational damage, these fines are a strong argument for focusing your efforts on IT and data security. The bad news is that 100% security is almost impossible. There are far too many variables to consider, be it the carelessness of a co-worker, the discovery of a software vulnerability or general technological progress, which opens up new attack vectors.

However, you still have a responsibility to use best-practice IT security measures that comply with GDPR and with the applicable local data protection law. But what does this mean exactly?

Data anonymization must play a key role for any data protection officer responsible for a company’s data protection strategy.

Below we explain some more details about GDPR and show how modern data analysis should be handled. Additionally, we show how anonymization can preserve the privacy of your users and ensure full GDPR compliance.

FAQ on Data Analysis, Compliance and GDPR

The European General Data Protection Regulation (GDPR) describes a number of different approaches to protect the data and privacy of individuals. The following sections from the regulation explain a bit more about GDPR and describe the most relevant anonymization measures to take when handling sensitive data.

How does GDPR define personal data?

The GDPR covers any data that can be used to identify a person. In Article 4 it states:

“An identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.”

The complexity here lies in the fact that the definition covers data that might only indirectly identify a person. Often a single piece of data on its own is not sufficient to identify a person (for instance, knowing that someone is called John Smith). However, if you also know that John Smith lives in Edinburgh and works for Starbucks, you may have narrowed it down enough to identify him. To make matters harder, the same kind of identifier may be personal in one case and not in another: the name John Smith is common enough not to be identifying on its own (at least in some countries), but the name Jeffrey Sampson is much rarer.
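The narrowing-down effect can be illustrated with a minimal Python sketch. The population below is entirely invented, and the helper names (matching, candidate_counts) are hypothetical; the point is only to show how each additional known fact shrinks the set of candidates until one person remains.

```python
from typing import Dict, List, Tuple

# Hypothetical toy population; every name and attribute is invented.
PEOPLE: List[Dict[str, str]] = [
    {"name": "John Smith", "city": "Edinburgh", "employer": "Starbucks"},
    {"name": "John Smith", "city": "Edinburgh", "employer": "University"},
    {"name": "John Smith", "city": "London", "employer": "Starbucks"},
    {"name": "John Smith", "city": "Glasgow", "employer": "Bank"},
    {"name": "Jeffrey Sampson", "city": "London", "employer": "Bank"},
]


def matching(records: List[Dict[str, str]], **known: str) -> List[Dict[str, str]]:
    """Return the records consistent with everything the observer knows."""
    return [r for r in records if all(r[k] == v for k, v in known.items())]


def candidate_counts(
    records: List[Dict[str, str]], known_facts: List[Tuple[str, str]]
) -> List[int]:
    """Count the remaining candidates as each known fact is added in turn."""
    counts = []
    known: Dict[str, str] = {}
    for key, value in known_facts:
        known[key] = value
        counts.append(len(matching(records, **known)))
    return counts


FACTS = [("name", "John Smith"), ("city", "Edinburgh"), ("employer", "Starbucks")]

# The common name alone leaves several candidates; name + city + employer leaves one.
print(candidate_counts(PEOPLE, FACTS))
# The rare name identifies a single person on its own.
print(len(matching(PEOPLE, name="Jeffrey Sampson")))
```

In this toy data the name alone matches four people, while the combination of three individually harmless attributes matches exactly one.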

How does GDPR define consent?

GDPR Article 7 requires that individuals give informed consent before their personal data can be used. This means that not only must they give the consent, they must understand precisely what they are consenting to and must take some explicit action to indicate this. Importantly, this consent can also be withdrawn at any time.

What penalties can GDPR impose?

GDPR carries some of the toughest penalties in the world for any company that breaches the rules. Fines can reach up to 20 million euros or 4% of annual global turnover, whichever is higher. Supervisory authorities can also impose a temporary or definitive limitation on processing, including a ban (Article 58).
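The "whichever is higher" rule is simple arithmetic, sketched below in Python. The function name and the example turnover figures are assumptions chosen for illustration; the result is only the upper limit of a fine, not the amount an authority would actually impose.

```python
FIXED_CAP_EUR = 20_000_000   # 20 million euros
TURNOVER_PERCENT = 4         # 4% of annual global turnover


def max_fine_eur(annual_global_turnover_eur: int) -> int:
    """Upper limit of the fine: the higher of the fixed cap and 4% of turnover."""
    turnover_cap = annual_global_turnover_eur * TURNOVER_PERCENT // 100
    return max(FIXED_CAP_EUR, turnover_cap)


# Hypothetical companies: for smaller turnovers the fixed cap dominates.
print(max_fine_eur(100_000_000))    # 4% would be 4 million, so the cap is 20 million
print(max_fine_eur(2_000_000_000))  # 4% is 80 million, which exceeds the fixed cap
```

The two limits meet at a turnover of 500 million euros; above that, the percentage-based limit is the higher one.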

What other rights does GDPR give to individuals?

A key plank of GDPR is that it grants individuals certain additional rights relating to their personal data. These include:

– The right to be forgotten (meaning they can ask for all their data to be deleted)
– The right to request a copy of all their data (for free for the first copy)
– The right at any time to withdraw consent for their data to be used

What is the scope of the GDPR?

Unlike many national data protection laws, the GDPR applies to any company that deals with data of EU residents. It applies irrespective of where in the world the data is collected and processed, and irrespective of where in the world the company is located.

General principles relating to processing of personal data

The GDPR defines seven key principles (Article 5), which are similar to the principles of the UK Data Protection Act 1998:

Lawfulness, fairness and transparency
Organisations have to make sure that data collection and handling is within the scope of the law and that all data is used fairly. This means they must not process the data in a way that is unduly detrimental, unexpected or misleading to the individuals concerned.
Purpose limitation
Before any processing takes place, it has to be clear for what purpose(s) the data is being collected and handled. The purposes have to be documented and specified in the privacy policy.
Organisations can only use data for a new purpose if it is compatible with the original purpose, if they obtain updated consent, or if they have a clear basis in law.
Data minimisation
Organisations have to make sure that the personal data they are processing is adequate (“collect only the data you need for the purpose”), relevant (“the data is directly linked to the purpose”) and limited (“the data is only needed for the purpose and deleted when it’s not needed anymore”).
Accuracy
Data has to be accurate. Organisations should have processes to check the accuracy and make sure that data is not misleading or incorrect.
Storage limitation
Organisations are not allowed to store personal data longer than it is needed. That requires that they know what kind of personal data they have and where it is stored. There should be policies and processes in place which periodically review the data being held and erase or anonymize it when it is no longer needed. Individuals can request the deletion of their data at any time and organisations need appropriate processes to comply with such requests in a reasonable time.
Integrity and confidentiality (security)
Organisations need to have “appropriate technical and organisational measures” to protect the data they hold. This requires considering things like risk analysis, organisational policies, and physical or technical measures.
Accountability
Organisations are responsible for compliance and must be able to demonstrate it. That means that all measures must ensure the “confidentiality, integrity and availability” of the systems, services and the personal data which are processed within them.
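The periodic review described under storage limitation can be sketched as a small Python routine. This is a minimal illustration under stated assumptions: the StoredRecord fields, the review_retention helper and the 90-day retention period are all hypothetical, and GDPR itself does not prescribe a specific period.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Tuple


@dataclass
class StoredRecord:
    record_id: int
    purpose: str
    last_needed: date  # last day the data was needed for its purpose


def review_retention(
    records: List[StoredRecord], today: date, retention: timedelta
) -> Tuple[List[StoredRecord], List[StoredRecord]]:
    """Split records into those to keep and those due for erasure or anonymization."""
    keep: List[StoredRecord] = []
    due: List[StoredRecord] = []
    for record in records:
        if today - record.last_needed > retention:
            due.append(record)
        else:
            keep.append(record)
    return keep, due


# Hypothetical inventory of personal data and an assumed 90-day retention period.
RECORDS = [
    StoredRecord(1, "order fulfilment", date(2018, 1, 10)),
    StoredRecord(2, "newsletter", date(2018, 5, 20)),
    StoredRecord(3, "order fulfilment", date(2017, 3, 1)),
]
keep, due = review_retention(RECORDS, date(2018, 6, 1), timedelta(days=90))
print([r.record_id for r in due])   # records past the retention period
print([r.record_id for r in keep])  # records still needed
```

A review like this only works if the organisation knows which personal data it holds and where, which is why the inventory is modelled explicitly here.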

Data Protection by Default and by Design

GDPR Article 25 (“Data protection by design and by default”) states that organisations have to implement “appropriate technical and organisational measures […] in an effective manner and to integrate the necessary safeguards into the processing in order to meet the requirements of this Regulation and protect the rights of data subjects.” Pseudonymisation is given as an example of such a measure. Done correctly, pseudonymisation can be used as a means of minimising the personal data being kept about an individual.

Pseudonymisation is the process of replacing a uniquely identifying attribute with another uniquely identifying attribute that carries less semantic meaning. For example, a social security or a bank account number might be replaced with a seemingly random value. However, a pseudonymised data record might still allow the re-identification of an individual, given additional knowledge. For example, knowing the time when an individual was at the doctor’s office might allow an attacker to re-identify a patient in a pseudonymised medical database.
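A minimal Python sketch makes both halves of this concrete: the replacement of an identifier with a random value, and the re-identification through additional knowledge. The records, field names and the pseudonymise helper are invented for illustration; a real system would also have to protect or destroy the mapping table.

```python
import secrets
from typing import Dict, List, Tuple

Record = Dict[str, str]


def pseudonymise(records: List[Record], id_field: str) -> Tuple[List[Record], Dict[str, str]]:
    """Replace id_field with a random token; the same person always gets the same token."""
    mapping: Dict[str, str] = {}
    result: List[Record] = []
    for record in records:
        original = record[id_field]
        if original not in mapping:
            mapping[original] = secrets.token_hex(8)  # seemingly random value
        result.append({**record, id_field: mapping[original]})
    return result, mapping


# Hypothetical medical database keyed by a social security number.
VISITS = [
    {"ssn": "111-11-1111", "visit": "2018-05-25 09:30", "diagnosis": "asthma"},
    {"ssn": "222-22-2222", "visit": "2018-05-25 10:00", "diagnosis": "flu"},
    {"ssn": "111-11-1111", "visit": "2018-06-02 14:15", "diagnosis": "allergy"},
]
pseudonymised, mapping = pseudonymise(VISITS, "ssn")

# The attacker has no mapping table, but knows when the target saw the doctor.
hits = [r for r in pseudonymised if r["visit"] == "2018-05-25 09:30"]
if len(hits) == 1:
    token = hits[0]["ssn"]
    # The pseudonym links all of the target's records together.
    linked = [r["diagnosis"] for r in pseudonymised if r["ssn"] == token]
    print(linked)
```

Because the pseudonym is still unique per person, one known visit time is enough here to recover the target's complete medical history.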

In other words, pseudonymised data can still allow a natural person to be re-identified and must therefore be treated as personal data.

GDPR Recital 26 makes this clear: “The principles of data protection should apply to any information concerning an identified or identifiable natural person. Personal data which have undergone pseudonymisation, which could be attributed to a natural person by the use of additional information should be considered to be information on an identifiable natural person.”

Pseudonymised data falls under GDPR; anonymized data does not.

GDPR Recital 26 distinguishes explicitly between pseudonymised and anonymized data. It explains that pseudonymised data “should be considered to be information on an identifiable natural person.” It then states: “The principles of data protection should […] not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable.” It makes it clear: “This Regulation does not therefore concern the processing of such anonymous information, including for statistical or research purposes.”

In short: the data protection principles of the GDPR do not apply to anonymous information.

Does anonymization count as data processing?

In “Opinion 05/2014 on Anonymization Techniques”, the European Article 29 Data Protection Working Party describes and compares different approaches to data anonymization.

The opinion elaborates on the robustness of each technique based on three criteria:

1. Is it still possible to single out an individual?
2. Is it still possible to link records relating to an individual?
3. Can information be inferred concerning an individual?
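Two of these criteria can be approximated with simple group-size checks, sketched below in Python. This is a simplified illustration and not the Working Party's own test: the dataset, the choice of quasi-identifiers and the function names are assumptions, and linkability across datasets is not covered by the sketch.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Dict[str, str]
Key = Tuple[str, ...]


def group_by(records: List[Record], quasi_identifiers: List[str]) -> Dict[Key, List[Record]]:
    """Group records that share the same values for all quasi-identifiers."""
    groups: Dict[Key, List[Record]] = defaultdict(list)
    for record in records:
        groups[tuple(record[q] for q in quasi_identifiers)].append(record)
    return dict(groups)


def singled_out(records: List[Record], quasi_identifiers: List[str]) -> List[Key]:
    """Criterion 1: attribute combinations that match exactly one record."""
    return [k for k, g in group_by(records, quasi_identifiers).items() if len(g) == 1]


def inferable(records: List[Record], quasi_identifiers: List[str], sensitive: str) -> List[Key]:
    """Criterion 3: groups whose members all share the same sensitive value."""
    return [
        k
        for k, g in group_by(records, quasi_identifiers).items()
        if len(g) > 1 and len({r[sensitive] for r in g}) == 1
    ]


# Hypothetical dataset with names already removed.
DATA = [
    {"age_band": "30-39", "postcode": "EH1", "diagnosis": "flu"},
    {"age_band": "30-39", "postcode": "EH1", "diagnosis": "flu"},
    {"age_band": "40-49", "postcode": "EH1", "diagnosis": "asthma"},
    {"age_band": "40-49", "postcode": "EH2", "diagnosis": "flu"},
    {"age_band": "40-49", "postcode": "EH2", "diagnosis": "asthma"},
]
QI = ["age_band", "postcode"]
print(singled_out(DATA, QI))             # combinations unique to one record
print(inferable(DATA, QI, "diagnosis"))  # groups that reveal the diagnosis
```

In this toy data one person is unique on age band and postcode, and a second group of two shares a single diagnosis, so knowing that someone belongs to that group already reveals the diagnosis.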

The Working Party states: “Accordingly, the Working Party considers that anonymization as an instance of further processing of personal data can be considered to be compatible with the original purposes of the processing but only on condition the anonymization process is such as to reliably produce anonymized information in the sense described in this paper.”

So, according to WP29, the anonymization of data does not itself require the user’s consent if the data collection itself was justifiable.

Who certifies anonymization tools and solutions?

Unfortunately, there are as yet no official certifications for anonymization tools and methods. The great difficulty in evaluating anonymization is that no known method can guarantee 100% data protection. When data is made anonymous, its information content is inevitably reduced and distorted. For the data to remain meaningful in an analysis, it can only be changed to a limited extent. You can read more about this in our article “Data Compliance in the GDPR – How anonymization allows you to stay compliant in your data analysis”.
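The trade-off between protection and information content can be made tangible with a small Python sketch. It uses generalisation of ages into bands purely as an example technique; the ages, the band widths and the helper names are assumptions and do not describe how any particular product works.

```python
from collections import Counter
from statistics import mean
from typing import List, Tuple


def generalise(age: int, width: int) -> float:
    """Replace an exact age with the midpoint of its band of the given width."""
    lower = (age // width) * width
    return lower + width / 2


def evaluate(ages: List[int], width: int) -> Tuple[int, float]:
    """Return (smallest group size, mean absolute distortion) for one band width."""
    coarse = [generalise(a, width) for a in ages]
    smallest_group = min(Counter(coarse).values())
    distortion = mean([abs(c - a) for a, c in zip(ages, coarse)])
    return smallest_group, distortion


# Hypothetical ages; wider bands hide individuals better but distort more.
AGES = [23, 27, 31, 38, 44, 45, 52, 59]
for width in (5, 10, 20):
    group, distortion = evaluate(AGES, width)
    print(f"band width {width}: smallest group {group}, mean distortion {distortion}")
```

As the bands get wider, the smallest group grows (individuals are harder to single out) while the average distance between the stored and the true value grows as well.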

To demonstrate the robustness of Aircloak Insights, we launched the Aircloak Attack Challenge – the world’s first bug bounty program specifically for an anonymization solution.

In June 2018 we finished the first round of the challenge with great results. You can read about the process and the outcomes in our article “Break my Life’s Work and I’ll Pay You Handsomely”.