Because expert validation of anonymization approaches is notoriously difficult, Aircloak started a bug bounty program for its product: the Aircloak Attack Challenge. Along the way, a new method for measuring privacy was developed, and it could serve a range of purposes in the future.
Nobody has a clue about Anonymization
It’s summer 2017 in Helsinki, and some of the brightest minds in the privacy sphere have come together at the MyData conference to discuss the ownership and treatment of personal data. Paul Francis, scientific director at the Max Planck Institute for Software Systems (and, full disclosure, also a co-founder of Aircloak), asks the audience a question that has been on our minds for a long time: “Do you think a Data Protection Authority’s statement that ‘a system is anonymizing’ can be trusted?” The overwhelming answer from the audience: nope. Shaking heads all around.
Well, damn. That’s bad news, especially for a relatively new company such as Aircloak. We are convinced we’re providing one of the best and safest systems for analysing personal data – but how do we convince our customers?
If I had to name one legitimate question that comes up in every sales process, it’s “Has someone certified Aircloak as safe?”, to which I usually reply with a counter-question: “What kind of certificate would you like to see?” Guess what: nobody knows. There just isn’t a commonly accepted seal of approval. This is a huge issue not only for us but for the industry as a whole, especially given the draconian fines under the GDPR.
Privacy-enhancing technologies (PETs) have become increasingly complex, so much so that it often takes rare, genuine experts in anonymization and privacy to reliably gauge their effectiveness. At the same time, the market has exploded, especially in the wake of the GDPR, with hundreds of technology and platform providers popping up. A few years ago, you needed little more than a firm grasp of privacy regulations to judge a solution’s usefulness. Now, you won’t get anywhere without solving hard computer science and math problems.
There are political plans for a European certification program, but the path there is a difficult one. In the meantime, the overworked local authorities can at least issue statements on specific cases (and, to be fair, even if you don’t think highly of their technical ability, you are still much better off having such a statement than not). There are many commercial certificate providers as well, ranging from comprehensive to outright scams. As one privacy consultant told us: “Anyone can make a certificate. I’ll come up with one right now if you want!”
Roses are Red, Violets are Blue; There’s Always Someone who’s Smarter than You
As it turns out, the solution has been out there all along; it just hasn’t been translated into the privacy domain yet. Anyone interested in software development or IT security has at least heard of bug bounty programs. HackerOne, one of the best-known commercial platforms for running such programs, recently released a survey (some info here if you don’t want to leave your contact information) of 1,700 “white-hat hackers”, who earn money by finding security flaws or bugs (“vulnerabilities”) in software and reporting them to the software provider for cash. This isn’t just cool; it can also earn them 2.7 times the median salary of a typical software engineer (and, for example, up to 16 times the median salary for a developer in India).
These bug bounty programs are great: you get quick feedback on your existing weaknesses, there is a constant audit, and, most importantly, you have international experts with diverse backgrounds and strengths looking at your design.
There is only one downside: if you let the world attack your product, you had better be sure it’s a good one. Maybe that is why we haven’t yet seen a bug bounty program for anonymization approaches. As mentioned above, true anonymization is hard, which is why most methods used by other privacy solutions, such as data masking and pseudonymization, are only useful in combination with organisational measures.
So Aircloak and the Max Planck Institute for Software Systems started the world’s first anonymization “Attack Challenge”, with the first round planned to run for about half a year, until June 2018. We put our money where our mouth is and promised a payout of up to $5000 per successful attack on our anonymization layer.
Of course, the purpose of the Aircloak challenge is not to somehow “prove” that Aircloak’s approach is secure. The lack of a demonstrated attack obviously does not mean that no attack exists. Rather, the challenge is a good-faith, best-effort attempt at finding and fixing unknown weaknesses in the technology.
And it has been a great ride.
We had smart and knowledgeable attackers from all around the world, and the Aircloak instance we set up for the challenge was bombarded with hundreds of thousands of attack queries per day. By the end of May 2018, 33 million such queries had been launched. In the end, two attacks were successful, both exploiting related holes in Aircloak’s system. One came from a group of researchers from MIT and Georgetown University, the other from University College London (UCL) and École polytechnique fédérale de Lausanne (EPFL). You can find more detailed statistics on the challenge on our website.
Hall of Fame
Aloni Cohen – MIT
Kobbi Nissim – Georgetown University
Apostolos Pyrgelis – UCL
Carmela Troncoso – EPFL
Emiliano De Cristofaro – UCL
| $5000 | 0 | 100% | Aloni Cohen / Kobbi Nissim | MIT / Georgetown Univ. |
| $5000 | 0 | 95% | Apostolos Pyrgelis / Carmela Troncoso / Emiliano De Cristofaro | UCL / EPFL |
Aircloak Attack Challenge Hall of Fame
The amount of the bounty reward is defined by the (α, κ) score (effectiveness α and confidence improvement κ).
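The caption does not spell out how the confidence improvement κ is computed. A common way to express it, and the one we simply assume here, compares the attacker's precision C (the fraction of their claims that are correct) with the precision S of a purely statistical guess, such as always guessing the most frequent value: κ = (C − S) / (1 − S). The Python sketch below illustrates that assumed formula. It is not the challenge's official scoring code, and the function name is hypothetical.

```python
def confidence_improvement(correct_claims: int, total_claims: int,
                           statistical_precision: float) -> float:
    """Hypothetical helper: how much better an attack is than a statistical guess.

    Assumes kappa = (C - S) / (1 - S), where C is the attack's precision and
    S is the precision achievable from the data's distribution alone.
    """
    if total_claims <= 0:
        raise ValueError("total_claims must be positive")
    if not 0.0 <= statistical_precision < 1.0:
        raise ValueError("statistical_precision must be in [0, 1)")
    attack_precision = correct_claims / total_claims
    return (attack_precision - statistical_precision) / (1.0 - statistical_precision)


if __name__ == "__main__":
    # Illustrative numbers only: 95 of 100 claims are correct, while blindly
    # guessing the most common value would be right 50% of the time.
    print(round(confidence_improvement(95, 100, 0.5), 2))  # 0.9
```

Dividing by (1 − S) means an attack only scores well if it clearly beats what anyone could infer from the overall distribution of the data. κ = 1 means every claim was correct, and κ = 0 means the attack was no better than guessing.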
Both successful teams were paid out, and the holes were quick to fix. It cost us $10k, which isn’t peanuts for a startup, but we quickly realized that this money was well invested. We’ve fixed weaknesses in the system (better on our test servers than on customer premises), and we can be more confident that we’re truly protecting data subjects’ privacy. More than that: our customers can be more confident, too.
Our Aircloak Attack Challenge Mascot
Another big advantage of bounty programs is that they encourage ethical hacking. Attackers know that the company is committed to patching vulnerabilities, so they are inclined to work with the company rather than unilaterally announce vulnerabilities to the public. Unfortunately, while we were running the challenge, a group led by Yves-Alexandre de Montjoye published an attack on Diffix (Aircloak’s core technology) without submitting it to the challenge; in fact, it never went beyond an attack sketch.
New research: Exploiting @aircloak‘s Diffix anonymization mechanism through a noise-exploitation attack. Blogpost: https://t.co/jBaWXlW53H Paper: https://t.co/yDvDStewkS (#infosec during #EuroSP18) pic.twitter.com/vWw9LorQOv
— Yves-A. de Montjoye (@yvesalexandre) 24. April 2018
This blindsided us a little, and we found it questionable that they chose immediate full disclosure over a controlled, coordinated disclosure. After all, the system was already in production use. We published our rebuttal immediately. One reason for their decision may have been Yves’ position on the advisory board of one of our competitors. On the bright side, the attack was very theoretical and never posed a threat to active installations. And in the end, anyone who shows us ways to improve Aircloak Insights is giving us a gift!
Nonetheless, a major point that Yves and his team raise in their attack paper is that a system like Aircloak should be open to inspection, and ours is. As the team from Imperial College London points out, transparency and openness to contributions from the community matter, and we have always considered both crucial for the further development of Diffix. Apparently, we had not communicated that well enough.
Not all Attacks are Created Equal
Before we could run our bounty program, we had to devise a way to measure the effectiveness of attacks so that we could pay more for better attacks. Our method took into account
– how much the attacker could learn about individuals
– how accurate that knowledge was, and
– how much auxiliary information was needed.
A perfect attack would expose the entire database without any prior knowledge of the data. A very weak attack would expose a single attribute of one individual, and only if the attacker already knows every other data point in the data set.
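The post does not give a formula for combining these three factors, so the Python sketch below is purely illustrative. The multiplicative weighting and the `AttackResult` fields are our own assumptions, not the bounty's actual scoring rule and not the GDA Score. It rewards attacks that learn more (coverage of the data set), learn it more accurately, and need less prior knowledge. It reproduces the two extremes described above: the perfect attack scores 1, and the very weak attack scores close to 0.

```python
from dataclasses import dataclass


@dataclass
class AttackResult:
    """Hypothetical summary of one attack on a data set with n_cells data points."""
    n_cells: int        # total data points (rows x columns) in the data set
    cells_learned: int  # data points the attacker made claims about
    accuracy: float     # fraction of those claims that were correct, 0..1
    cells_known: int    # data points the attacker needed as prior knowledge


def toy_attack_score(r: AttackResult) -> float:
    """Toy score in [0, 1]: coverage x accuracy x (1 - prior knowledge).

    The multiplicative form is an illustrative assumption, chosen so that any
    factor at its worst value drives the whole score to zero.
    """
    if r.n_cells <= 0:
        raise ValueError("n_cells must be positive")
    if not 0.0 <= r.accuracy <= 1.0:
        raise ValueError("accuracy must be in [0, 1]")
    if not 0 <= r.cells_known <= r.n_cells:
        raise ValueError("cells_known must be between 0 and n_cells")
    coverage = r.cells_learned / r.n_cells
    prior = r.cells_known / r.n_cells
    return coverage * r.accuracy * (1.0 - prior)


if __name__ == "__main__":
    n = 1000 * 10  # e.g. 1,000 individuals with 10 attributes each
    perfect = AttackResult(n_cells=n, cells_learned=n, accuracy=1.0, cells_known=0)
    weak = AttackResult(n_cells=n, cells_learned=1, accuracy=1.0, cells_known=n - 1)
    print(toy_attack_score(perfect))  # 1.0
    print(toy_attack_score(weak))     # tiny: one cell learned, nearly all known
```

A real scoring scheme would need to weigh these factors much more carefully, for example by distinguishing which attributes the prior knowledge covers. The sketch only shows how the three dimensions pull the score in different directions.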
We quickly realized, however, that our measure could be generalized to any anonymization system, and therefore could be used by certification programs and interested buyers.
At the Max Planck Institute for Software Systems, this soon gave rise to the idea of a “GDA Score”: the General Data Anonymity Score.
The score is based on the three criteria for “real” anonymization set out by the Article 29 Working Party in its 2014 Opinion on Anonymisation Techniques: singling out, linkability and inference. Although it is a few years old, this opinion is still a great guide to what anonymization can and cannot do.
The score also takes into account the type of data being attacked. Used at Aircloak, the score will let us gauge the effectiveness of a potential attack in very concrete cases. Customer A’s database may not be susceptible to a specific attack because their data doesn’t have the right properties, so the score will be low (or zero) for that database. For customer B, however, it may be high, and a quick reaction is vital. This is fantastic for risk mitigation.
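Singling out, the first of the criteria above, is a good example of why the data itself matters. The Python sketch below is a deliberately simple check of our own, not part of the GDA Score. It measures what fraction of records are unique on a given set of attributes, meaning an attacker who knows those attributes could isolate the record. The two tables are made up for illustration. It inspects raw data, not what any anonymization layer returns.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Made-up toy tables: (postcode, birth_year, diagnosis)
CUSTOMER_A: List[Tuple[str, str, str]] = [
    ("10115", "1980", "flu"),
    ("10115", "1980", "cold"),
    ("10117", "1975", "flu"),
    ("10117", "1975", "asthma"),
]
CUSTOMER_B: List[Tuple[str, str, str]] = [
    ("10115", "1980", "flu"),
    ("10117", "1981", "cold"),
    ("10119", "1975", "flu"),
    ("10117", "1975", "asthma"),
]


def singled_out_fraction(rows: Sequence[Sequence[str]], columns: Sequence[int]) -> float:
    """Fraction of rows whose values in `columns` occur exactly once.

    A unique combination lets an attacker who knows those attribute values
    isolate (single out) that person's record, including the other columns.
    """
    if not rows:
        return 0.0
    keys = [tuple(row[i] for i in columns) for row in rows]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(keys)


if __name__ == "__main__":
    known = (0, 1)  # attacker knows postcode and birth year
    print(singled_out_fraction(CUSTOMER_A, known))  # 0.0
    print(singled_out_fraction(CUSTOMER_B, known))  # 1.0
```

The same attacker knowledge yields nothing on the first table, because every combination is shared, but exposes every record in the second. This mirrors the customer A / customer B contrast above in miniature.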
Beyond comparing different customer cases against each other, the score also makes it possible to compare different anonymization approaches.
Earlier this year, the Max Planck Institute presented the score to CNIL, the French data protection authority, which is in the process of designing a certification program for anonymization. CNIL was very enthusiastic about the GDA score and is following its progress closely.
What will the Future Bring?
First, our attack challenge will almost certainly go into a second round. We’re still experimenting with the setup; it may, for example, be launched in collaboration with the Max Planck Society this time. My personal vision is that it will eventually simply become a continuously running program.
Aircloak’s approach and core algorithm will remain open to inspection. We’re convinced this is the best way to make sure the system stays safe – by inviting smart people from around the world to look at it (and break it, if they manage). While this isn’t true “open source” for the moment, we feel it strikes a good balance that allows us to finance further development while being open for public scrutiny.
For Aircloak Insights specifically, I hope it will become an integral part of modern analytics stacks: the “missing layer”, to borrow Tristan Handy’s phrase (and my new favourite term).
And the GDA score? For us, it will remain a vital tool for threat evaluation and risk mitigation. More than that, it could help authorities distinguish between approaches and help customers choose the one that best fits their needs.
After all, we’ll all be better off with the best anonymization available.