Diffix Vulnerability #1



April 2018


April 2018


October 2018



Patched Version


Patched Date

July 2018

This attack was discovered by Aloni Cohen (MIT) and Kobbi Nissim (Georgetown University) as part of the first Aircloak Challenge (Dec 2017 – May 2018). The attack is described in https://arxiv.org/abs/1810.05692. The attack was successfully demonstrated on the Aircloak challenge system.

The attack is a singling-out attack, in which the attacker makes a claim of the form “There is a single user with the following attributes.” In the demonstrated attack, the attributes were the columns ‘client_id’ and ‘status’ from the banking database table ‘loans’. The ‘status’ column has four possible values: ‘A’, ‘B’, ‘C’, and ‘D’. An example claim from the demonstrated attack would be “There is a single user with client_id 372 and status ‘B’.”

The attack requires a column that uniquely identifies users, and the identifiers must be numbers. If the identifying numbers are relatively densely assigned (i.e., a number sequence like 1, 2, 3, …), the attacker requires no prior knowledge. If the identifying numbers are sparsely assigned, the attacker must know the identifier numbers as prior knowledge. In the challenge demonstration, the table UID (column name ‘client_id’) was used, and it was densely assigned (roughly 6% populated). The attacker therefore required no external knowledge.

The demonstrated attack consisted of 3500 queries of the following form:

SELECT count(client_id) FROM loans
WHERE floor(100 * ((client_id * 2)^0.7) + 0.5)
= floor(100 * ((client_id * 2)^0.7))
AND client_id BETWEEN 2000 AND 3000
AND status = 'C'

Each query used a different combination of the constants in the ‘floor’ expressions (2, 0.7, 100, and 0.5). The equality holds only for those client_id values whose scaled value has a fractional part small enough that adding the offset (here 0.5) does not reach the next integer, so each query matches a different combination of the users with client_id in the range 2000 to 3000. In this case there were 73 such users, and the demonstrated attack is designed to determine whether the status of each of those users is ‘C’ or not. The attack may be repeated to determine which users have other status values.
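The way the constants select different user subsets can be illustrated with a minimal Python sketch. It mirrors the ‘floor’ condition of the query outside the database. The function names `matches` and `selected_ids`, the second set of constants, and the assumption that Python's `**` behaves like the SQL `^` operator are illustrative and not taken from the challenge system.

```python
import math

def matches(client_id, mult=2, exp=0.7, scale=100, offset=0.5):
    """Return True if client_id would satisfy the floor() equality of the query.

    The defaults are the constants shown in the example query.
    """
    value = scale * (client_id * mult) ** exp
    # Equal floors means adding the offset did not cross an integer boundary.
    return math.floor(value + offset) == math.floor(value)

def selected_ids(ids, mult, exp, scale, offset):
    """Collect the identifiers in `ids` that the condition lets through."""
    return {cid for cid in ids if matches(cid, mult, exp, scale, offset)}

# Candidate identifiers in the attacked range (not all of them exist in the table).
ids = range(2000, 3001)

# Two hypothetical choices of constants; the second one is made up for illustration.
subset_a = selected_ids(ids, 2, 0.7, 100, 0.5)
subset_b = selected_ids(ids, 3, 0.6, 100, 0.5)

print(len(subset_a), len(subset_b), len(subset_a & subset_b))
```

With an offset of 0.5, each condition lets through roughly those identifiers whose scaled value has a fractional part below 0.5, so changing any constant reshuffles which identifiers are counted.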

The queries and answers are composed as a set of equations which are solved using a constraint solver.
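As a rough illustration of this step, the following Python sketch treats each query as an equation "number of selected users with the status ≈ reported count" and looks for the 0/1 assignment that best fits all answers. It uses brute-force enumeration over a toy instance with three users as a stand-in for a real constraint solver; the function name `solve_by_enumeration`, the subsets, and the noisy answers are all made up for illustration and do not come from the demonstrated attack.

```python
from itertools import product

def solve_by_enumeration(subsets, answers, n_users):
    """Find the 0/1 assignment with the smallest total absolute error.

    subsets[k] holds the user indices matched by query k, and answers[k] is the
    (possibly noisy) count returned for that query. Enumeration is only feasible
    for a handful of users; it stands in for a proper solver here.
    """
    best, best_err = None, float("inf")
    for candidate in product((0, 1), repeat=n_users):
        err = sum(
            abs(sum(candidate[i] for i in subset) - answer)
            for subset, answer in zip(subsets, answers)
        )
        if err < best_err:
            best, best_err = candidate, err
    return best, best_err

# Toy instance: 3 users, 4 queries, answers perturbed by small made-up noise.
subsets = [{0, 1}, {1, 2}, {0, 2}, {0, 1, 2}]
noisy_answers = [1.3, 0.8, 2.2, 1.9]

guess, error = solve_by_enumeration(subsets, noisy_answers, n_users=3)
print(guess)  # (1, 0, 1): users 0 and 2 are inferred to have the status
```

Because every user appears in several overlapping subsets, the noise on individual answers tends to average out, which is why many queries with different subsets are needed.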

All claims were correct.

The attack may be used to discover whether identified users have or do not have any given value in any given column. The attack must be repeated for each specific value to be learned. Attacking more users at a time requires more queries and more subsequent computation. This specific attack does not work with text identifier columns. It is not currently known whether there is an analogous attack for text identifier columns.

The fix deployed by Aircloak has two parts. The first part is to determine which columns may potentially be used in an attack. These are columns where a substantial proportion of values (at least 80%) uniquely identify a user; such columns are called “isolating”. The second part is to disallow math on isolating columns. An attack on a column with 80% identifying values results in 72% accuracy.
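One possible reading of the “isolating” test can be sketched in Python as follows. How Aircloak actually computes the proportion is not described here, so the function names `isolating_fraction` and `is_isolating`, the choice to count distinct values, and the sample rows are assumptions made for illustration.

```python
from collections import defaultdict

def isolating_fraction(rows):
    """Fraction of distinct column values that belong to exactly one user.

    rows is an iterable of (user_id, value) pairs for a single column.
    """
    users_per_value = defaultdict(set)
    for user_id, value in rows:
        users_per_value[value].add(user_id)
    if not users_per_value:
        return 0.0
    unique = sum(1 for users in users_per_value.values() if len(users) == 1)
    return unique / len(users_per_value)

def is_isolating(rows, threshold=0.8):
    """Flag the column when at least `threshold` of its values identify one user."""
    return isolating_fraction(rows) >= threshold

# Hypothetical sample data: an identifier-like column and a status-like column.
client_id_rows = [(1, 2001), (2, 2017), (3, 2042), (4, 2050)]
status_rows = [(1, "A"), (2, "A"), (3, "C"), (4, "C")]

print(is_isolating(client_id_rows))  # True: every value maps to a single user
print(is_isolating(status_rows))     # False: every value is shared by two users
```

Under this reading, a column flagged by `is_isolating` would be one on which math is no longer permitted in queries.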