Diffix Vulnerability #5



May 2020


May 2020


October 2020



Patched Version

20.2 (Diffix Dogwood)

Patched Date

August 2020

This attack was discovered by Matthew Joseph, Zachary Schutzman, and Travis Dick of the University of Pennsylvania. For more details, refer to the attack authors’ description, which focuses on the theoretical aspects of the attack, and to the Aircloak description, which focuses on its practical aspects. The attack was demonstrated to Aircloak in May 2020, patched in version 20.2 (Diffix Dogwood), which was released in August 2020, and publicly announced in October 2020.

The attack is a singling-out attack, in which the attacker makes a claim of the form “There is a single user with the following attributes.”

The attacker must have complete prior knowledge of part of one column and, for all practical purposes, all of a second column. The second column must be numeric, and roughly 50% to 80% of its values must be unique to a single individual.
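As a rough illustration of the uniqueness prerequisite, the sketch below measures what share of a column's distinct values belongs to exactly one user. It assumes the column is available as (user_id, value) pairs and reads “values unique to a single individual” as a share of distinct values; the function name unique_value_fraction and the sample rows are hypothetical.

```python
from collections import defaultdict


def unique_value_fraction(rows):
    """Share of a column's distinct values that belong to exactly one user.

    rows: iterable of (user_id, value) pairs, e.g. one pair per ride (assumed format).
    """
    users_per_value = defaultdict(set)
    for user_id, value in rows:
        users_per_value[value].add(user_id)
    if not users_per_value:
        return 0.0
    unique = sum(1 for users in users_per_value.values() if len(users) == 1)
    return unique / len(users_per_value)


# Hypothetical sample: 1.97 is shared by two users; 0.87 and 2.75 each
# belong to a single user (u4 appears twice but is still one individual).
rides = [("u1", 0.87), ("u2", 1.97), ("u3", 1.97), ("u4", 2.75), ("u4", 2.75)]
share = unique_value_fraction(rides)
print(round(share, 3), 0.5 <= share <= 0.8)  # 0.667 True
```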

The attack is a variation of the linear reconstruction attack. The attack used to demonstrate effectiveness consisted of roughly 1000 queries of the following form:

SELECT vendor_id, count(*)
FROM rides
WHERE floor(pickup_latitude ^ 8.789 + 0.5) =
          floor(pickup_latitude ^ 8.789)
      AND trip_distance IN (0.87, 1.97, 2.75)
GROUP BY vendor_id

where the exponent applied to the pickup_latitude column changes from query to query. The floor() comparison holds only when the fractional part of pickup_latitude raised to that exponent is below 0.5, so each query selects an effectively random subset of the users with one of the three specified trip_distance values. The noisy answers are then written as a system of linear equations that can be solved for the vendor_id of every user with one of these trip_distance values. In the demonstrated attack, vendor_id takes one of two possible values, and the attack determines which one each of these users has. More details of the attack can be found in the appendix of this post or in the authors’ description.
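The reconstruction step can be sketched as a small simulation. This is a minimal illustration under simplifying assumptions, not the authors’ implementation: the 20 users, their latitudes, and their hidden vendor_id bits are synthetic; Diffix noise is replaced by independent Gaussian noise with standard deviation 1, which is not how Diffix actually computes noise; and low-count suppression is ignored. The helper selected() mirrors the WHERE clause, each noisy count contributes one linear equation, and the system is solved here by ordinary least squares via the normal equations (one possible solver; the authors may use another) before rounding each estimate to 0 or 1. The names selected, solve, reconstruct, and noisy_answer are hypothetical.

```python
import math
import random


def selected(latitude: float, exponent: float) -> bool:
    """Mirror of the WHERE clause: floor(v ^ e + 0.5) = floor(v ^ e).

    This is true exactly when the fractional part of v ** e is below 0.5.
    """
    x = latitude ** exponent
    return math.floor(x + 0.5) == math.floor(x)


def solve(matrix, rhs):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def reconstruct(latitudes, noisy_answer, exponents):
    """Estimate each user's hidden 0/1 value from noisy counts over selected subsets.

    Accumulates the least-squares normal equations (A^T A) b = A^T y, where
    row j of A marks the users selected by query j, then rounds each estimate.
    """
    n = len(latitudes)
    ata = [[0.0] * n for _ in range(n)]
    aty = [0.0] * n
    for e in exponents:
        row = [1.0 if selected(lat, e) else 0.0 for lat in latitudes]
        y = noisy_answer(row)
        for i in range(n):
            if row[i]:
                aty[i] += y
                for k in range(n):
                    ata[i][k] += row[k]
    estimate = solve(ata, aty)
    return [1 if v >= 0.5 else 0 for v in estimate]


# Synthetic data (assumption): 20 users with distinct latitudes and a hidden
# bit standing for which of the two vendor_id values each user has.
rng = random.Random(2020)
latitudes = [40.70 + rng.random() * 0.1 for _ in range(20)]
secret = [rng.randint(0, 1) for _ in range(20)]


def noisy_answer(row):
    """Count of selected users with the second vendor_id, plus Gaussian noise (assumption)."""
    true_count = sum(s for s, r in zip(secret, row) if r)
    return true_count + rng.gauss(0.0, 1.0)


# Roughly 1000 queries, each using a different exponent.
exponents = [8.0 + rng.random() for _ in range(1000)]
recovered = reconstruct(latitudes, noisy_answer, exponents)
print("all hidden values recovered:", recovered == secret)
```

Because there are far more queries than users, the least-squares error per user in this simulation is small compared with the 0.5 rounding threshold. The sketch does not model Diffix's actual noise, its suppression of small counts, or the patched behavior.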

The attack configuration that obtained the best effectiveness score was 100% effective against the earlier, unpatched version. The attack does not work against the patched version released in August 2020.

Of all possible combinations of columns needed to run the attack:

  • 3% of the combinations produced a 95% or better confidence improvement
  • 4% of the combinations produced a 50% confidence improvement
  • 2% of the combinations produced a 15% confidence improvement
  • 15% of the combinations produced a 6% confidence improvement
  • The remaining 76% of combinations were not effective or could not be run

The fix implemented by Aircloak disallows any arithmetic computation on columns to which a bucketing function (floor(), ceil(), round(), mask to int, bucket(), etc.) is applied.
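To make the rule concrete, here is a toy check over a hypothetical expression tree written as nested Python tuples. It is a simplified reading of the fix, not Aircloak's actual query validator: it rejects any column expression that combines a bucketing function with an arithmetic operator, and it does not model casts such as mask to int. The names is_allowed and operator_names are illustrative only.

```python
# Hypothetical expression format: ("col", name) is a column reference,
# plain numbers are constants, and (op, arg1, arg2, ...) applies a
# function or operator to its arguments.
BUCKETING_FUNCTIONS = {"floor", "ceil", "round", "bucket"}
ARITHMETIC_OPERATORS = {"+", "-", "*", "/", "^"}


def operator_names(expr):
    """Yield every function or operator name used in the expression tree."""
    if isinstance(expr, tuple) and expr[0] != "col":
        yield expr[0]
        for arg in expr[1:]:
            yield from operator_names(arg)


def is_allowed(expr) -> bool:
    """Reject expressions that mix a bucketing function with arithmetic."""
    names = set(operator_names(expr))
    uses_bucketing = bool(names & BUCKETING_FUNCTIONS)
    uses_arithmetic = bool(names & ARITHMETIC_OPERATORS)
    return not (uses_bucketing and uses_arithmetic)


# Left-hand side of the attack's WHERE clause: floor(pickup_latitude ^ 8.789 + 0.5)
attack_expr = ("floor", ("+", ("^", ("col", "pickup_latitude"), 8.789), 0.5))
plain_bucket = ("floor", ("col", "trip_distance"))

print(is_allowed(attack_expr))   # False
print(is_allowed(plain_bucket))  # True
```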