There is a wide range of data anonymization tools available.
The selection should be carefully made according to the purpose for which you want to anonymize your data.
The most important questions you have to ask yourself are:
- What programming environment are you using? Does it have to work in a specific one like R?
- What anonymization methods do you need? Are there any external requirements?
- How accessible and easy to use should the solution be?
There is a big difference between a student who surveys participants for a thesis and anonymizes the results once and a data scientist at a bank who regularly analyses customer transaction data.
Getting anonymization right is very complex. In most professional use cases, free-to-use software isn’t efficient because the anonymization has to be heavily tailored to the use case. That is why we built Aircloak Insights: to replace slow and error-prone manual processes with our automated solution.
Open Source Anonymization Software
For a one-time anonymization, for example of survey data, static anonymization is often sufficient. In the list below you can find some open source anonymization tools. We paid special attention to whether each tool is still actively supported and updated.
ARX Data Anonymization Tool
The ARX Data Anonymization Tool is a relatively popular open source and cross-platform tool.
It supports different privacy models such as k-anonymity (and related models like l-diversity, t-closeness and β-likeness) as well as differential privacy, and can be used for up to 50 dimensions (i.e. attributes) and millions of records. It also has a comprehensive graphical user interface.
The latest release is from March 2018.
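To make the privacy models mentioned above a bit more tangible, here is a minimal Python sketch of what k-anonymity and l-diversity measure. It is not ARX code and does not use the ARX API; the function names, the column names and the sample records are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict


def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group of records that share the same
    quasi-identifier values. A dataset is k-anonymous for this k."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0


def l_diversity(records, quasi_identifiers, sensitive):
    """Smallest number of distinct sensitive values found in any group
    (the simple 'distinct' flavour of l-diversity)."""
    groups = defaultdict(set)
    for r in records:
        groups[tuple(r[q] for q in quasi_identifiers)].add(r[sensitive])
    return min(len(values) for values in groups.values()) if groups else 0


# Hypothetical, already generalized survey records.
records = [
    {"age": "30-39", "zip": "101**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "101**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "102**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "102**", "diagnosis": "diabetes"},
    {"age": "40-49", "zip": "102**", "diagnosis": "flu"},
]

k = k_anonymity(records, ["age", "zip"])
l = l_diversity(records, ["age", "zip"], "diagnosis")
print(k, l)  # 2 2
```

A tool like ARX searches for a generalization of the data that reaches a required k (or l) while losing as little information as possible. The sketch only checks the result of such a transformation.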
Amnesia
Amnesia is a data anonymization tool that has its background at the Athena Research Center. It supports k-anonymity and k^m-anonymity. Amnesia has a hierarchy creator and editor that allows the user to tailor the anonymization to find the right balance between privacy and data utility. The installer can be downloaded from the website.
If you just want to give it a quick spin, there is also an online-version without the need to install anything.
The newest version is from December 2018.
μ-ARGUS
μ-ARGUS is a tool designed to create safe micro-data files and is based on the programming language R, which is specifically built to support statistical analyses.
ARGUS stands for ‘Anti Re-identification General Utility System’. The tool uses a wide range of statistical anonymization methods such as global recoding (grouping of categories), local suppression, randomisation, adding noise, microaggregation, and top and bottom coding. It can also be used to generate synthetic data.
The current version, 5.1.3, was last updated in March 2018.
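The methods listed above are easier to picture with a small example. The following Python sketch shows simplified versions of global recoding, top coding, local suppression, noise addition and microaggregation. It is not how μ-ARGUS implements them; all function names, parameters and sample values are hypothetical, and the suppression rule in particular is a deliberately crude stand-in for the risk-based suppression real tools apply.

```python
import random
from collections import Counter


def recode_age(age, width=10):
    """Global recoding: replace an exact age with a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def top_code(value, cap):
    """Top coding: values above the cap are reported as the cap."""
    return min(value, cap)


def suppress_rare(values, min_count):
    """Simplified local suppression: blank out values seen fewer than
    min_count times, since rare values make a record easy to single out."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else None for v in values]


def add_noise(value, scale, rng):
    """Noise addition: perturb a number with zero-mean Gaussian noise."""
    return value + rng.gauss(0, scale)


def microaggregate(values, k):
    """Microaggregation: sort the values, form groups of at least k
    neighbours and replace each value with its group mean."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start, n = 0, len(order)
    while start < n:
        end = start + k
        if n - end < k:  # fold a too-small tail into the current group
            end = n
        group = order[start:end]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = mean
        start = end
    return result


rng = random.Random(42)  # fixed seed only to make the demo repeatable
print([recode_age(a) for a in [34, 37, 61]])       # ['30-39', '30-39', '60-69']
print(top_code(250000, 100000))                    # 100000
print(suppress_rare(["nurse", "nurse", "astronaut"], 2))
print(add_noise(31000, 500, rng))
print(microaggregate([10, 12, 50, 52, 90], 2))
```

Each method trades some precision for a lower re-identification risk, which is why tools of this kind usually pair them with risk and information-loss estimates.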
sdcMicro
sdcMicro is an R package that can be used to generate anonymized (micro)data. In addition, it includes various risk estimation methods. The package also comes with a graphical user interface that gives access to many of its methods.
The latest version of sdcMicro was published in May 2018.
Anonimatron
Anonimatron is a tool that pseudonymizes datasets. It can be used to generate pseudonymized copies of production data, for example to track down a bug or run performance tests outside of the client’s production environment.
With the introduction of the GDPR, a feature was added that enables the anonymization of files: “You can now configure a column to be anonymized without storing the generated synonyms for later runs.”
The latest release, 1.10.0, is from June 2018.
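The difference between keeping and discarding the generated synonyms can be illustrated with a few lines of Python. This is only a sketch of the general idea and not Anonimatron's implementation or configuration format; the function name `pseudonymize`, the `id_` prefix and the sample names are assumptions.

```python
import secrets


def pseudonymize(values, synonyms=None):
    """Replace each value with a random synonym.

    If a `synonyms` dict is passed in, it is reused and extended, so the
    same input gets the same synonym in later runs (pseudonymization).
    If it is omitted, a throwaway mapping is used and discarded, so the
    output cannot be linked back through a stored table.
    """
    mapping = synonyms if synonyms is not None else {}
    result = []
    for value in values:
        if value not in mapping:
            mapping[value] = "id_" + secrets.token_hex(8)
        result.append(mapping[value])
    return result


# Stored synonyms: 'bob' maps to the same token in both runs.
stored = {}
run1 = pseudonymize(["alice", "bob", "alice"], stored)
run2 = pseudonymize(["bob"], stored)

# No stored synonyms: each run starts from scratch.
one_off_1 = pseudonymize(["alice", "bob", "alice"])
one_off_2 = pseudonymize(["bob"])

print(run1[1] == run2[0])            # True
print(one_off_1[1] == one_off_2[0])  # False (random tokens differ)
```

As long as the synonym table exists, anyone who holds it can map the tokens back to the original values, which is why stored synonyms count as pseudonymization rather than anonymization.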
Professional Data Anonymization Software
If you are a data scientist who works with sensitive medical records or transaction data in banking, you might need to work with professional anonymization solutions.
One of the biggest benefits of B2B anonymization tools (such as Aircloak Insights) is that they offer GDPR-compliant and interactive anonymization that enables a very high level of data utility for precise analyses.
This is a major difference from free-to-use anonymization tools: ‘interactive’ means that the analyst can query the data dynamically via an interface and the anonymization process only has to be set up once.
This simplifies the process of data anonymization and greatly reduces the risk of errors.
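To show what "interactive" can look like in principle, here is a toy Python sketch of a query interface that sits between the analyst and the raw data. It is emphatically not the Aircloak Insights algorithm or API. The class name, the group-size threshold and the Gaussian noise are assumptions chosen only to illustrate the pattern of anonymizing answers at query time instead of anonymizing a file once.

```python
import random
from collections import Counter


class AnonymizingInterface:
    """Hypothetical wrapper: analysts never see rows, only noisy
    aggregates, and groups that are too small are left out."""

    def __init__(self, rows, min_group_size=5, noise_sd=1.0, seed=None):
        self._rows = rows
        self.min_group_size = min_group_size
        self.noise_sd = noise_sd
        self._rng = random.Random(seed)

    def count_by(self, column):
        """Return a noisy count per value of `column`."""
        counts = Counter(row[column] for row in self._rows)
        result = {}
        for value, n in counts.items():
            if n < self.min_group_size:
                continue  # suppress groups small enough to expose individuals
            noisy = n + self._rng.gauss(0, self.noise_sd)
            result[value] = max(0, round(noisy))
        return result


# Hypothetical transaction data: 50 customers in Berlin, 2 in Potsdam.
rows = [{"city": "Berlin"}] * 50 + [{"city": "Potsdam"}] * 2
interface = AnonymizingInterface(rows, min_group_size=5, noise_sd=1.0, seed=7)
answer = interface.count_by("city")
print(answer)  # only 'Berlin' appears, with a count close to 50
```

The point of the pattern is that the protection rules are configured once on the interface, and every new question the analyst asks is answered under the same rules.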