There is a wide range of data anonymisation tools available.
Choose carefully, based on the purpose for which you want to anonymise your data.
The most important questions to ask yourself are:
- What programming environment are you using? Does it have to work in a specific one like R?
- What anonymisation methods do you need? Are there any external requirements?
- How accessible and easy to use should the solution be?
There is a big difference between a student who surveys participants for a thesis and anonymises the results once, and a data scientist in a bank who regularly analyses customer transaction data.
Open Source Anonymisation Software
For a one-time anonymisation, for example of survey data, static anonymisation is often sufficient. The list below contains some open source anonymisation tools. We paid special attention to how current each tool is: the latest release or update of every tool listed is no more than one year old.
ARX Data Anonymization Tool
The ARX Data Anonymization Tool is a relatively popular open source and cross-platform tool.
It supports different privacy models such as k-anonymity (and related models like l-diversity, t-closeness and β-likeness) and Differential Privacy, and can be used for up to 50 dimensions (i.e. attributes) and millions of records. It also has a comprehensive graphical user interface.
The latest release is from March 2018.
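To make the privacy models mentioned above more concrete, here is a minimal Python sketch of what k-anonymity and l-diversity measure. It is not ARX code and does not use the ARX API. The function names, the toy records and the choice of quasi-identifiers are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict

def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values(), default=0)

def l_diversity(records, quasi_identifiers, sensitive):
    """Smallest number of distinct sensitive values found in any such group."""
    groups = defaultdict(set)
    for r in records:
        groups[tuple(r[q] for q in quasi_identifiers)].add(r[sensitive])
    return min((len(values) for values in groups.values()), default=0)

# Hypothetical, already generalised toy data (age bands, truncated postcodes).
records = [
    {"age": "30-39", "zip": "101**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "101**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "102**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "102**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "102**", "diagnosis": "diabetes"},
]

k = k_anonymity(records, ["age", "zip"])
l = l_diversity(records, ["age", "zip"], "diagnosis")
print(f"k = {k}, l = {l}")
```

A dataset is k-anonymous if every combination of quasi-identifier values is shared by at least k records. Tools like ARX search for the generalisation that reaches the required k with the least loss of information, which this sketch does not attempt.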
Amnesia is a data anonymisation tool developed at the Athena Research Center. It supports k-anonymity and km-anonymity. Amnesia has a hierarchy creator and editor that allows the user to tailor the anonymisation to find the right balance between privacy and data utility. The installer can be downloaded from the website.
If you just want to give it a quick spin, there is also an online version that requires no installation.
The newest version is from December 2018.
μ-ARGUS is a tool designed to create safe micro-data files and is based on the programming language R, which is specifically built to support statistical analyses.
ARGUS stands for ‘Anti Re-identification General Utility System’. The tool uses a wide range of different statistical anonymisation methods such as global recoding (grouping of categories), local suppression, randomisation, adding noise, microaggregation, top- and bottom coding. It can also be used to generate synthetic data.
The current version, 5.1.3, was last updated in March 2018.
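Three of the methods named above (global recoding, top coding and local suppression) can be illustrated with a few lines of Python. This is a simplified sketch, not μ-ARGUS code: the function names, thresholds and sample values are hypothetical, and the suppression rule used here (blanking values that occur too rarely) is a stand-in for the more careful, record-level suppression that statistical disclosure control tools perform.

```python
from collections import Counter

def recode_age(age, width=10):
    """Global recoding: replace an exact age with a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def top_code(value, cap):
    """Top coding: report every value above the cap as the cap itself."""
    return min(value, cap)

def suppress_rare(values, min_count):
    """Simplified local suppression: blank out values seen fewer than min_count times."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else None for v in values]

# Hypothetical survey columns.
ages = [34, 37, 41, 45, 93]
incomes = [30000, 52000, 48000, 61000, 400000]

banded = [recode_age(a) for a in ages]
capped = [top_code(x, 100000) for x in incomes]
suppressed = suppress_rare(banded, min_count=2)

print(banded)
print(capped)
print(suppressed)
```

The 93-year-old respondent with an unusually high income stands out in the raw data. After recoding, the age band is still unique, so it is suppressed, and the income is capped at the chosen threshold.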
sdcMicro is an R package that can be used to generate anonymised (micro)data. It also includes various risk estimation methods. Note that the package comes with a graphical user interface that gives access to many of its methods.
The latest sdcMicro release is from May 2018.
Anonimatron is a tool that pseudonymizes datasets. It can be used to generate pseudonymized copies of production data, for example to track down a bug or run performance tests outside of the client’s production environment.
With the introduction of the GDPR, a feature was added that enables the anonymisation of files: “You can now configure a column to be anonymised without storing the generated synonyms for later runs.”
The latest release 1.10.0 is from June 2018.
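The difference between storing synonyms and not storing them is the difference between pseudonymisation and anonymisation. The Python sketch below illustrates the idea only. It is not Anonimatron's configuration format or API, and the class name, function name and `user_` prefix are hypothetical.

```python
import secrets

class Pseudonymizer:
    """Keeps a synonym table, so the same input always maps to the same replacement."""

    def __init__(self):
        self.synonyms = {}  # original value -> generated synonym

    def replace(self, value):
        if value not in self.synonyms:
            self.synonyms[value] = "user_" + secrets.token_hex(8)
        return self.synonyms[value]

def anonymise(value):
    """Generates a fresh replacement every time and stores nothing."""
    return "user_" + secrets.token_hex(8)

p = Pseudonymizer()
first = p.replace("alice@example.com")
second = p.replace("alice@example.com")
print(first == second)  # same synonym on every run that shares the table

print(anonymise("alice@example.com") == anonymise("alice@example.com"))
```

With a stored synonym table, records belonging to the same person stay linkable across runs, and anyone holding the table can reverse the mapping. Without it, no link back to the original value is kept.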
Professional Data Anonymisation Software
If you are a data scientist who works with sensitive medical records or with transaction data in banking, you might need a professional anonymisation solution.
One of the biggest benefits of B2B anonymisation tools (such as Aircloak Insights) is that they offer GDPR-compliant, interactive anonymisation that preserves a very high level of data utility for precise analyses.
This is a major characteristic that distinguishes them from free-to-use anonymisation tools: ‘interactive’ means that the analyst can query the data dynamically via an interface, and the anonymisation process only has to be set up once.
This simplifies the process of data anonymisation and greatly reduces the risk of errors.
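As a rough illustration of what querying through an anonymising interface can look like, the Python sketch below answers count queries while hiding small groups and adding random noise. It is a generic pattern and is not meant to describe how Aircloak Insights or any other product works internally. The function name, parameters, thresholds and sample data are all assumptions.

```python
import random

def interactive_count(rows, predicate, min_group_size=5, noise_sd=1.0, rng=None):
    """Answer a count query; refuse small groups and add Gaussian noise to the rest."""
    rng = rng or random.Random()
    true_count = sum(1 for row in rows if predicate(row))
    if true_count < min_group_size:
        return None  # too few matching records to report safely
    return max(0, round(true_count + rng.gauss(0, noise_sd)))

# Hypothetical transaction table that the analyst never sees directly.
transactions = [
    {"customer": i, "amount": 20 * i, "city": "Berlin" if i % 2 else "Munich"}
    for i in range(1, 41)
]

rng = random.Random(42)
berlin = interactive_count(transactions, lambda r: r["city"] == "Berlin", rng=rng)
big = interactive_count(transactions, lambda r: r["amount"] > 780, rng=rng)

print(berlin)  # a noisy value close to the true count of 20
print(big)     # None, because only one transaction matches
```

The raw data stays in place and is anonymised per query, so the analyst does not need to prepare and re-export a static anonymised copy for every new question.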