Analytics and Privacy

Here you can find a comprehensive guide to building a modern privacy-preserving analytics stack. It is aimed mainly at data analysts and senior managers who want to better understand how to comply with the requirements of GDPR while still making full use of data analysis.

According to some commentators, data scientists spend up to 80% of their time on housekeeping tasks such as locating, organising, cleaning and de-duplicating data, and only 20% on actual data analysis. While this 80/20 rule is anecdotal rather than rigorously measured, such ancillary tasks can certainly end up taking a disproportionate share of a data analyst’s time. Adding stringent data privacy requirements only makes the problem worse.

Analytics is a widely used term in the modern world and, like many such terms, it means different things to different people. In this paper we define it in fairly broad terms as the process of taking a large, often diverse, dataset and extracting valuable insights that can be used as part of your business intelligence and planning. (Nowadays people often refer to “big data” when they are talking about analytics. While the two are closely linked, the overwhelming majority of datasets don’t really justify the big data sobriquet.) The important aspect here is that the analysis should generate valuable and useful results.

This paper focuses on how to create the best privacy-preserving data analytics stack.

That means a stack that increases productivity and reduces the time lost to data housekeeping while still guaranteeing data privacy. Our aim is to give data analysts a firm understanding of some important concepts in data privacy and security, and to explain how these interact with the analytics stack itself.

Does Data Protection Always Have to Be Tedious?

Data privacy has always been important, especially for companies dealing with individuals and the general public. You only have to look at significant data breaches, such as when hackers stole the details of almost 150 million Equifax customers, to see the enormous damage a company can suffer. That breach is estimated to have cost the company $60-75 million in lost profits, as well as impacting future sales. Since the General Data Protection Regulation (GDPR) became enforceable in May 2018, data privacy has assumed increased significance for any company that handles the personal data of people in the EU. As a result, data privacy and security often end up taking a significant share of an analyst’s time.

Over the course of this paper, we will describe the constituent parts of an analytics stack, set out the requirements for data privacy and security, explore how anonymization can help meet them, and then look at how to choose the right tools for your analytics project.

Overall, the aim is to show you that, with the correct analytics setup, you can claw back some of your lost productivity and spend your time on actual data analysis rather than on data housekeeping.