Tue Apr 17 2018

Want to experiment fast with data, yet be GDPR compliant?

Then this blog is meant for you! Xomnia wants to share its insights on how your organization can still reap the benefits of becoming a data-driven company, yet in a GDPR-compliant manner.

The EU General Data Protection Regulation (GDPR) in short:

GDPR: WHY?

GDPR was implemented because of public concern over privacy. That concern is more topical than ever, as clearly shown by the illegal sharing of the data of around 87 million Facebook users with the dubious data consulting firm Cambridge Analytica.

GDPR: WHAT?

GDPR aims primarily to give control over personal data back to citizens and residents. It replaces the EU’s Data Protection Directive, which was adopted in 1995, well before social media and e-commerce became as dominant as they are today.

GDPR: WHEN?

The GDPR will be officially enforced beginning May 25, 2018. Ideally, systems and processes to support GDPR changes should be in place well before this deadline.

The main consumer question: Is my data safe with you?

GDPR aims primarily to give control back to EU residents over their personal data. So basically your clients or citizens will have one main question: Is my data safe with you? GDPR was created with the philosophy that customers should be able to trust organizations to keep their data. This means being able to prove that you are in control, that you safeguard their data, and that customers can trust you. If customers trust you, they’ll be happy to give you their data, and with that data, you can improve your service.

Will GDPR make Data Science impossible?

IN SHORT, NO. But it will be more important than ever to take the right measures to keep the use of Data Science possible. GDPR is basically a regulation that helps to clarify what these right measures are. As a Data Science company, Xomnia has no other option than to understand what measures have to be taken to be GDPR compliant. This blog will focus on some operational and technical measures that you can take as an organization that wants to be data-driven yet GDPR compliant. Organizing all the data processing and predictive modeling activities that take place within your organization could pose a challenge. Especially when these activities are decentralized across different departments or business units, it could look like this:

In the situation above, the data is everywhere with everyone. It’s not unusual that your new Development and Production environment on Analytics looks like the IT spaghetti of the past. However, this messy environment is not sustainable in GDPR terms. At Xomnia, we advise many customers to move away from “stitched together” environments to controlled environments for experimenting with data, the so-called Data Labs:

These Data Labs are the central Data Science workspaces that drive fast experimentation in a DevOps manner. All users have access with the proper access controls and all their actions are traced from source to the predictive model. Both data and predictive models are version controlled.

Take the right measures

The Data Lab of your choice should at least have the following functionalities to support your organization’s GDPR requirements:

Have a data inventory

When it comes to the GDPR, organizations will need to take stock of where all the data is stored and ensure that it is accessible, but only to those who have the proper mandate. Managers of Analytical teams should be able to easily understand and audit data sources, and to answer the questions: who has access to what? And which sources are being used for which projects? The same goes for Data Protection Officers, if the GDPR requires your organization to have one. Xomnia advises selecting a Data Lab with a Data Catalog that contains all data sets published by Data Science projects. At its best, the Data Catalog keeps a version of the Data Preparation code that created each published data set. In that way, lineage and auditability of data pipelines are supported. Dataiku supports this functionality.
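To make the two audit questions above concrete, here is a minimal Python sketch of what a Data Catalog records per published data set. It is an illustration only and not how Dataiku or any other Data Lab implements its catalog: the class names, fields, user names and data set names are all hypothetical, and a hash of the preparation code stands in for a real version control system.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class CatalogEntry:
    """One published data set plus the metadata an auditor would ask for (hypothetical schema)."""
    name: str
    project: str
    sources: List[str]
    prep_code: str
    allowed_users: Set[str] = field(default_factory=set)

    @property
    def prep_code_version(self) -> str:
        # A short content hash stands in for a commit id of the preparation code.
        return hashlib.sha256(self.prep_code.encode("utf-8")).hexdigest()[:12]


class DataCatalog:
    """In-memory catalog; a real Data Lab would persist and access-control this."""

    def __init__(self) -> None:
        self._entries: Dict[str, CatalogEntry] = {}

    def publish(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def who_has_access(self, dataset_name: str) -> Set[str]:
        # Answers "who has access to what?" for one data set.
        return set(self._entries[dataset_name].allowed_users)

    def sources_by_project(self) -> Dict[str, Set[str]]:
        # Answers "which sources are being used for which projects?"
        result: Dict[str, Set[str]] = {}
        for entry in self._entries.values():
            result.setdefault(entry.project, set()).update(entry.sources)
        return result


# Example usage with made-up names.
catalog = DataCatalog()
catalog.publish(
    CatalogEntry(
        name="churn_features",
        project="churn-model",
        sources=["crm.customers", "billing.invoices"],
        prep_code="SELECT customer_id, tenure FROM crm.customers",
        allowed_users={"alice", "bob"},
    )
)
print(catalog.who_has_access("churn_features"))
print(catalog.sources_by_project())
```

Storing the preparation code version next to the data set is what gives lineage: an auditor can go from a published data set back to the exact code and sources that produced it.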

Have reproducible results

In organizations that use automated decision-making, GDPR creates a “right to an explanation” for consumers. GDPR holds firms accountable for bias and discrimination in automated decisions. Moreover, firms may not use special categories of personal data in automated decisions, except under defined circumstances. Xomnia advises ensuring that proper data governance, security, and monitoring are in place in case your Data Lab is audited. Data Labs like Domino Data Lab have an extremely tight reproducibility engine and are recognized in regulated industries like finance and pharmaceuticals.
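One simple way to think about reproducibility is that every model run should be traceable to the exact data, code and parameters that produced it. The Python sketch below records a fingerprint of those three inputs. It is a generic illustration under our own assumptions, not a description of how Domino Data Lab works; the function name `run_fingerprint` and the sample inputs are hypothetical.

```python
import hashlib
import json


def run_fingerprint(data_bytes: bytes, code_text: str, params: dict) -> str:
    """Return a SHA-256 fingerprint of the data, code and parameters of one run."""
    h = hashlib.sha256()
    # Hash each input separately first so the boundaries between them are unambiguous.
    h.update(hashlib.sha256(data_bytes).digest())
    h.update(hashlib.sha256(code_text.encode("utf-8")).digest())
    # sort_keys makes the fingerprint independent of parameter ordering.
    h.update(json.dumps(params, sort_keys=True).encode("utf-8"))
    return h.hexdigest()


# Example usage with made-up inputs.
data = b"customer_id,churned\n1,0\n2,1\n"
code = "model = LogisticRegression(C=1.0)"

fp_a = run_fingerprint(data, code, {"C": 1.0, "seed": 42})
fp_b = run_fingerprint(data, code, {"seed": 42, "C": 1.0})
fp_c = run_fingerprint(data, code, {"C": 0.5, "seed": 42})

print(fp_a == fp_b)  # same inputs, different key order
print(fp_a == fp_c)  # a changed parameter
```

If the fingerprint is stored alongside every automated decision, an auditor can check later whether a rerun used exactly the same inputs as the run that produced the decision.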

Prevent data leakages

One of the requirements of the GDPR is that, by using appropriate technical and organizational measures, personal data must be processed in a manner that ensures the appropriate security of the personal data, including protection against unauthorized or unlawful processing and against accidental loss, destruction or damage. Xomnia advises selecting a Data Lab that supports Open Source in the most native way. Your new Data Science talent will not be attracted to closed source software only, and will find ways to help themselves to cool new Open Source software that is out of your control, possibly without you even being aware of it.

Stay tuned for the next blog about GDPR. Check out an in-depth beginner's guide to GDPR written by Jack from VPN geeks.