The GDPR and Unstructured Data: Is Anonymisation Possible?

Publication details

Journal: International Data Privacy Law (IDPL), vol. 12, pp. 184–206, 2022
ISSN:
- Print: 2044-3994
- Electronic: 2044-4001
Links:
- DOI: doi.org/10.1093/idpl/ipac008
- Archive: hdl.handle.net/11250/3013219
- Archive: hdl.handle.net/11250/2992297
- Archive: hdl.handle.net/10852/93692
- Archive: hdl.handle.net/11250/2987775

Much of the legal and technical literature on data anonymization has focused on structured data such as tables. However, unstructured data such as text documents or images are far more common, and the legal requirements that must be fulfilled to properly anonymize such data remain unclear and underaddressed in the literature.

In the absence of a definition of the term ‘anonymous data’ in the General Data Protection Regulation (GDPR), we examine its antithesis—personal data—and the identifiability test in Recital 26 GDPR to understand what conditions must be in place for the anonymization of unstructured data.

This article examines the two contrasting approaches for determining identifiability that are prevalent today: (i) the risk-based approach and (ii) the strict approach in the Article 29 Working Party’s Opinion on Anonymization Techniques (WP 216).

Through two case studies, we illustrate the challenges encountered when trying to anonymize unstructured datasets. We show that, while the risk-based approach offers a more nuanced test consistent with the purposes of the GDPR, the strict approach of WP 216 makes anonymization of unstructured data virtually impossible as long as the original data continues to exist.

The concluding section considers the policy implications of the strict approach and technological developments that assist identification, and proposes a way forward.