CLEANUP - Machine learning text anonymization

The goal of CLEANUP is to develop new machine learning methods to automatically anonymise text documents with personal data, such as electronic health records, court decisions or chat-based interactions with customers.

The main idea of the project is to combine approaches from natural language processing and privacy to design a new generation of anonymisation techniques.

The purpose is to modify text documents in a way that prevents the disclosure of personal information, while preserving the internal context and semantic content of the documents.

One of the methods we are testing is text sanitization, the task of redacting a document to mask all occurrences of (direct or indirect) personal identifiers, with the goal of concealing the identity of the individual(s) referred to in it.
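
As a rough illustration of what masking direct identifiers can look like, here is a minimal Python sketch. It is not the project's method: the patterns, the `sanitize` function and the placeholder labels are hypothetical and chosen only for this example. It uses hand-written rules, whereas the project aims at machine learning methods, and it does nothing about indirect identifiers (such as a rare job title combined with a place), which are much harder to detect.

```python
import re

# Hypothetical patterns for illustration only. A real system would rely on
# trained models rather than hand-written rules, and would also need to
# handle indirect identifiers.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{6,}\d"),
}


def sanitize(text, known_names=()):
    """Mask every occurrence of the given names and of each pattern match."""
    # Replace longer names first so a full name is masked before any shorter
    # name it contains.
    for name in sorted(known_names, key=len, reverse=True):
        text = re.sub(r"\b" + re.escape(name) + r"\b", "[NAME]", text)
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    example = "Contact Kari Nordmann at kari@example.no or +47 912 34 567."
    print(sanitize(example, known_names=["Kari Nordmann"]))
```

The sketch replaces each detected span with a category label instead of deleting it, which keeps the sentence readable after redaction. That is one possible design choice among several.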

Partners

The project brings together researchers from machine learning, natural language processing, data protection, statistical modelling, health informatics and IT law.

In addition, partners from the Norwegian public and private sectors (covering insurance, welfare, health services and legal publishing) contribute computing and domain expertise to the project.

To learn more about this project, please contact:

Pierre Lison

Chief Research Scientist

Project: CleanUp-project

Partners: The Faculty of Law and the Department of Informatics at the University of Oslo, the Norwegian University of Science and Technology, University of Rovira i Virgili, DNB, Norwegian Labour and Welfare Administration, Gjensidige, Lovdata, Norsk Helsearkiv

Funding: Research Council of Norway

Period: 2020 – 2024

Project website for partners
