Analysing the Efficacy of Evaluation Metrics for Data Privacy Preservation with Textual Data

Publication details

  • Event: (Kristiansand)
  • Year: 2024
  • Organizer: NORA

Data privacy is an important facet of modern life. It is especially important for data that carries potentially sensitive information, such as medical or legal documents. However, it is particularly difficult to ensure that private information has been removed or masked in unstructured data, e.g. free-flowing text. This is a major concern in the current large language model paradigm in natural language processing, where massive amounts of data are processed by these models with little scrutiny of what the data actually contain. To this end, many practitioners have attempted to automatically detect and remove personally identifiable information (PII) from documents using a variety of methods.
Most approaches to detecting PII require labelled data to train a machine learning model that classifies words or spans of words as PII. Recently, a new method was proposed that captures disclosure risk by characterizing the semantic relationships between entities in a document using word embeddings, without requiring manually tagged data [1].
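To make the intuition concrete, the sketch below illustrates the general idea behind embedding-based risk detection: terms whose vectors lie close to a protected entity's vector are flagged as potential disclosure risks. The toy vectors, the helper names and the 0.7 threshold are assumptions made for this illustration only; they do not reproduce the actual method of [1].

```python
# Minimal sketch of embedding-based disclosure-risk detection: terms whose
# embeddings are semantically close to a protected entity are treated as
# potential disclosure risks. Vectors and threshold are illustrative toys.
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def risky_terms(entity_vec, term_vecs, threshold=0.7):
    """Return terms whose similarity to the protected entity exceeds the threshold."""
    return [t for t, v in term_vecs.items() if cosine(entity_vec, v) >= threshold]

# Toy 3-dimensional embeddings standing in for real word vectors.
embeddings = {
    "oncologist": np.array([0.9, 0.1, 0.0]),
    "hospital":   np.array([0.8, 0.3, 0.1]),
    "weather":    np.array([0.0, 0.2, 0.9]),
}
patient_vec = np.array([0.85, 0.2, 0.05])   # embedding of the protected entity

print(risky_terms(patient_vec, embeddings))  # ['oncologist', 'hospital']
```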
There are two main concerns regarding the performance of these methods: first, to what degree a given method ensures privacy protection, and second, what level of utility is preserved in the document after the PII has been removed. The most straightforward and standard metrics used to evaluate these methods are recall for privacy protection and precision for utility preservation. However, such empirical evaluation can be misleading. We evaluate the method in [1] as a case study and show that these evaluation metrics fail to give a complete picture of the model's behavior and performance.
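As a concrete illustration of this standard evaluation, the sketch below computes token-level recall and precision against gold PII annotations. The token positions and the helper name are hypothetical; the point is that a single aggregate score can, for instance, hide that the few tokens a system missed may be precisely the most identifying ones.

```python
# Hedged illustration of the standard evaluation: given gold-annotated PII
# tokens and the tokens a system actually masked, recall measures privacy
# protection (how much true PII was caught) and precision measures utility
# preservation (how much of what was masked really needed masking).
def privacy_recall_and_precision(gold_pii: set[int], masked: set[int]):
    true_positives = len(gold_pii & masked)
    recall = true_positives / len(gold_pii) if gold_pii else 1.0   # privacy protection
    precision = true_positives / len(masked) if masked else 1.0   # utility preservation
    return recall, precision

gold_pii = {2, 3, 7, 8, 9}        # token positions annotated as PII
masked   = {2, 3, 7, 11, 12, 13}  # token positions the system masked

recall, precision = privacy_recall_and_precision(gold_pii, masked)
print(f"recall={recall:.2f}, precision={precision:.2f}")  # recall=0.60, precision=0.50
```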
We present an analysis showing why the proposed method is not as effective as these metrics suggest, and discuss how to use them more appropriately. We also discuss how evaluation can be improved with additional metrics, building on the work presented in [2].
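As one example of such an alternative, in the spirit of the entity-level evaluation of [2], the sketch below counts an annotated entity as protected only when all of its mentions are masked. The entity groupings are hypothetical, and the distinction between direct and quasi-identifiers made in [2] is omitted here.

```python
# Sketch of an entity-level recall metric: an identifier counts as protected
# only if *every* one of its mentions is masked, rather than scoring tokens
# independently. Entities and token positions are illustrative assumptions.
def entity_level_recall(entities: dict[str, set[int]], masked: set[int]) -> float:
    """Fraction of annotated entities whose mentions are all masked."""
    protected = sum(1 for mentions in entities.values() if mentions <= masked)
    return protected / len(entities) if entities else 1.0

entities = {
    "PERSON_1": {2, 3},   # token positions of each mention of the entity
    "DATE_1":   {7},
    "ORG_1":    {8, 9},
}
masked = {2, 3, 7, 11}

print(entity_level_recall(entities, masked))  # 0.666..., ORG_1 is only partly masked
```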
References
[1] Fadi Hassan, David Sánchez, and Josep Domingo-Ferrer (2023). "Utility-Preserving Privacy Protection of Textual Documents via Word Embeddings." IEEE Transactions on Knowledge and Data Engineering, 35(1): 1058–1071.
[2] Ildikó Pilán, Pierre Lison, Lilja Øvrelid, Anthi Papadopoulou, David Sánchez, and Montserrat Batet (2022). "The Text Anonymization Benchmark (TAB): A Dedicated Corpus and Evaluation Framework for Text Anonymization." Computational Linguistics, 48(4): 1053–1101.