Dept. of Computer Science MSc thesis defence in AI and Language - Justyna Micota
Join us for a 30 ECTS MSc thesis defence in AI and Language by Justyna Micota on her thesis:
Automatic Anonymization of Icelandic Medical Data: Named Entity Recognition for Unseen Text with Fine Tuning on Synthetic Corpora
Room: M119, all welcome.
- Defence committee:
Main Supervisor: Stefán Ólafsson, associate Professor, Department of Computer Science, Reykjavik University
- Committee members:
Hrafn Loftsson, associate Professor, Department of Computer Science, Reykjavik University
Páll Rúnarsson, researcher, Department of Engineering, Reykjavik University
Abstract:
To better understand Electronic Health Records and to improve clinic and patient experience by incorporating generative models, the data needs to be anonymized. Anonymizing a large number of samples manually is known to be a time-consuming task, and therefore automatic solutions are needed. The laws concerning the privacy of individuals within medical records are understandably very strict, which leaves researchers and developers with extremely limited access to data samples. This work investigates automatic detection of Personally Identifiable Information in an unseen dataset by fine-tuning existing models on synthetically generated corpora. The synthetic dataset was used to fine-tune two transformer-based models: an IceBERT fine-tuned on synthetic data only, and an IceBERT fine-tuned sequentially, first on a real Icelandic Named Entity Recognition dataset from a different domain and then on the synthetic dataset. Results suggest that sequential fine-tuning on clean Named Entity Recognition data is preferable to synthetic-only fine-tuning for the labels that are present in the clean dataset; however, neither approach achieved scores anywhere close to the target threshold of 0.95 found in the literature.
Please note that photographs and videos are taken at events hosted at Reykjavik University (RU) and may be used for RU marketing purposes. Further information is available on ru.is or by sending an e-mail to personuvernd@ru.is.