
MSc Thesis Defense - Department of Computer Science - Svanhvít Lilja Ingólfsdóttir

Named Entity Recognition for Icelandic: Annotated Corpus and Neural Models

  • 15.6.2020, 10:00 - 11:00

On Monday, 15 June 2020, Svanhvít Lilja Ingólfsdóttir will defend her 60 ECTS thesis in Language Technology.

Candidate: Svanhvít Lilja Ingólfsdóttir
Supervisor: Dr. Hrafn Loftsson, Associate Professor, Department of Computer Science
Title: Named Entity Recognition for Icelandic: Annotated Corpus and Neural Models
Date and Time: June 15th at 10:00 in Room M104

Abstract: Named entity recognition (NER) is the task of automatically extracting and classifying the names of people, places, companies, etc. from text. NER is an important preprocessing step in various language technology tasks, such as question answering, speech recognition, search engine optimization and data anonymization, but it can prove difficult, especially in highly inflected languages like Icelandic. Named entity recognizers are usually trained on text corpora in which the named entities have been annotated, but no such corpus has been available for Icelandic.

In this thesis, we present the first annotated NER corpus for Icelandic, along with neural models trained on the data. The corpus, containing over 48,000 named entities in one million tokens, was annotated with eight named entity types using a semi-automatic approach, and then manually reviewed. A bidirectional LSTM recurrent neural network was trained on the annotated corpus, using pre-trained word embeddings as external input. We report an F1 score of 83.65% for all eight entity types when trained on the whole corpus.
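
For readers unfamiliar with the metric, the following is a minimal sketch of how an entity-level F1 score can be computed from BIO-tagged sequences. It is an illustration only: the abstract does not state the exact evaluation procedure used in the thesis, and the tag names (Person, Location, Organization) and the helper functions bio_to_spans and f1_score are assumptions made for this example.

```python
from typing import List, Set, Tuple

# (start token, end token exclusive, entity type); hypothetical representation
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of typed entity spans."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any entity still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if continues:
            continue
        if etype is not None:
            spans.add((start, i, etype))
        if tag.startswith("B-") or tag.startswith("I-"):
            # An "I-" tag without a matching open entity starts a new one.
            start, etype = i, tag[2:]
        else:
            start, etype = None, None
    return spans


def f1_score(gold: Set[Span], pred: Set[Span]) -> float:
    """Harmonic mean of precision and recall over exactly matching spans."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative tags only; the eight entity types of the corpus are not listed here.
gold_tags = ["B-Person", "I-Person", "O", "B-Location", "O"]
pred_tags = ["B-Person", "I-Person", "O", "O", "B-Organization"]

gold_spans = bio_to_spans(gold_tags)
pred_spans = bio_to_spans(pred_tags)
print(f1_score(gold_spans, pred_spans))
```

In this toy example one of two predicted entities matches one of two gold entities, so precision and recall are both 0.5. A span counts as correct only if both its boundaries and its type match, which is a common but not universal convention.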


Please note that photographs and videos are taken at events hosted at Reykjavik University (RU) and may be used for RU marketing purposes.