Icelandic Centre for Language Technology (ICLT) seminar series - Hrafn Loftsson - Correcting a PoS-tagged corpus using three complementary methods

The next talk in the Icelandic Centre for Language Technology (ICLT) seminar series will be given at Reykjavik University, Kringlan 1, room K5, on Tuesday, January 20th, starting at 12:00. The speaker is Hrafn Loftsson, Assistant Professor at Reykjavik University. The title of his talk is "Correcting a PoS-tagged corpus using three complementary methods". The talk will be given in English if anyone in the audience does not understand Icelandic.

 

The quality of the part-of-speech (PoS) annotation in a corpus is crucial for the development of PoS taggers. In this talk, we experiment with three complementary methods for automatically detecting errors in the PoS annotation of the Icelandic Frequency Dictionary corpus. The first two methods are language-independent, and we argue that the third method can be adapted to other morphologically complex languages.

Once possible errors have been detected, we examine each error candidate and hand-correct the corresponding PoS tag if necessary. Overall, based on the three methods, we hand-correct the PoS tagging of 1,334 tokens (0.23% of the tokens) in the corpus. Furthermore, we re-evaluate existing state-of-the-art PoS taggers on Icelandic text using the corrected corpus.

 

Hrafn Loftsson graduated with a BSc degree in Computer Science from the University of Iceland in 1989. He received an MSc degree in Computer Science and Operations Research from Pennsylvania State University in 1992 and a PhD in Natural Language Processing from the University of Sheffield in 2007. Hrafn is an Assistant Professor in the School of Computer Science at Reykjavik University and sits on the board of the ICLT.


 
