PhD Thesis Proposal: Jón Friðrik Daðason
Language Representation Models for Low and Medium-Resource Languages
Jón Friðrik Daðason will defend his PhD thesis proposal at the Department of Computer Science, Reykjavik University, on Monday 19 October at 11:00. The defense will take place on Zoom.
---------------------------
Title: Language Representation Models for Low and Medium-Resource Languages
Candidate: Jón Friðrik Daðason
Supervisor: Hrafn Loftsson, Associate Professor, Department of Computer Science, Reykjavik University.
Examiners: Anders Søgaard, Professor in Natural Language Processing and Machine Learning, Department of Computer Science, University of Copenhagen; Sampo Pyysalo, Associate Professor, Language and Speech Technology, University of Turku.
Abstract
--------
Transformer-based language models have proven to be extremely effective for a wide variety of natural language understanding tasks, such as question answering, language inference and sentiment analysis. These models can be pre-trained on large, unannotated corpora using unsupervised tasks such as recovering randomly masked tokens. The pre-trained models can then be fine-tuned on more specific tasks in much less time and with much less data. Transformer models are highly scalable and have grown exponentially in size since their introduction in 2017. At the same time, the size of pre-training corpora used by state-of-the-art models has grown by several orders of magnitude. More research is needed into how these models can best be applied to low- and medium-resource languages, such as Icelandic, which is both morphologically rich and lacking in available pre-training data. We will investigate how different pre-training tasks, subword tokenization algorithms and vocabulary sizes affect the performance of small Transformer models in low- and medium-resource settings. Furthermore, we will consider how the size and composition of multilingual corpora affect the performance of multilingual Transformer models for low and medium-resource languages. Finally, we will adapt the mean shift rejection method to the Transformer architecture, which may allow for training deeper models with higher learning rates than previously possible.
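To make the pre-training objective mentioned above concrete, the following is a minimal, illustrative PyTorch sketch of masked-token pre-training for a small Transformer encoder. The vocabulary size, model dimensions, masking rate, and the random batch standing in for a tokenized corpus are assumptions chosen for illustration only; they are not taken from the thesis proposal.

```python
import torch
import torch.nn as nn

# Illustrative hyperparameters for a small, low-resource-style model (assumptions).
VOCAB_SIZE = 16_000   # small subword vocabulary
HIDDEN = 256
LAYERS = 4
MASK_ID = 3           # id reserved for the [MASK] token
MASK_PROB = 0.15      # BERT-style masking rate

class TinyMaskedLM(nn.Module):
    """A small Transformer encoder with a language-modeling head."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, HIDDEN)
        layer = nn.TransformerEncoderLayer(d_model=HIDDEN, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=LAYERS)
        self.lm_head = nn.Linear(HIDDEN, VOCAB_SIZE)

    def forward(self, ids):
        return self.lm_head(self.encoder(self.embed(ids)))

def mlm_step(model, ids, optimizer):
    """One pre-training step: mask random tokens and predict the originals."""
    labels = ids.clone()
    mask = torch.rand(ids.shape) < MASK_PROB
    labels[~mask] = -100                      # compute loss on masked positions only
    corrupted = ids.masked_fill(mask, MASK_ID)
    logits = model(corrupted)
    loss = nn.functional.cross_entropy(
        logits.view(-1, VOCAB_SIZE), labels.view(-1), ignore_index=-100
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

model = TinyMaskedLM()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
batch = torch.randint(4, VOCAB_SIZE, (8, 128))  # stand-in for tokenized corpus text
print(mlm_step(model, batch, opt))
```

In practice the token ids would come from a subword tokenizer trained on the pre-training corpus, and the choice of tokenization algorithm and vocabulary size is precisely one of the variables the proposed work will study.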