Dept. of Computer Science MSc project defence - Björgvin Freyr Jónsson

A Comparative Study of Machine Learning Methods for API Call-Based Malware Classification
30 May, 09:00 - 10:30
Reykjavik University - room M104

Join us for the MSc project defence of Björgvin Freyr Jónsson on his thesis: A Comparative Study of Machine Learning Methods for API Call-Based Malware Classification.

Main Supervisor: Jacky Mallett, Assistant Professor, Reykjavik University

Committee members: 

  • Geir Olav Dyrkolbotn, Associate Professor, NTNU
  • Stefán Ólafsson, Assistant Professor, Reykjavik University

Abstract:

As malware becomes ever more sophisticated and older detection methods fail to keep up, new detection approaches are urgently needed. This thesis conducts a detailed survey of the existing literature on malware classification based on Windows API calls. Building on this survey, we systematically compare the performance of a wide variety of traditional machine learning algorithms and some deep learning algorithms, such as Deep Feedforward Neural Networks and Transformers. We evaluate order-agnostic methods, e.g., Bag-of-Words and Term Frequency-Inverse Document Frequency (TF-IDF) vectors, and some order-sensitive methods, such as API call sequences processed by Transformers and n-gram TF-IDF vectors. We introduce two novel approaches: the first incorporates the API call arguments into the feature set by encoding call arguments into API calls, and the second trains a transformer-based classification model using a longer token limit than has been used in the literature we surveyed. Incorporating the arguments into the feature set showed improvements over other methods. Compared to an API call-only dataset, with both datasets using a Bag-of-Words representation, incorporating the arguments into the feature set increased the F1 macro score by 2.51%. Training a transformer-based model with a higher token limit than those reported in the existing literature did not yield desirable results and was significantly inferior to the other methods we evaluated.
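
To make the distinction between order-agnostic and order-sensitive features more concrete, here is a minimal Python sketch. It is not the thesis's implementation: the API call trace, the helper names (`bag_of_words`, `ngrams`, `with_arguments`) and the `name:argument` token scheme are all assumptions made for illustration, since the abstract does not describe how arguments are encoded.

```python
from collections import Counter

# Hypothetical API call trace as (call name, argument) pairs.
# The values are invented for illustration, not taken from the thesis dataset.
trace = [
    ("CreateFileW", "GENERIC_WRITE"),
    ("WriteFile", "4096"),
    ("CreateFileW", "GENERIC_READ"),
    ("CloseHandle", ""),
]


def bag_of_words(tokens):
    """Order-agnostic features: count each token and discard its position."""
    return Counter(tokens)


def ngrams(tokens, n):
    """Order-sensitive features: count each run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def with_arguments(trace):
    """Fold each argument into its call name to form one token.

    The "name:argument" format is an assumed encoding, chosen only to
    show the general idea of adding arguments to the feature set.
    """
    return [f"{name}:{arg}" if arg else name for name, arg in trace]


calls = [name for name, _ in trace]

calls_only_bow = bag_of_words(calls)                        # API calls only
bigrams = ngrams(calls, 2)                                  # pairs of consecutive calls
calls_with_args_bow = bag_of_words(with_arguments(trace))   # calls plus arguments
```

In this toy trace the two `CreateFileW` calls collapse into a single feature in the calls-only representation, while the argument-encoded representation keeps the write and read variants apart. Counts like these could then be weighted with TF-IDF and passed to a classifier.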

Please note that photographs and videos are taken at events hosted at Reykjavik University (RU) and may be used for RU marketing purposes. Read more about this at ru.is or send an e-mail to personuvernd@ru.is.