

MSc Thesis Defense - Department of Computer Science - Lucas A. E. Pineda Metz

An Evaluation of Unity ML-Toolkit for Learning Boss Strategies

  • 11.9.2020, 14:00 - 15:00

On Friday the 11th of September 2020, Lucas A. E. Pineda Metz will defend his 60 ECTS thesis in Computer Science.

Candidate: Lucas A. E. Pineda Metz
Supervisors: Dr. Yngvi Björnsson, Professor, Department of Computer Science at RU and Dr. David Thue, Assistant Professor, Carleton University
Title: An Evaluation of Unity ML-Toolkit for Learning Boss Strategies
Date and Time: September 11th at 14:00 in Room M120

Abstract: Accompanying the growing pace of AI research for video games is the development of new benchmark environments. One of the most recently introduced is Unity's Machine Learning Toolkit (ML-Toolkit). With this toolkit, Unity allows its users (researchers or game developers) to train a learning agent with state-of-the-art Reinforcement Learning or Imitation Learning algorithms, or with their own Machine Learning algorithms. In this project, I used one of the Reinforcement Learning algorithms (Proximal Policy Optimization; PPO) alone and in combination with two Imitation Learning algorithms (Generative Adversarial Imitation Learning; GAIL, and Behavioral Cloning; BC) provided with Unity's Machine Learning Toolkit. These were used to teach a learning agent to optimize its policy to maximize its reward, learning to better choose from a set of attacks to win a fight against a simpler non-learning agent. The project has two focuses: a) to compare the learning provided by the different algorithms included in Unity's toolkit, and additionally to compare the use of the Imitation Learning algorithms as complements to the Reinforcement Learning algorithm; and b) to test the usability of the ML-Toolkit by creating a learning environment to train an agent and comparing my experience implementing and training such an agent with the information provided by Unity's documentation. To achieve this, I conducted three case studies: one providing a demonstration file containing an optimal policy, one with a sub-optimal policy, and a third with a mix of both. For all case studies, the learning was done considering four combinations of learning algorithms: a) PPO alone; b) PPO in combination with GAIL; c) PPO in combination with BC; and d) all learning algorithms together. The overall results of the three case studies showed successful learning by the agent, regardless of the learning algorithms considered. Of the Imitation Learning algorithms, GAIL showed difficulties in learning policies that involved several complex actions, whereas BC greatly increased the learning rate. The results show the advantages and limitations of using Imitation Learning algorithms for learning behaviours and the importance of the demonstration provided to them; the thesis further discusses the usefulness of entropy as a complementary variable to consider when assessing the success rate of the learning process.
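For readers unfamiliar with how a Reinforcement Learning algorithm is combined with an Imitation Learning one, the sketch below (PyTorch; not code from the thesis) shows the general idea behind the "PPO in combination with BC" setup: the PPO clipped surrogate objective is optimized together with a Behavioral Cloning term that pulls the policy toward a demonstrator's actions. The function name, hyperparameter values, and loss weighting here are illustrative assumptions; in Unity's ML-Toolkit these choices are made declaratively in a trainer-configuration file (with GAIL added as a discriminator-based reward signal and BC as an auxiliary objective, both pointing at a recorded demonstration file) rather than in user code.

    import torch
    import torch.nn.functional as F

    def ppo_with_bc_loss(ratio, advantages, policy_logits, demo_actions,
                         clip_eps=0.2, bc_strength=0.5):
        """Illustrative PPO + Behavioral Cloning objective (not Unity's code).

        ratio:         pi_new(a|s) / pi_old(a|s) for the sampled actions
        advantages:    advantage estimates; with GAIL enabled these would be
                       computed from a blend of the environment reward and a
                       discriminator's "looks like the demonstration" reward
        policy_logits: current policy logits on demonstration states
        demo_actions:  actions the demonstrator took in those states
        """
        # PPO clipped surrogate: take the pessimistic (min) value of the
        # unclipped and clipped policy-gradient terms, negated for descent.
        clipped_ratio = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
        ppo_loss = -torch.min(ratio * advantages,
                              clipped_ratio * advantages).mean()

        # Behavioral Cloning term: cross-entropy pushing the policy toward
        # the demonstrator's action choices, weighted by an assumed strength.
        bc_loss = F.cross_entropy(policy_logits, demo_actions)

        return ppo_loss + bc_strength * bc_loss

Note that GAIL has no explicit term in this sketch because it acts through the reward: a learned discriminator scores how demonstration-like the agent's behaviour is, and that score feeds into the advantage estimates rather than into the loss directly.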



Please note that at events hosted at Reykjavik University (RU), photographs and videos are taken which may be used for RU marketing purposes. Read more about this on ru.is or send an e-mail to: personuvernd@ru.is