CADIA's AI Lecture Series - Mary Felkin - Learning by Observation and Induction: The Strategies of Humans Placed in a Problem-Solving Context
Where: K5 Kringlan 1
When: Wednesday 26.11.08 from 12-13
Abstract:
Current agent architectures are often modelled on human cognitive processes as described by cognitive scientists. The lower level corresponds to reactive processes, such as reflexes, which deal only with the immediate situation; the middle level corresponds to deliberative reasoning and can therefore also deal with the past and the future; the top level corresponds, among other mental tasks, to longer-term strategic decision making, which leads to goal generation. The top-level decisions of artificial agents are often not made by the agents themselves: their goals are given to them by the programmer or, in some multi-agent systems, by a global control structure.
Learning can occur at any or all levels of the control architecture of an artificial agent, but learning by watching a human enact a solution, that is, learning by imitation, has so far concerned only the lower levels of this architecture. The agent can learn what is being done by the human demonstrator (perception-action pairs) and, to some extent, how it is being done. To the best of our knowledge, no one had yet attempted to make an agent learn the topmost level of human cognitive processes, i.e. learn why something is being done. We do this with the caveat that the "why" discussed here is by no means the philosophical or metaphysical "why", but the practical "why" whose answer springs from relevant links between objects and actions. A trivial example would be "Why is the human demonstrator pouring coffee here and not elsewhere? - Because here the cup is just underneath the thermos flask opening". Intelligent behaviour comes from learning this link between thermos flask and cup.
As human beings are our only models of a fully functioning intelligence, it was important to find a method enabling this topmost level of an agent control architecture to learn its strategies from those of a human demonstrator.
Learning this topmost-level cognitive process, that is, learning human strategies, was achieved from video footage of humans in a problem-solving situation. In a series of psychological experiments, blindfolded human volunteers explored a maze in search of a treasure and, while doing so, expressed their search strategy through sequences of perception-action pairs, which were recorded (perception here was limited to touch, which could be seen on the videos).
The volunteers in the mazes had several different goals which they combined through some thought process akin to multi-criteria optimisation to mentally construct and evaluate their behaviours. On top of their given goal, finding the treasure, their overall strategies included the goals of not getting lost, of not exploring the same place twice, of not bumping into obstacles, etc. They mentally evaluated their strategies, as can be seen by the fact that they sometimes changed them during the course of their run. Detecting strategy changes is an important aspect of this work because it is a way of achieving strategy failure recognition.
The task of our simulated humanoid robot was to learn and re-enact these human strategies. We were not interested in the performance of any particular human strategy; we were interested in how the strategies could be learnt.
The gap between human strategies and perception-action pairs is too wide to be bridged in a single learning step. We followed cognitive science architectural models of the human cognitive processes to gradually increase the complexity of what was being learnt, from the observable perception-action pairs which constituted our raw data, to meaningful sequences of them which constituted basic actions, and onwards to tactics and strategies. The success of our attempt contributes to the validation of these cognitive science models.
The number of possible perception-action pair combinations grows exponentially with the number of raw descriptors and linearly with the number of time steps taken into consideration. We showed that attribute selection can be used to learn which meaningful sequences to use as the intermediate building bricks between human perception-action pairs and human tactics, and how to combine these bricks to build a working definition of what a human tactic is (in our given context).
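As a rough illustration of this intermediate step, the sketch below slides a fixed-length window over a recorded run and counts how often each sequence of perception-action pairs occurs, keeping the recurring ones as candidate building bricks. The pair labels, the window length, and the count threshold are all hypothetical, and plain frequency counting is only a simple stand-in for the attribute selection actually used in this work.

```python
from collections import Counter

def candidate_bricks(run, window=2, min_count=2):
    """Return each sequence of `window` consecutive pairs seen at least `min_count` times."""
    counts = Counter(
        tuple(run[i:i + window]) for i in range(len(run) - window + 1)
    )
    return {seq: n for seq, n in counts.items() if n >= min_count}

# Hypothetical run: each item is a (perception, action) pair.
run = [
    ("wall_left", "step_forward"),
    ("wall_left", "step_forward"),
    ("wall_front", "turn_right"),
    ("wall_left", "step_forward"),
    ("wall_left", "step_forward"),
    ("wall_front", "turn_right"),
]

# Sequences that recur are kept; one-off sequences are discarded.
bricks = candidate_bricks(run)
```

Raising `window` lengthens the sequences considered, and raising `min_count` makes the filter stricter.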
For the final step, from tactics to strategies, we found that a human strategy could be given a mathematical formulation, and that each strategy could be expressed by a statistical combination of tactics. Being able to give a mathematical definition of a human strategy, even in a limited context, is a step towards building an engineering blueprint of human intelligence.
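One possible reading of "a statistical combination of tactics" is a probability distribution over tactics from which the agent draws its next tactic. The minimal sketch below assumes that reading; the tactic names and weights are hypothetical, and the abstract does not give the actual formulation.

```python
import random

# Hypothetical strategy: probability of choosing each tactic (assumed values).
strategy = {
    "follow_left_wall": 0.6,
    "sweep_open_space": 0.3,
    "backtrack": 0.1,
}

def choose_tactic(strategy, rng):
    """Draw one tactic at random, weighted by its probability in the strategy."""
    tactics = list(strategy)
    weights = [strategy[t] for t in tactics]
    return rng.choices(tactics, weights=weights, k=1)[0]

# A seeded generator makes the re-enacted run reproducible.
rng = random.Random(0)
sequence = [choose_tactic(strategy, rng) for _ in range(10)]
```

Under this assumption, two volunteers using the same tactics differ only in their weights, which is what makes their strategies comparable and combinable.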
Our method sidesteps the correspondence problem. For validation purposes we wanted the robot to display recognisably human-like behaviour, so the robot had to be a humanoid, but, given the statistical expressions of the human strategies, a virtual octopus could just as well enact them. All that would be needed is a translation of the building bricks in which "sweep the table top with your hand" would become "sweep the table top with your tentacle", and so on.
Our virtual humanoid robot successfully implemented the human strategies by following the statistical expressions describing them. It also implemented an average strategy which was a combination of the strategies of all the human volunteers. When doing so, it enacted an efficient sweeping behaviour in all mazes (sweeping, i.e. covering the maximum amount of ground space while avoiding, as far as possible, going over the same place several times, is a robot test-bed task which is notoriously difficult to code or learn once and then enact in mazes of different sizes and shapes). It was less efficient at avoiding going round the same obstacle twice, though it rarely did so more than twice. But then, some human volunteers had done it too.
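The abstract does not say how the volunteers' strategies were combined into the average strategy. As a sketch under the assumption that each strategy is a probability distribution over tactics, an unweighted mean is one simple option; the tactic names and values below are hypothetical.

```python
def average_strategy(strategies):
    """Unweighted mean of several strategies; a missing tactic counts as probability 0."""
    tactics = sorted({t for s in strategies for t in s})
    n = len(strategies)
    return {t: sum(s.get(t, 0.0) for s in strategies) / n for t in tactics}

# Hypothetical strategies of two volunteers (each sums to 1).
volunteers = [
    {"follow_left_wall": 0.5, "sweep_open_space": 0.5},
    {"follow_left_wall": 0.25, "sweep_open_space": 0.25, "backtrack": 0.5},
]

avg = average_strategy(volunteers)
```

Because every input sums to 1, the mean also sums to 1 and is itself a valid distribution over tactics.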
Without a single line of code explicitly written for the purpose of programming efficient sweeping or the avoidance of repeatedly circling obstacles, and although our purpose was for the robot to learn human strategies without any evaluation of these strategies, we obtained a good treasure-hunter able to perform an efficient search in all the mazes it was tested in.
We showed that the framework of our method was valid, but only on one particular type of problem. The method itself will have to be adapted in order to be used to learn the strategies of humans in other problem-solving situations.

