See the most up-to-date project information on my [Socially Expressive Computing Group] page
Social game environments include, for example, online games that rely on social interaction among players, games where players have to converse with non-player characters to further a plot, and many of the so-called "serious games" where game environments are used for training real social skills. Articulated humanoid characters typically represent the participants in these social interactions, regardless of whether they are agents under the control of the game or avatars controlled by human players. These characters need to exhibit believable nonverbal social behavior, both to maintain the illusion of life and to communicate effectively. Building on our previous work on embodied conversational agents and avatars, we are looking at ways to bring social nonverbal behavior control into games through modular mechanisms that embody general social rules.
This is an international effort to unify a multimodal behavior generation framework for Embodied Conversational Agents (ECAs). We are proposing a three-stage model, called SAIBA, where the stages represent intent planning, behavior planning and behavior realization. A Function Markup Language (FML), describing intent without referring to physical behavior, mediates between the first two stages, and a Behavior Markup Language (BML), describing the desired physical realization, mediates between the last two stages.
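The flow between the first two SAIBA stages can be sketched in a few lines of Python. This is purely illustrative: the function and behavior names, and the mapping between them, are invented for the example and are not the actual FML or BML vocabularies.

```python
# Toy sketch of the SAIBA pipeline: stage 1 annotates an utterance with
# communicative functions (the FML level); stage 2 maps each function to a
# physical behavior (the BML level); a realizer (stage 3, not shown) would
# turn the result into animation. All names here are illustrative.

FUNCTION_TO_BEHAVIOR = {
    "emphasis": "beat_gesture",
    "feedback_request": "eyebrow_raise",
}

def plan_intent(utterance):
    """Stage 1: derive communicative functions without naming any behavior."""
    functions = []
    if utterance.endswith("?"):
        functions.append("feedback_request")
    if "!" in utterance:
        functions.append("emphasis")
    return {"text": utterance.strip("!?."), "functions": functions}

def plan_behavior(fml):
    """Stage 2: commit to physical behaviors that realize each function."""
    return {"text": fml["text"],
            "behaviors": [FUNCTION_TO_BEHAVIOR[f] for f in fml["functions"]]}

bml = plan_behavior(plan_intent("That is amazing!"))
print(bml["behaviors"])  # -> ['beat_gesture']
```

The point of the intermediate representation is that stage 1 never mentions a gesture: swapping the mapping table changes how an intent is embodied without touching intent planning.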
We present a research study on the simultaneous observation of several features as a means of recognizing human activities, in particular high-level daily activities performed in a home environment. We abstract from the techniques and modalities used to retrieve observations, assuming each feature is easy to track in the domain environment. No mechanisms for further probabilistic recognition of lower-level activities are used: we recognize activities directly from the information extracted from the features. We introduce the concept of “observational channels”, each channel representing an abstract feature, as organized sources of information from the environment. We target real-world scenarios in which, by definition, not all information is accessible, meaning that we deal with partial knowledge: only information from plausible sources should be considered. We use several Hidden Markov Models, a standard tool widely used in such applications, for feature-based recognition, and compare the recognition rates resulting from partial versus total observation of the available features. We focus on a typical domain from daily life, showing that the use of multiple features can be beneficial for recognition; however, the observation of certain combinations of features tends to confuse recognizers, which in some cases do not reach a minimally acceptable recognition rate. We conclude with a discussion of the results we obtained, possible improvements and refinements to the approach, and future work.
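The core recognition step described above can be illustrated with a minimal sketch: score an observation sequence from one channel against per-activity discrete HMMs using the forward algorithm and pick the most likely activity. The models and all probabilities below are made up for the example; the actual thesis uses several channels and real data.

```python
# Illustrative feature-based recognition with discrete HMMs: each activity
# has its own model, and recognition picks the model that assigns the
# observation sequence the highest forward likelihood.

def forward_likelihood(pi, A, B, obs):
    """P(obs | model) via the forward algorithm for a discrete HMM."""
    n = len(pi)
    alpha = [pi[s] * B[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [B[s][o] * sum(alpha[p] * A[p][s] for p in range(n))
                 for s in range(n)]
    return sum(alpha)

# Two toy activity models over one binary channel (0 = idle, 1 = active).
cooking  = {"pi": [0.5, 0.5], "A": [[0.7, 0.3], [0.3, 0.7]],
            "B": [[0.2, 0.8], [0.1, 0.9]]}   # mostly emits "active"
sleeping = {"pi": [0.5, 0.5], "A": [[0.9, 0.1], [0.1, 0.9]],
            "B": [[0.9, 0.1], [0.8, 0.2]]}   # mostly emits "idle"

def recognize(obs, models):
    return max(models, key=lambda name: forward_likelihood(
        models[name]["pi"], models[name]["A"], models[name]["B"], obs))

models = {"cooking": cooking, "sleeping": sleeping}
print(recognize([1, 1, 0, 1], models))  # -> cooking
```

Comparing partial versus total observation then amounts to running the same scoring over different subsets of channels and comparing recognition rates.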
In a few years, context-aware computing will pervade almost every aspect of our lives. One of the crucial issues in this field is having a proper and convenient model to represent and manage context. Existing representation models like ontologies constitute a well-researched and mature solution. However, they are not made to represent continuously changing data; moreover, building and maintaining them is a highly error-prone and time-consuming process, and it can become a tedious and non-scalable task if done manually. This thesis proposes a model to represent and manage contextual information of different types, generated by a variety of heterogeneous sources and with different levels of granularity. The model is derived from the integration of Semantic Networks with the Object-Oriented software development model and has been implemented by exploiting scripting languages and their properties, such as dynamic typing, meta-programming, and introspection. A context-aware infrastructure (CAFE) based on this model is presented and, by showing an illustrative contextual scenario implemented in CAFE, it is demonstrated that the proposed model guarantees high readability, flexibility, scalability, generality, and modularity.
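A minimal sketch can show the flavor of this combination, assuming nothing about CAFE's real API: context entities behave like semantic-network nodes, while scripting-language dynamism (runtime attributes, introspection) lets their structure evolve as the context changes. The class and relation names are hypothetical.

```python
# Hypothetical context entity in the spirit described above: attributes and
# relations are attached at runtime (dynamic typing), and introspection is
# used to enumerate whatever state the entity currently holds.

class ContextEntity:
    """A semantic-network-style node whose shape evolves dynamically."""
    def __init__(self, name):
        self.name = name
        self.relations = {}          # relation label -> list of entities

    def relate(self, label, other):
        self.relations.setdefault(label, []).append(other)

    def describe(self):
        # Introspection: report attributes added after construction.
        return {k: v for k, v in vars(self).items()
                if k not in ("name", "relations")}

room = ContextEntity("living_room")
room.temperature = 21.5              # attribute attached at runtime
alice = ContextEntity("alice")
alice.relate("located_in", room)     # a typed link between nodes

print(room.describe())                        # -> {'temperature': 21.5}
print(alice.relations["located_in"][0].name)  # -> living_room
```

No schema had to be declared up front for the `temperature` attribute or the `located_in` relation, which is exactly what makes this style convenient for continuously changing contextual data.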
Daily-life activities at home can generate dangers that may lead to accidents. Risky situations may be difficult to notice for people with a cognitive or physical impairment. Recognizing dangers is therefore very important in order to assist users in preventing accidents and to ensure their health, safety and wellbeing. The present thesis aims to design a system that, given a representation of the environment as input, learns how to evaluate states according to their danger level, and is able to alert users and prevent them from getting too close to a potential danger. We explore the search space for disclosing dangers and finding a safe path leading to the goal. The project led to the implementation of a working prototype, which is able to suggest the best action to perform and to report the level of danger along with an assessment of the last performed action. It is also able to warn the user when the level of danger exceeds a given threshold. We offer a general solution, as the system is able to play arbitrary games described with the Game Description Language and performs on-line planning by means of the Q(lambda) algorithm. For this purpose, we implemented a Java library for building TD-learning agents. In addition, we defined the concept of a sphere of protection and disclose dangers by using a variant of breadth-first search. Finally, we exploited virtual environments as a general testbed for simulating the effects of warning notifications, and showed how the system can be used to perform informal user testing, that is, to evaluate the effects of warning notifications on actual users.
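The "sphere of protection" idea can be sketched as a radius-bounded breadth-first search. This is a hedged illustration, not the thesis's implementation (which works on GDL game states and is written in Java): here the environment is a toy grid and `D` marks a dangerous cell.

```python
# Illustrative sphere of protection: BFS from the user's cell, bounded by a
# radius, discloses dangers close enough to warrant a warning. The grid
# representation and the 'D' marker are invented for this example.

from collections import deque

def dangers_in_sphere(grid, start, radius):
    """Return danger cells ('D') reachable within `radius` steps of start."""
    rows, cols = len(grid), len(grid[0])
    seen, found = {start}, []
    queue = deque([(start, 0)])
    while queue:
        (r, c), dist = queue.popleft()
        if grid[r][c] == "D":
            found.append((r, c))
        if dist == radius:           # boundary of the sphere of protection
            continue
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append(((nr, nc), dist + 1))
    return found

grid = [".D.",
        "...",
        "..D"]
print(dangers_in_sphere(grid, (1, 1), 1))  # -> [(0, 1)]
```

A warning would then fire whenever the set of disclosed dangers is non-empty, or when a learned danger estimate for the current state exceeds the threshold.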
Entertainment, education and training increasingly take place within virtual environments populated by virtual characters. For these environments to have full impact, the behavior of the characters needs to be consistent, both with regard to the unfolding story and with regard to social norms. My study is about the simulation of human behaviors in order to make virtual characters believable in virtual social environments. In particular, I will show several social phases related to "non-verbal behaviors" during an initial meeting between persons and an ongoing conversation. I will also display emotional states such as anxiety and worry. In other words, my thesis is about the realization of both emotional and social behaviors in a virtual environment. The idea is to realize a realistic scene in a bar with people inside, who undergo different moods when a particular social event occurs. The work uses Impulsion, an artificial intelligence engine for virtual characters that is particularly suitable for simulating social intelligence. This framework already manages important aspects of behavior, such as the social perception of other agents (Non-Player Characters), locomotion, collision avoidance around obstacles (and other agents), and a flexible behavior engine based on continuous steering of body joints. The purpose of this research is to study particular social situations and automatically generate realistic behavior in reaction to other agents. In order to simulate credible behavior, it is important to collect real data on how people respond socially and emotionally to others, and how people try to establish or avoid social interaction according to their internal state. It is also important to research how this behavior can be represented in a virtual environment. Ultimately, we want to see how well these simulated behaviors reflect reality, and how well human users respond to a running demo.
It is increasingly common that we engage in face-to-face conversations with people of different cultures as part of our daily lives. The outcome of such cross-cultural communication is sometimes affected by misunderstandings that arise from culturally different interpretations of the same message. This thesis focuses on the different interpretations of non-verbal behaviors, gestures in particular. The Automated Culture Training (ACT) system addresses the problem of misunderstanding in cross-cultural communication by providing an interactive 3D training environment, where people can quickly pick up cultural knowledge and apply it in a series of simulated social settings. Each lesson involves an interaction between the learner and an automated character of a given culture. During the interaction, the learner chooses what gesture to use at any given moment, and the character gives immediate positive or negative feedback. The contribution of this thesis is a modular technical framework for the ACT system, based on a clear abstraction between communicative behavior (the visible action) and communicative function (the interpretation). Furthermore, the framework keeps a clear separation between data and its processing, for example by treating the cultural description and the description of each exercise purely as input data. The automated character incorporates a complete perception-action loop, which allows it to dynamically react to learner input. The result is a fully functional prototype that demonstrates best-practice engineering principles and is ready for further development of content and testing with users.
This research presents a multimodal non-verbal conversation analysis performed on a typical institutionalized political TV interview. The focus is on facial, hand, and body gestures and their role in carrying out communicative functions such as feedback and turn management, i.e. how speakers know when it is their turn to speak. We wanted to know what non-verbal gestures speakers in institutionalized interviews use and, importantly, how these gestures compare across cultures. This work builds on previous studies done in Greece and similar investigations in Europe, for comparison between different cultures.
As game environments have become more and more graphically realistic, a major new challenge has emerged for gaming companies. Both computer-controlled agents and player-controlled avatars now also need to act in a believable manner, so that the illusion of reality created by exquisite graphics and physics is not broken. Agents are required to react to the environment around them as well as to act appropriately when introduced to social situations. For agents to be life-like they need basic human traits, such as emotions and the ability to make decisions. In this thesis we describe a way of tackling this challenge using a three-pronged solution. First, we incorporated a social norms model into the agents using social rules. These rules tell the agents how to act when engaged in social situations. Second, we added an emotional model which affects the agents' emotional state and gives them the ability to vary their responses to situations in the environment. Both these models reside in an appraisal module that is based on emotional appraisal theory. The appraisal module appraises how events triggered in the environment affect the agents both emotionally and socially, and gives an agent instructions on how it might cope with the situation. To complete the cycle, a planner makes the decisions on what can be done, what should be done and how it should be done. The resulting system provides the illusion that the agents are human-like individuals that can respond differently to similar situations depending on what they think is important to them at that time.
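The interplay of the three parts can be sketched in miniature. Everything below (the rule contents, the one-dimensional "arousal" state, the thresholds) is invented for illustration and is far simpler than the appraisal model described in the thesis.

```python
# Toy appraisal loop: social rules propose a normative coping action, an
# emotional state accumulates across events, and high arousal overrides the
# socially polite choice. All rules and numbers are illustrative.

SOCIAL_RULES = {
    "greeting": "return_greeting",
    "insult": "walk_away",
}

class Agent:
    def __init__(self):
        self.arousal = 0.0           # crude one-dimensional emotional state

    def appraise(self, event):
        """Appraise an event socially and emotionally, then pick an action."""
        # Emotional appraisal: insults raise arousal, greetings calm it.
        self.arousal += {"insult": 0.5, "greeting": -0.1}.get(event, 0.0)
        self.arousal = max(0.0, min(1.0, self.arousal))
        # Social appraisal: norms suggest a default coping action...
        action = SOCIAL_RULES.get(event, "ignore")
        # ...but a highly aroused agent abandons the polite response.
        if self.arousal > 0.8:
            action = "confront"
        return action

a = Agent()
print(a.appraise("greeting"))   # -> return_greeting
print(a.appraise("insult"))     # -> walk_away
print(a.appraise("insult"))     # -> confront
```

Because the emotional state persists between events, the same event ("insult") yields different actions at different times, which is the variability the thesis argues makes agents appear human-like.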
Path planning consists of finding a route from one location to another. It is also known as the motion planning or navigation problem, and it is common in several fields of Computer Science and in commercial games, where it has to be solved in real time under constraints on memory and CPU resources. Abstracting the problem, path planning is divided into two main sub-problems: path finding, used to find a simplified route composed of connected segments, and path following, used to make the locomotion along the route as realistic as possible. Path following should avoid contact between objects through collision avoidance; together, these represent the main topic of our studies. We work on a virtual environment populated by agents and avatars that are engaged in social situations and show autonomous, believable social behaviors. In this context, natural and fluid paths are key to providing a high level of presence and realism. Building on several theories about human territories, social forces and body language, we are extending the state of the art in path following; each agent is now able to avoid static and dynamic obstacles along its path, to predict the future interaction patterns of others and to apply corrections to its movement accordingly. We model the spontaneous form of non-verbal negotiation that humans engage in every day during locomotion. Agent behavior is also slightly affected by stochastic factors that appear in the form of small changes in velocity and/or orientation, called distortions. Our approach is able to solve path following and collision avoidance in many types of situations, finding a good balance between realism and performance. To test this, we implemented several completely different simulation scenarios. Moreover, by creating a specific profile for each agent, our system can also show how different stereotypes of people act in those situations and compare them with expected results.
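A minimal steering step in this spirit can be sketched as follows. All gains, radii and the distortion magnitude are illustrative, not the values used in the actual system: the agent seeks its next waypoint, a repulsive term pushes it away from a nearby obstacle, and a small random distortion perturbs the resulting heading.

```python
# Illustrative steering update combining seek, obstacle avoidance and a
# stochastic "distortion" term, as described above. Parameters are made up.

import math
import random

def steer(pos, goal, obstacle, rng, avoid_radius=2.0, distortion=0.05):
    """Return a unit-length heading combining seek, avoidance and noise."""
    def sub(a, b): return (a[0] - b[0], a[1] - b[1])
    def norm(v):
        m = math.hypot(*v)
        return (v[0] / m, v[1] / m) if m else (0.0, 0.0)

    seek = norm(sub(goal, pos))
    away = sub(pos, obstacle)
    dist = math.hypot(*away)
    if 0 < dist < avoid_radius:      # repulsion fades with distance
        w = (avoid_radius - dist) / avoid_radius
        seek = (seek[0] + w * away[0] / dist, seek[1] + w * away[1] / dist)
    # Distortion: a small random nudge to the heading.
    seek = (seek[0] + rng.uniform(-distortion, distortion),
            seek[1] + rng.uniform(-distortion, distortion))
    return norm(seek)

rng = random.Random(42)
heading = steer(pos=(0.0, 0.0), goal=(10.0, 0.0), obstacle=(1.0, 0.5), rng=rng)
print(heading)  # roughly forward, bent away from the obstacle
```

Calling this every simulation tick, with the obstacle term extended to predicted positions of other agents, gives the reactive, slightly irregular trajectories described above.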
Every avatar and agent in a social game environment needs to show social intelligence by reacting spontaneously to the social environment and situation. This work focuses on one specific behavior in a specific situation: the gaze behavior of people waiting for a bus. It makes use of field studies of subjects who are idling alone at a bus stop, a setting chosen to minimize the external factors influencing the behavior. "Alone" here refers to people surrounded by other persons but not in communication with them. Data collected during the field studies was used to derive computational models, which were then added as automated reactive behavior in the CADIA Populus social simulation environment.
Avatars and agents in a realistic virtual environment must exhibit a certain degree of presence and awareness of their surroundings, reacting consistently to unexpected contingencies and contextual social situations. Unconscious reactions serve as evidence of life, and can also signal social availability and spatial awareness to others. These behaviours get lost when avatar motion requires explicit user control. This thesis presents new AI technology for generating believable social behaviour in avatars. The focus is on human territorial behaviours during social interactions, such as conversations, gatherings and standing in line. Driven by theories of human face-to-face interaction and territoriality, we combine principles from the field of crowd simulation with a steering behaviours architecture to define a reactive framework which supports avatar group dynamics during social interaction. [PDF]
Gaze patterns are one of the most expressive aspects of human outward behavior, giving clues to personality, emotion and inner thoughts. Multimodal communication and, more generally, natural social behaviors are among the main topics that Sociology deals with, while the "virtualization" of aspects of life through realistic-looking interactive environments is one of the most active areas of Computer Science. When games use avatars to represent players in virtual environments, all the animated behaviors that normally support and exhibit social interaction become important. Since players cannot be tasked with the micro-management of behavior, the avatars themselves have to exhibit a certain level of social intelligence. The purpose of this avatar AI is in fact twofold: to give players helpful cues about the social situation and to ensure that the whole scene appears believable and consistent with the level of game world realism. The goal of this thesis is to automate the production of natural-looking idle-gaze behavior in avatars, using data from targeted video studies. The focus is on people walking alone down a shopping street. [PDF]
The objective of this series of projects was to develop tools to support individualized language learning, and to apply them to the acquisition of the linguistic, gestural and cultural knowledge and skills necessary to accomplish specific tasks in a foreign environment. In order to maximize learner motivation and give learners effective practice opportunities, learners could enter simulated missions where they interacted verbally and nonverbally with virtual characters. The toolset was built so that it could easily be applied to new languages, missions and training contexts. The first target languages were Levantine Arabic, Iraqi Arabic and Pashto. This work is ongoing at ISI and Alelo Inc.
In this project we explored ways to support learning strategies that are not typically supported in high school math education. We hoped that this would help a wider population of kids realize their potential when it comes to math. The work centered on an AI-enhanced multimedia environment based on the Wayang Outpost system built by Carole Beal and her team while she was at the University of Massachusetts. Our new version extended that system by adding features that supported and promoted online collaboration, personal reflection and mentoring through role models. This work is ongoing at ISI.
This project, internally called the BCBM (Believable Communicative Behavior Middleware), built on my previous work in generating communicative behaviors for animated characters. The goal was to create a specialized middleware or engine that could be plugged right into simulation environments such as games to bring characters to life through believable embodiment. In contrast to earlier work, I was looking at a wide range of contextual factors that might influence the nonverbal conduct, not just interactional and propositional factors. These included attitudes, roles and emotion.
As part of my dissertation I developed a theoretical framework for automatically analyzing text messages in terms of communicative function, and generating supporting nonverbal behaviors in avatars, representing conversation participants in a chat environment. Communicative functions of interest included emphasis, visual reference, turn-taking, attention and feedback. A general architecture, Spark, was built on this framework, demonstrating the approach in an actual system design. MapChat, a derived application for online collaboration, provided empirical evidence for the strength of the approach.
The Behavior Expression Animation Toolkit (BEAT) allows animators to input typed text that they wish to be spoken by an animated human figure, and to obtain as output appropriate and synchronized nonverbal behaviors and synthesized speech in a form that can be sent to a number of different animation systems. The nonverbal behaviors are assigned on the basis of actual linguistic and contextual analysis of the typed text, relying on rules derived from research into human conversational behavior. This toolkit has been distributed, and is being employed, by several university and industry research groups around the world.
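A toy rule in the spirit of BEAT's pipeline can be sketched as below. The rule and tag names are invented and far simpler than BEAT's actual linguistic analysis; the sketch only shows the general shape of rule-based behavior assignment over analyzed text.

```python
# Illustrative rule-based behavior assignment: words carrying new
# information (not previously mentioned) attract an emphasis behavior,
# a simplified stand-in for the discourse-driven rules BEAT applies.

def assign_behaviors(words, already_mentioned):
    """Attach a behavior tag to each word based on a simple discourse rule."""
    tagged = []
    for w in words:
        word = w.lower().strip(".,!?")
        if word not in already_mentioned:
            # New information tends to attract emphasis (a beat gesture).
            tagged.append((w, "beat"))
            already_mentioned.add(word)
        else:
            tagged.append((w, None))
    return tagged

mentioned = {"the", "a", "is"}
result = assign_behaviors("The robot is a robot".split(), mentioned)
print(result)
# -> [('The', None), ('robot', 'beat'), ('is', None), ('a', None), ('robot', None)]
```

The output of such rules, aligned with synthesized speech timing, is what a downstream animation system would then realize.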
Similar to its predecessor BodyChat, Situated Chat automatically animated the visual representations (avatars) of participants in online graphical chat. While BodyChat concentrated on the use of a social model to animate appropriate social behavior such as turn-taking, greetings and farewells, Situated Chat also built a model of the discourse context, taking into account the shared visual environment, and then used that to automatically generate nonverbal behavior of a propositional nature, such as referring gestures. Again, the only input was the chat text itself.
MACK was an embodied conversation agent (ECA) who could answer questions about and give directions to the MIT Media Lab's various research groups, projects, and people. MACK used a combination of speech and gesture to communicate with users and could share with them a physical paper map placed in front of him. Research issues involved modeling shared reference and attention and how the different modalities could be fused in both input and generation of behavior.
Imagine an animated character who could participate in children's play with real toys, such as puppets and toy castles, and construct stories with them collaboratively. Bringing together work on story-listening systems for children and embodied conversational agents, this project involved a conversational character, Sam, who could act as a peer playmate to children and could create stories with them by sharing physical objects across the boundary between the real and virtual worlds, and by listening and reacting to the child's input.
This version of BodyChat focused on multi-party chat conversation and automated gaze and basic hand motion in avatars. The only input was the text typed by users; no explicit avatar control was needed. For example, when delivering a message, your avatar first averted its gaze ("planning phase"), then gestured while speaking and finally looked at the next speaker. The next speaker was chosen to be the one the user took the turn from, unless another user requested the turn by starting to type or the speaker explicitly named a new speaker.
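The next-speaker rule just described can be sketched directly; the function name and the simple substring check for explicit naming are illustrative, not BodyChat's actual code.

```python
# Hypothetical sketch of the turn-taking rule above: explicit naming wins,
# then a turn request by typing, otherwise the turn goes back to whoever
# the current speaker took it from.

def choose_next_speaker(previous_speaker, typing_users, message, participants):
    """Pick who the avatar should gaze at when finishing an utterance."""
    for name in participants:
        if name in message:          # the speaker explicitly named someone
            return name
    if typing_users:                 # someone requested the turn by typing
        return typing_users[0]
    return previous_speaker          # default: give the turn back

print(choose_next_speaker("alice", [], "I agree with that", ["alice", "bob"]))
# -> alice
print(choose_next_speaker("alice", ["bob"], "I agree", ["alice", "bob"]))
# -> bob
```

The chosen name drives only the final gaze target; the preceding gaze aversion and gesturing follow the fixed delivery sequence described above.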
Pantomime is an animation engine that takes care of rendering a 3D world inhabited by a variety of dynamic graphical objects. Typical objects include some scenery props and one or more animated characters. The project focused extensively on the expressiveness of the characters and the real-time requirements of face-to-face interaction. A plug-in model for motor skills allows developers to implement a wide variety of approaches to animating articulated characters within the same framework. Pantomime has been distributed to several industry and university research groups.
A successor of Gandalf, REA was an autonomous agent capable of having a real-time face-to-face conversation with a human. The agent was full-size and communicated using both verbal and non-verbal modalities. Speech recognition and computer vision provided multimodal input. The agent played the role of a real estate salesperson, showing users around virtual properties, attempting to sell them a house. REA’s responses were fully synthesized -- including speech and accompanying hand gestures -- based on intent, grammar, lexicon and communicative context.
Since a lot of the nonverbal communicative behavior we exhibit during a conversation is spontaneously and even involuntarily produced, such behavior is lost when a user has to explicitly animate their avatar in graphical chat through the use of buttons and menus. BodyChat was the first graphical chat system that allowed users to communicate via text while their avatars automatically animated attention, salutations, turn taking, back-channel feedback and facial expression, as well as simple body functions such as the blinking of the eyes.
This was a part of the Literary Salon project, where the goal was to explore how technology could encourage people, sharing a physical place, to meet each other through verbal play or other conversational activities. ShadowTalk provided spontaneous performance between tables in a cafe by allowing guests to send a "shadow" across the walls to other tables to initiate a conversation with other people's shadows. Messages typed at a table were delivered by an animated shadow in a natural fashion, using rules of social interaction to generate appropriate non-verbal cues.
Gandalf, one of the very first embodied conversational agents (ECAs) brought to life, was the creation of Dr. Kris Thorisson. Gandalf was an expert on the solar system and, by interacting with people through speech and gesture, could answer their questions face-to-face and even take them on a visual tour by commanding a wall-sized graphical projection. Gandalf was based on a computational model of psychosocial dialogue expertise that bridged perceptual analysis of multimodal events and multimodal action generation. As the first ECA project I worked on, I got the honor of animating Gandalf's hand.