Select Current Projects

Humanoid Agents and Avatars in Social Game Environments

Social game environments include, for example, online games that rely on social interaction among players, games where players have to converse with non-player characters to advance a plot, and many of the so-called "serious games" in which game environments are used to train real-world social skills. Articulated humanoid characters typically represent the participants in these social interactions, whether they are agents controlled by the game or avatars controlled by human players. These characters need to exhibit believable nonverbal social behavior, both to maintain the illusion of life and to communicate effectively. Building on our previous work on embodied conversational agents and avatars, we are looking at ways to bring social nonverbal behavior control into games through modular mechanisms that embody general social rules.

The SAIBA Multimodal Generation Framework

This is an international effort to define a unified multimodal behavior generation framework for Embodied Conversational Agents (ECAs). We are proposing a three-stage model, which we call SAIBA, whose stages represent intent planning, behavior planning and behavior realization. A Function Markup Language (FML), which describes intent without referring to physical behavior, mediates between the first two stages, and a Behavior Markup Language (BML), which describes the desired physical realization, mediates between the last two stages.
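The division of labor between the three stages can be illustrated with a small Python sketch. It is only a sketch under stated assumptions: the classes `FMLIntent` and `BMLBehavior`, the functions `plan_intent`, `plan_behavior` and `realize`, and the emphasis and gaze rules are all hypothetical, and the data structures are drastically simplified stand-ins rather than the actual FML or BML specifications. What it does show is the architectural point: each stage sees only the markup produced by the stage before it.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, heavily simplified stand-ins for FML and BML content.
# These are NOT the real FML/BML schemas; they only illustrate the role
# each markup plays between the SAIBA stages.

@dataclass
class FMLIntent:
    """Communicative intent, with no reference to physical behavior."""
    text: str
    emphasis: List[str] = field(default_factory=list)  # words to emphasize
    addressee: str = ""

@dataclass
class BMLBehavior:
    """One desired physical behavior and what it is aligned with."""
    kind: str    # e.g. "gaze", "speech", "beat_gesture"
    target: str  # the entity or word the behavior refers to

def plan_intent(utterance: str, addressee: str) -> FMLIntent:
    # Stage 1 (intent planning): decide WHAT to communicate.
    # Hypothetical rule: emphasize the last word of the utterance.
    words = utterance.split()
    return FMLIntent(text=utterance, emphasis=words[-1:], addressee=addressee)

def plan_behavior(intent: FMLIntent) -> List[BMLBehavior]:
    # Stage 2 (behavior planning): map communicative functions (FML)
    # to desired physical behaviors (BML). Only FML is visible here.
    behaviors = [BMLBehavior("gaze", intent.addressee),
                 BMLBehavior("speech", intent.text)]
    behaviors += [BMLBehavior("beat_gesture", w) for w in intent.emphasis]
    return behaviors

def realize(behaviors: List[BMLBehavior]) -> List[str]:
    # Stage 3 (behavior realization): turn BML into engine commands.
    # Here the "engine" just receives readable command strings.
    return [f"{b.kind}:{b.target}" for b in behaviors]

commands = realize(plan_behavior(plan_intent("the castle is over there", "user")))
print(commands)
# ['gaze:user', 'speech:the castle is over there', 'beat_gesture:there']
```

Because the stages share nothing but FML and BML, any one of them could in principle be replaced, for example by swapping in a different animation engine as the realizer, without touching the others.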

Updating and improving BEAT

The BEAT toolkit (see past projects below) has received considerable interest, both from animators looking for ways to speed up the process of animating dialogue and from researchers and educators who wish to experiment with co-verbal behavior generation rules for embodied conversational agents. While BEAT has been available upon request, it relied on some commercial components and was not easy to set up, which prevented its use in many cases. I am therefore looking at ways to make BEAT completely stand-alone, while also updating its infrastructure to comply with the latest developments in multimodal generation, including SAIBA above. New behavior generators are in the works, including improved intonation generation.

Select Past Projects

ISI 2003-2006: Games for Language and Culture Training
With W. Lewis Johnson and a large team

The objective of this series of projects was to develop tools to support individualized language learning, and to apply them to the acquisition of the linguistic, gestural, and cultural knowledge and skills necessary to accomplish specific tasks in a foreign environment. To maximize motivation and give learners effective practice opportunities, learners could enter simulated missions in which they interacted verbally and nonverbally with virtual characters. The toolset was built so that it could easily be applied to new languages, missions, and training contexts. The first target languages were Levantine Arabic, Iraqi Arabic and Pashto. This work is ongoing at ISI and Alelo Inc.

ISI 2004-2006: Tailored Math Tutoring
With Carole Beal, Erin Shaw and team

In this project we explored ways to support learning strategies that are not typically supported in high school math education, in the hope that this would help a wider population of kids realize their potential in math. The work centered on an AI-enhanced multimedia environment based on the Wayang Outpost system, built by Carole Beal and her team while she was at the University of Massachusetts. Our new version extended that system with features that supported and promoted online collaboration, personal reflection and mentoring through role models. This work is ongoing at ISI.

ISI 2004-2006: Believable Communicative Behavior
With Stacy Marsella, Walter Warwick and team

This project, internally called BCBM (Believable Communicative Behavior Middleware), built on my previous work on generating communicative behaviors for animated characters. The goal was to create specialized middleware, or an engine, that could be plugged directly into simulation environments such as games to bring characters to life through believable embodiment. In contrast to earlier work, I was looking at a wide range of contextual factors that might influence nonverbal conduct, not just interactional and propositional factors. These included attitudes, roles and emotion.

MIT 2002-2003: Spark (Ph.D. Thesis)

As part of my dissertation I developed a theoretical framework for automatically analyzing text messages in terms of their communicative function, and for generating supporting nonverbal behaviors in the avatars that represent conversation participants in a chat environment. Communicative functions of interest included emphasis, visual reference, turn-taking, attention and feedback. A general architecture, Spark, was built on this framework, demonstrating the approach in an actual system design. MapChat, a derived application for online collaboration, provided empirical evidence for the strength of the approach.

MIT 2000-2003: BEAT
With Justine Cassell and Tim Bickmore

The Behavior Expression Animation Toolkit (BEAT) allows animators to input typed text that they wish to be spoken by an animated human figure, and to obtain as output appropriate and synchronized nonverbal behaviors and synthesized speech in a form that can be sent to a number of different animation systems. The nonverbal behaviors are assigned on the basis of actual linguistic and contextual analysis of the typed text, relying on rules derived from research into human conversational behavior. The toolkit has been distributed to, and is being used by, several university and industry research groups around the world.

MIT 2000-2001: Situated Chat

Like its predecessor BodyChat, Situated Chat automatically animated the visual representations (avatars) of participants in online graphical chat. While BodyChat concentrated on using a social model to animate appropriate social behavior such as turn-taking, greetings and farewells, Situated Chat also built a model of the discourse context, taking into account the shared visual environment, and then used that model to automatically generate nonverbal behavior of a propositional nature, such as referring gestures. Again, the only input was the chat text itself.

MIT 2000: MACK
With Justine Cassell and team

MACK was an embodied conversational agent (ECA) who could answer questions about, and give directions to, the MIT Media Lab's various research groups, projects, and people. MACK used a combination of speech and gesture to communicate with users and could share with them a physical paper map placed in front of him. Research issues included modeling shared reference and attention, and how the different modalities could be fused in both input processing and behavior generation.

MIT 1999-2000: Sam
With Justine Cassell and team

Imagine an animated character who could participate in children's play with real toys, such as puppets and toy castles, and construct stories with them collaboratively. Bringing together work on story-listening systems for children and embodied conversational agents, this project involved a conversational character, Sam, who could act as a peer playmate to children and could create stories with them by sharing physical objects across the boundary between the real and virtual worlds, and by listening and reacting to the child's input.

MIT 1999: BodyChat II

This version of BodyChat focused on multi-party chat conversation and automated gaze and basic hand motion in avatars. The only input was the text typed by users; no avatar control was needed. For example, when delivering a message, a user's avatar first averted its gaze ("planning phase"), then gestured while speaking, and finally looked at the next speaker. The next speaker was taken to be the user from whom the speaker had taken the turn, unless another user requested the turn by starting to type, or the speaker explicitly named a new speaker.
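The turn-handover rule and the gaze sequence described above can be sketched in Python. This is a minimal illustration, not BodyChat's actual code: the function names `choose_next_speaker` and `gaze_sequence`, the user identifiers, and the action labels are hypothetical, and the priority order between an explicitly named speaker and a user who starts typing is an assumption, since the description does not say which one wins when both occur.

```python
from typing import List, Optional, Sequence, Tuple

def choose_next_speaker(
    current_speaker: str,
    previous_speaker: str,
    typing_users: Sequence[str],
    named_user: Optional[str] = None,
) -> str:
    """Pick the user the speaking avatar should look at when it finishes.

    ASSUMED priority (not stated in the original description):
    1. a user the speaker explicitly named,
    2. otherwise the first other user who started typing,
    3. otherwise the user the speaker took the turn from.
    """
    if named_user is not None and named_user != current_speaker:
        return named_user
    for user in typing_users:
        if user != current_speaker:  # the speaker's own typing is not a request
            return user
    return previous_speaker

def gaze_sequence(
    current_speaker: str,
    previous_speaker: str,
    typing_users: Sequence[str],
    named_user: Optional[str] = None,
) -> List[Tuple[str, Optional[str]]]:
    # Avert gaze while "planning", gesture while speaking, then look at
    # the next speaker. Action names are hypothetical labels.
    nxt = choose_next_speaker(current_speaker, previous_speaker,
                              typing_users, named_user)
    return [
        ("avert_gaze", None),
        ("gesture_while_speaking", None),
        ("look_at", nxt),
    ]

print(choose_next_speaker("ann", "bob", []))                         # bob
print(choose_next_speaker("ann", "bob", ["cat"]))                    # cat
print(choose_next_speaker("ann", "bob", ["cat"], named_user="dan"))  # dan
```

Keeping the rule in a single pure function makes it easy to change the assumed priority order without touching the animation sequence.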

MIT 1997-2003: Pantomime
With Kenny Chang

Pantomime is an animation engine that takes care of rendering a 3D world inhabited by a variety of dynamic graphical objects. Typical objects include some scenery props and one or more animated characters. The project focused extensively on the expressiveness of the characters and the real-time requirements of face-to-face interaction. A plug-in model for motor skills allows developers to implement a wide variety of approaches to animating articulated characters within the same framework. Pantomime has been distributed to several industry and university research groups.

MIT 1997-1999: Rea
With Justine Cassell and team

A successor to Gandalf, REA was an autonomous agent capable of holding a real-time face-to-face conversation with a human. The agent was full-size and communicated using both verbal and nonverbal modalities. Speech recognition and computer vision provided multimodal input. The agent played the role of a real estate salesperson, showing users around virtual properties and attempting to sell them a house. REA’s responses, including speech and accompanying hand gestures, were fully synthesized based on intent, grammar, lexicon and communicative context.

MIT 1997-1998: BodyChat I (M.S. Thesis)

Since much of the nonverbal communicative behavior we exhibit during a conversation is produced spontaneously and even involuntarily, such behavior is lost when users have to explicitly animate their avatars in graphical chat through buttons and menus. BodyChat was the first graphical chat system that allowed users to communicate via text while their avatars automatically animated attention, salutations, turn-taking, back-channel feedback and facial expression, as well as simple body functions such as eye blinking.

MIT 1997: ShadowTalk
With Joey Chang

This was part of the Literary Salon project, whose goal was to explore how technology could encourage people sharing a physical space to meet each other through verbal play or other conversational activities. ShadowTalk enabled spontaneous performance between tables in a cafe by allowing guests to send a "shadow" across the walls to other tables to initiate a conversation with other people's shadows. Messages typed at a table were delivered by an animated shadow in a natural fashion, using rules of social interaction to generate appropriate nonverbal cues.

MIT 1996: Gandalf (Kris Thorisson's Ph.D. Thesis)
With Kris Thorisson

Gandalf, one of the very first embodied conversational agents (ECAs) brought to life, was the creation of Dr. Kris Thorisson. Gandalf was an expert on the solar system and, interacting with people through speech and gesture, could answer their questions face-to-face and even take them on a visual tour by commanding a wall-sized graphical projection. Gandalf was based on a computational model of psychosocial dialogue expertise that bridged perceptual analysis of multimodal events and multimodal action generation. Gandalf was the first ECA project I worked on, and I had the honor of animating Gandalf's hand.