Independent Study Courses

 
 
taught by Kristinn R. Thórisson @ RU
 
 
(inquire with instructor before registering)
 
 
TAUGHT
COURSE NAME
 

 

Fall 2006
T-615/715-INDS DISTRIBUTED ARCHITECTURES (B.Sc. / M.Sc.)
T-615/715-INDS GARAGE A.I.: ROBOT TECHNOLOGIES (B.Sc. / M.Sc.)
T-615/715-INDS A.I. EXPERIMENTATION PLATFORM (B.Sc. / M.Sc.)
T-615/715-INDS COMPUTER PERCEPTION: FLEXIBLE DIALOG SYSTEMS (B.Sc. / M.Sc.)
 
     
 
Spring 2006
T-615-INDS MINDMONITOR FOR VIRTUAL ROBOT  
     
 
Fall 2005
T-715-INDS COMPUTER PERCEPTION: SPEECH RECOGNITION (M.Sc.)
T-715-INDS NATURAL REALTIME SPEECH SYNTHESIS (M.Sc.)
 
     
 
Spring 2005
T-615-INDS COMPUTER PERCEPTION: PROSODY
T-615-INDS WI-FI ROBOT PLATFORM
 
     
     
COURSE DESCRIPTIONS

T-615-INDS WI-FI ROBOT PLATFORM

As the price of computers and electronics keeps falling, futuristic applications of technology are increasingly becoming part of everyday life. One such technology is robotic helpers that understand spoken dialog and assist people in their daily tasks. However, the success of robots in the workplace and at home depends on the successful development of artificial intelligence (A.I.) software. Unfortunately, people who understand A.I. and want to do robotics have to spend a lot of their time on hardware, or have to buy expensive robotics platforms. This hampers the development of A.I. software, limits the number of people who can work on A.I. and robotics, and slows down the progress of the whole field.

To remedy this we are preparing a project called "Garage A.I.", which aims to train future generations of artificial intelligence developers. An important part of this project is the development of an inexpensive yet versatile robot that can connect to a network via Wi-Fi, transfer audio and video data, and respond to human communication. The robot will run Linux and will be made mostly from spare parts (hence the term "Garage A.I."). We are looking for talented and motivated students to develop this robot platform and help us prepare its use in a classroom setting. The result will be a fun, interactive robot that can help young adults get into artificial intelligence research in a cost-effective and quick way. The project counts as a standard 12-week, three-unit course and has a comparable workload.
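To give a flavor of the kind of software plumbing this platform needs, here is a minimal sketch of a robot-side process pushing timestamped sensor frames to a workstation over the network. The UDP transport, address, port and frame format below are illustrative assumptions, not the platform's actual design.

```python
# Minimal sketch (assumed design, not the course's actual robot software):
# send timestamped "sensor frames" from the robot to a workstation over UDP,
# using only the Python standard library.
import socket
import struct
import time

WORKSTATION = ("192.168.1.10", 9000)   # hypothetical address of the A.I. host

def send_frame(sock, frame_id, payload):
    """Prefix each frame with an id and a timestamp so lost or late packets show up."""
    header = struct.pack("!Id", frame_id, time.time())   # 4-byte id + 8-byte timestamp
    sock.sendto(header + payload, WORKSTATION)

if __name__ == "__main__":
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for i in range(3):
        # A real robot would put compressed audio or video bytes here.
        send_frame(sock, i, b"\x00" * 320)
        time.sleep(0.02)                                  # roughly 50 frames per second
    sock.close()
```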

Hours are flexible. Grade is based on quality of work. There will be no exams.

Advisor is Dr. Kristinn R. Thórisson

Prerequisites: C/C++, Java, Linux, hardware drivers, DA/AD interfaces. Students must have prior experience with electronics, motors and power systems, or feel confident that they can solve such problems in the course of one semester.

Because of limited seats, admissions to the course are subject to advisor approval.

T-615-INDS COMPUTER PERCEPTION: PROSODY

Speech recognition technology has come a long way in the roughly thirty years since its beginning. Automatic recognition now works for thousands of words and is speaker-independent. However, several features of communication are ignored by most recognizers. One of these is prosody -- *how* we say something (as opposed to *what* we say). This course aims to develop a prosody analyzer that can track some features of the *how* in real time, such as intonation (pitch) and tempo (rhythmic features). The result will be a software module that can augment off-the-shelf speech recognizers, such as Sphinx-4 from CMU, by providing more clues to a speaker's vocal behavior, laying the foundation for more robust and meaningful analysis of speech and communicative behavior. The module will be hooked up to the "ears" of a virtual robot that is planned to be developed this year in the newly established A.I. Lab at R.U.
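As a concrete illustration of the intonation-tracking part, the following is a minimal sketch of one standard way to estimate pitch (F0) from a short audio frame with the autocorrelation method. The sampling rate, search range and voicing threshold are illustrative assumptions, not parameters chosen for the course.

```python
# Minimal sketch (not the course deliverable): estimating pitch (F0) for one
# short audio frame with the autocorrelation method, using only NumPy.
import numpy as np

def estimate_f0(frame, sample_rate=16000, f0_min=50.0, f0_max=500.0):
    """Return an F0 estimate in Hz for a mono audio frame, or None if unvoiced."""
    frame = frame - np.mean(frame)                  # remove DC offset
    corr = np.correlate(frame, frame, mode="full")  # autocorrelation
    corr = corr[len(corr) // 2:]                    # keep non-negative lags only
    lag_min = int(sample_rate / f0_max)             # shortest period considered
    lag_max = min(int(sample_rate / f0_min), len(corr) - 1)
    segment = corr[lag_min:lag_max]
    if segment.size == 0 or corr[0] <= 0:
        return None
    peak_lag = lag_min + int(np.argmax(segment))
    if corr[peak_lag] / corr[0] < 0.3:              # weak periodicity -> treat as unvoiced
        return None
    return sample_rate / peak_lag

# Example: a synthetic 200 Hz tone should come out near 200 Hz.
t = np.arange(0, 0.03, 1.0 / 16000)
print(estimate_f0(np.sin(2 * np.pi * 200 * t)))
```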
The project counts as a standard 12-week, three-unit course and has a comparable workload.
Hours are flexible. Grade is based on quality of work. There will be no exams.
Advisor is Dr. Kristinn R. Thórisson
Prerequisites: C/C++, Java, Linux, digital audio/signal processing
Experience with any of these is a plus: Linux audio development, digital and/or analog filters, speech recognition, audio compression, time-series neural nets, Csound, cross-platform audio development (Linux, Mac OS X, Windows)

T-615/715-INDS DISTRIBUTED ARCHITECTURES (B.Sc. / M.Sc.)

This independent project gives students first-hand experience with two alternative approaches to networked execution of multiple processes. The approaches are CORBA, a well-established protocol developed by a community of researchers in the U.S. and Europe, and OpenAIR, a more recent solution to the problem, developed by members of the Computer Science Department at Reykjavik University in collaboration with Edinburgh University and NYU. A comparison of the two approaches will highlight some of the theoretical and practical issues of cluster computing and give students a strong background for further study in the area.

This project counts as a standard 12-week, three-unit course and has a comparable workload.

Hours are flexible. Grade is based on quality of work. There will be no exams.

Prerequisites: C/C++, Java, Linux

Experience with any of these is a plus: Network protocols, distributed computing, CORBA, Open Agent Architecture, Java RMI

Advisors are Dr. Kristinn R. Thórisson and Dr. Björn Thór Jónsson

FURTHER INFORMATION
With the price of computing power falling radically over the last three decades, software development has become the primary factor determining the rate of progress in computer science and technology. By automating low-level functionality, such as memory management and graphics updates, modern programming languages make the development of systems at a higher level more feasible. There is, however, an untapped opportunity in cluster computing, where multiple machines are handed decomposed tasks whose parts are to be executed in parallel.

In the quest to enable easier use of multiple computers, various methods and mechanisms have been invented over the last decade. Typically, the systems' different goals have led development down different solution paths. The two main approaches that have been offered are the object-oriented approach and the message-oriented approach.

Among the former, Java RMI serves the purpose of letting isolated program images call methods within each other. CORBA [1] is a relatively general technology that allows transparent communication between programs running on multiple computers and written in different languages. CORBA takes the object-oriented approach: an object makes a request for a service or for information, and this request is brokered by a central server, simulating an extended function call. This general mechanism works well for systems that can assume a larger temporal granularity than the network can provide. In real-time systems, however, this assumption is both simplistic and insufficient. An extension to CORBA, Real-time CORBA, is meant to address this shortcoming in the original design. However, because CORBA and other object-oriented approaches (e.g. DCOM [2]) try to make the whole system behave like one big computer program, the system becomes cumbersome to deploy and debug in many cases, and even impossible to deploy in others, especially where real-time performance is paramount.
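To make the two styles concrete, the sketch below shows the remote-procedure-call idea in miniature, using Python's standard xmlrpc module purely as a stand-in for a real ORB; it is not CORBA, only an illustration of the brokered "extended function call".

```python
# Illustration of the remote-procedure-call style (a stand-in for CORBA/RMI,
# using only Python's standard library; not an ORB).
import threading
import time
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

def add(a, b):
    return a + b

def serve():
    server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False)
    server.register_function(add, "add")   # expose add() as a remote "method"
    server.serve_forever()

threading.Thread(target=serve, daemon=True).start()
time.sleep(0.5)                            # give the server time to start listening

# The client calls add() as if it were local; the network hop is hidden,
# which is exactly the transparency (and the latency risk) discussed above.
proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
print(proxy.add(2, 3))                     # -> 5
```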

The alternative to the object-oriented approach is message-based routing. Narada [3], for example, is a system that has solved numerous problems in message-based routing, including communication through firewalls. However, Narada has only been implemented in Java, and a practical problem with Java is that most real-time applications eventually require native C/C++ code. So instead of being pure Java, the system loses much of its platform independence while possibly running more slowly than a clean native implementation in C/C++ would. A system as big as Narada is also in many ways unwieldy; in its goal of solving a huge set of design problems, its footprint becomes prohibitively large for a number of uses. For example, Narada depends not only on Xerces [4] and Xalan [5], but also on about 10 other external libraries. This makes it difficult to port the system to other programming languages and to deploy it on platforms with restricted memory. A related problem, which it shares with CORBA, is that it is not simple to set up or use.

A relative newcomer to the group of message-passing solutions is OpenAIR [6]. OpenAIR is a routing and communication protocol based on a publish/subscribe architecture. It is intended to be the "glue" that allows numerous A.I. researchers to share code more effectively – “AIR to share”. Serving essentially as the “post office and mail delivery system” for distributed, multi-module systems, OpenAIR provides the foundation upon which subsequent markup languages and semantics can be based, e.g. gesture recognition and generation, computer vision, hardware-software interfacing, etc.
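The message-passing style can be sketched just as briefly. The tiny blackboard below illustrates publish/subscribe routing in general; the class, method and message names are invented for this example and are not the OpenAIR specification.

```python
# Minimal publish/subscribe sketch in plain Python (an illustration of the
# message-passing style only; the names below are hypothetical and are not
# the OpenAIR or Psyclone API).
from collections import defaultdict

class Blackboard:
    """Routes messages posted under a named type to every subscribed callback."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, message_type, callback):
        self.subscribers[message_type].append(callback)

    def publish(self, message_type, payload):
        for callback in self.subscribers[message_type]:
            callback(payload)

board = Blackboard()
# A prosody module and a dialog module both listen for recognized speech;
# the recognizer never needs to know who its consumers are.
board.subscribe("speech.recognized", lambda text: print("prosody sees:", text))
board.subscribe("speech.recognized", lambda text: print("dialog sees:", text))
board.publish("speech.recognized", "hello there")
```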

This project will focus on comparing these two main approaches, remote procedure calls and messaging, by analyzing two concrete solutions: CORBA and OpenAIR. The students will evaluate the approaches along many dimensions, including ease of use, breadth of support and speed of execution, and compare the design limitations that each system's approach imposes. As part of the project, simple systems will be built with each approach and, where needed, adapters will be written, e.g. in Python, C# or another appropriate language for which no adapters exist yet.

The primary deliverables of the project are 1) a research report comparing the software architectures along the lines outlined above and 2) a software implementation for OpenAIR.

References

[1] http://www.omg.org/technology/documents/formal/corba_iiop.htm
[2] Microsoft Corporation, "Distributed Component Object Model Protocol - DCOM/1.0," January 1998. http://www.microsoft.com/com/wpaper/default.asp#DCOMpapers
[3] http://www.naradabrokering.org/
[4] http://xml.apache.org/xerces2-j/
[5] http://xml.apache.org/xalan-j/
[6] http://www.mindmakers.org/

T-715-INDS Natural Realtime Speech Synthesis
T-715-INDS Náttúrulegt tölvutal í rauntíma

Teacher: Kristinn R. Thórisson
Units: 3
Description
This project involves the study and development of a real-time speech synthesis system for use in dynamic dialogue systems. The goal is to create a speech synthesizer that sounds as natural as possible while being able to synthesize speech that is generated incrementally. The student will test and evaluate 3-5 different speech synthesizers and select one for further development. The synthesizer will be outfitted (and expanded, if necessary) with controls that make it possible to have it produce partial sentences with correct intonation, where the intonation is specified for each partial sentence by a separate intonation control unit. The quality of the synthesis will be evaluated by taking 20 test sentences, splitting each into two or three parts at various places, and feeding the parts in sequence to the speech synthesizer. This is intended to simulate the real-time, incremental generation of sentences that is observed in human speech. The smoothness of this piecewise synthesis, and the naturalness of the intonation, will determine the perceived quality of the synthesis. Additionally, experiments will be made with putting breathing sounds in the right places during sentence generation, to make the output sound more natural. The student may choose to collaborate with faculty and students at Carnegie Mellon University (U.S.) or researchers at British Telecom (U.K.), as much as the project calls for. The final result of this project will be a speech synthesizer that can take partial sentences with intonation specifications and synthesize them while splicing them together in the right places, as seamlessly as possible.
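The evaluation procedure described above can be illustrated with a small sketch: each test sentence is split at chosen break points and the fragments are fed to the synthesizer one at a time. The Synthesizer class and its speak() call are hypothetical placeholders, not an interface selected for the course.

```python
# Sketch of the evaluation idea: split a test sentence at chosen break points
# and feed the fragments to the synthesizer in sequence, simulating incremental
# sentence generation. `Synthesizer` is a hypothetical placeholder.
class Synthesizer:
    def speak(self, fragment, intonation):
        # A real synthesizer would render audio here; we just log the call.
        print(f"speak({fragment!r}, intonation={intonation})")

def run_trial(synth, sentence, break_points, intonation="continuing"):
    """Feed a sentence in fragments, marking only the last fragment as 'final'."""
    words = sentence.split()
    starts = [0] + break_points + [len(words)]
    for i in range(len(starts) - 1):
        fragment = " ".join(words[starts[i]:starts[i + 1]])
        last = (i == len(starts) - 2)
        synth.speak(fragment, intonation="final" if last else intonation)

run_trial(Synthesizer(), "the robot will meet you in the lobby at noon", [4, 7])
```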

Evaluation
The teacher will give the student a grade based on her accomplishments in the course, as measured primarily by how well she manages to stick to the work plan and meet the defined milestones in this plan, which will be created in a collaboration between the teacher and the student at the beginning of the term. (All major deviations from this original plan must be supported by valid argumentation.) The final grade will be supplemented by a one-paragraph explanation written by the teacher.

T-715-INDS COMPUTER PERCEPTION: SPEECH RECOGNITION
T-715-INDS Sjálfvirk talskynjun

Teacher: Kristinn R. Thórisson
Units: 3
Description
The project involves the study and development of speech recognition for a multimodal context. More specifically, the project is composed of three main phases. First, the selection of two or more speech recognizers for recognizing both full sentences and keywords, the identification of their features, and a plan for how to use and/or expand them. Second, the integration of the selected recognizers in a uniform framework, using the Psyclone software, and writing control mechanisms for achieving initial flexibility in their behavior at runtime. Third, the development of control strategies for making the recognition work in an interactive dialog system with speech synthesis and prosody analysis. The project also includes writing a tool for visualizing the activity of the system in real time, as well as a literature review of recent relevant research.
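As an illustration of the kind of runtime control strategy meant in the second and third phases, the sketch below prefers the full-sentence recognizer when it is confident and otherwise falls back to the keyword spotter. The result format and the confidence threshold are invented for this example; no actual recognizer or Psyclone API is assumed.

```python
# Toy sketch of one possible runtime control strategy: prefer the full-sentence
# recognizer when it is confident, otherwise fall back to keyword spotting.
# Recognizer outputs and the threshold are invented for illustration.
def choose_hypothesis(sentence_result, keyword_result, sentence_threshold=0.6):
    """Each result is a (text, confidence) pair produced by its recognizer wrapper."""
    text, confidence = sentence_result
    if confidence >= sentence_threshold:
        return ("sentence", text)
    return ("keyword", keyword_result[0])

print(choose_hypothesis(("turn on the light in the kitchen", 0.82), ("light", 0.9)))
print(choose_hypothesis(("turnip the lie in the chicken", 0.31), ("light", 0.9)))
```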

Evaluation
The students will be given a grade based on their accomplishments in the course, as measured primarily by how well they manage to stick to the statement of intended work, which will be created in a collaboration between the teacher and the students. All major deviations from this original plan must be supported by valid argumentation. The final grade will be supplemented by a one-paragraph explanation.

Readings

Bryson, J. 2000b. Making modularity work: Combining memory systems and intelligent processes in a dialog agent. In Sloman, A., ed., AISB'00 Symposium on Designing a Functioning Mind, 21-30.
Cahn, J. & Brennan, S. 1999. A psychological model of grounding and repair in dialog. Proceedings of the 1999 AAAI Fall Symposium: Psychological Models of Communication in Collaborative Systems, Cape Cod, MA, 25-33.
Paek, T. & Horvitz, E. 1999. Uncertainty, utility, and misunderstanding. AAAI Fall Symposium on Psychological Models of Communication, North Falmouth, MA, November 5-7, 85-92.
Litman, D. J. 1996. Cue phrase classification using machine learning. Journal of Artificial Intelligence Research, 5:53-94.
De Carolis, B., Pelachaud, C. & Poggi, I. 2000. Verbal and nonverbal discourse planning. Proceedings of the International Agents 2000 Workshop on Achieving Human-Like Behavior in Interactive Animated Agents, Barcelona.

T-615/715-INDS COMPUTER PERCEPTION: FLEXIBLE DIALOG SYSTEMS (B.Sc. / M.Sc.)

The course involves the study of speech recognition and prosody analysis technology to enable computers to engage in more natural and flexible dialog than is currently possible. The task assigned in the course is to integrate state-of-the-art speech recognition with state-of-the-art prosody analysis, to achieve real-time dialog with the computer. The course will give the student experience with the issues involved in real-time dialog and hands-on experience with current state-of-the-art speech systems.
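One small example of the kind of integration the course aims at: pairing a recognizer's word hypothesis with a prosody feature, here a crude final-pitch-rise detector, to guess whether an utterance was meant as a question. The inputs and the 20 Hz threshold are illustrative assumptions only.

```python
# Hedged sketch of fusing a word hypothesis with a prosody feature (final
# pitch slope) to guess the dialog act. Inputs and threshold are invented.
def classify_utterance(words, f0_track):
    """words: recognized word list; f0_track: F0 samples (Hz) over the utterance."""
    tail = [f0 for f0 in f0_track[-10:] if f0 is not None]   # last voiced samples
    rising = len(tail) >= 2 and (tail[-1] - tail[0]) > 20.0  # crude rise detector
    return {"text": " ".join(words), "act": "question" if rising else "statement"}

print(classify_utterance(["you", "want", "coffee"], [180, 182, 185, 190, 205, 220]))
print(classify_utterance(["bring", "me", "coffee"], [210, 200, 190, 182, 175, 170]))
```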

Prerequisites: C/C++, Java, Linux.
It is preferable that the student have some programming experience with speech recognition systems, e.g. Sphinx.

Evaluation
The student(s) will be given a grade based on accomplishments in the course, as measured primarily by how well they manage to stick to the statement of intended work, which will be created in a collaboration between the teacher and the students. All major deviations from this original plan must be supported by valid argumentation. The final grade will be supplemented by a one-paragraph explanation.

Hours are flexible. Grade is based on quality of work. There will be no exams.

T-615/715-INDS A.I. EXPERIMENTATION PLATFORM (B.Sc. / M.Sc.)

While A.I. technology will become increasingly important in the coming decade, people who are trained in the technology are scarce. This presents both an opportunity and a challenge. The newly established A.I. lab at R.U., the Center for Analysis & Design of Intelligent Agents (CADIA), is addressing both by using technologies developed in the Garage A.I. movement to enable novices and experts alike to do more A.I. in less time than was previously possible.

The independent study will focus on a project; the project will be founded on a set of existing technologies, such as speech recognition, speech synthesis, facial animation, virtual environments, knowledge bases or middleware. The student(s) will combine existing software as well as develop new software.

The result will be one more "brick" in a growing set of software packages that can be hooked together "like LEGO bricks", along with instructions that take students through a self-directed tour of artificial intelligence concepts, tutorials and experiments.
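A very small sketch of the "LEGO brick" idea: every package exposes the same minimal interface so bricks can be chained without knowing each other's internals. The Brick class and the toy bricks below are hypothetical, not part of the actual software set.

```python
# Hypothetical sketch of composable "bricks" sharing one tiny interface.
class Brick:
    def process(self, data):
        raise NotImplementedError

class Uppercase(Brick):
    def process(self, data):
        return data.upper()

class AddGreeting(Brick):
    def process(self, data):
        return "HELLO, " + data

def run_pipeline(bricks, data):
    for brick in bricks:            # each brick's output feeds the next one
        data = brick.process(data)
    return data

print(run_pipeline([Uppercase(), AddGreeting()], "world"))   # HELLO, WORLD
```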

In the course of study the students will work closely with the advisors to specify and develop the software. It is an opportunity for students to get a working experience with a range of A.I. technologies and an overview of some of the most advanced concepts in the field.

Advisors are Dr. Kristinn R. Thórisson and Yngvi Björnsson

This is a three-unit Independent Study course.

Prerequisites: C/C++, Java, Linux

Experience with these is a plus: Computer graphics, speech recognition/synthesis, knowledge representation, virtual environments, facial animation, and Introduction to A.I. or equivalent.

Because of limited seats, admissions to the course are subject to advisor approval.

Hours are flexible. Grade is based on quality of work. There will be no exams.

Evaluation
The student(s) will be given a grade based on accomplishments in the course, as measured primarily by how well they manage to stick to the statement of intended work, which will be created in a collaboration between the teacher and the students. All major deviations from this original plan must be supported by valid argumentation. The final grade will be supplemented by a one-paragraph explanation.

T-615-INDS MINDMONITOR FOR VIRTUAL ROBOT

Intelligent agents that interact with people are a very recent research area, and the development of tools for building such systems is still in its early stages. This mind monitor is likely to be among the first such tools ever produced.

Project goals
Superhumanoid 1 (S1) is a project of more than 20 million ISK under development at the A.I. lab at R.U. (CADIA). Among other things, it involves giving a virtual agent the ability to tell stories and talk with people, both in Icelandic and in English. The system behind S1 consists of a number of "intelligent" modules that process particular aspects of speech and make decisions about the agent's behavior. The project consists of building a visualization system that will enable the lab's researchers to develop the system much faster. The system will also make it easier to present the development project to outside audiences, since it will allow us to "peek into the mind" of the agent. The mind monitor will also be of use in other research projects at the lab, e.g. in Skundari (a child-friendly robotic animal that interacts with people).

Role of the student(s)
The student will build a visualization system for S1. The product of the project is a "mind monitor" that will serve the lab's researchers and will also make it easier to introduce artificial intelligence to prospective computer science students and the interested public.

Evaluation
The student(s) will be given a grade based on accomplishments in the course, as measured primarily by how well they manage to stick to the statement of intended work, which will be created in a collaboration between the teacher and the students. All major deviations from this original plan must be supported by valid argumentation. The final grade will be supplemented by a one-paragraph explanation.

T-615/715-INDS GARAGE A.I.: ROBOT TECHNOLOGIES (B.Sc. / M.Sc.)

The Garage A.I. movement aims to train future generations of artificial intelligence developers. Work in Garage A.I. has resulted in various technologies that novice and expert users alike can use to build A.I. systems faster than was previously possible. This independent study involves the development of technologies relevant to a mobile Wi-Fi robot whose brain runs on up to 24 networked computers. The robot is equipped with a depth camera, a color camera and a directional microphone. For output it has a speaker for sound, movement in two dimensions and a movable head.

In the course of study the students will work closely with the advisors to specify and develop the software. It is an opportunity for students to get a working experience with a range of A.I. technologies and an overview of some of the most advanced concepts in the field.

The project counts as a standard 12-week, three-unit course and has a comparable workload.

Hours are flexible. Grade is based on quality of work. There will be no exams.

Advisor is Dr. Kristinn R. Thórisson
Prerequisites depend on the particular project description, which can be tailored to the individual(s). Groups of 2 or 3 students are highly recommended.

Because of limited seats, admissions to the course are subject to advisor approval.