Tuesday, September 20, 2016
8:30-9:00 Continental Breakfast
9:00-17:30 Workshops and Tutorials
Wednesday, September 21, 2016
8:30-9:00 Continental Breakfast
9:00-9:30 Introductory Remarks
Dr. James Blascovich, UC Santa Barbara
10:30-11:00 Coffee Break
Jan Kolkmeier, Jered Vroon, and Dirk Heylen
The Equilibrium Theory put forward by Argyle and Dean posits that in human-human interactions, gaze and proxemic behaviors work together in establishing and maintaining a particular level of intimacy. This theory has been evaluated and used in Virtual Reality settings where people interact with Virtual Humans. In this study we further disentangle the single and joint effects of proxemic and gaze behavior in this setting, and examine how these behaviors affect the perceived personality of the agents. We simulate a social encounter with Virtual Humans in immersive Virtual Reality. Gaze and proxemic behaviors of the agents are manipulated dynamically while the participants' gaze and proxemic responses are being measured. As could be expected, participants showed the strongest gaze and proxemic responses when agents manipulated both at the same time. However, agents that only manipulated gaze elicited weaker responses compared to agents that only manipulated proxemics. Agents that exhibited more directed gaze and reduced interpersonal distance were attributed higher scores on intimacy-related items than agents that exhibited averted gaze and increased interpersonal distance.
Carolin Strassmann, Astrid Rosenthal-von der Pütten, Ramin Yaghoubzadeh, Raffael Kaminski, and Nicole Krämer
In order to design a successful human-agent interaction, knowledge about the effects of a virtual agent's behavior is important. Therefore, the presented study aims to investigate the effect of different nonverbal behaviors on person perception of the agent, with a focus on dominance and cooperativity. An online study with 190 participants was conducted to evaluate the effect of different nonverbal behaviors. 23 nonverbal behaviors of four different experimental conditions (dominant, submissive, cooperative and non-cooperative behavior) were compared. Results emphasize that nonverbal behavior does indeed strongly affect users' person perception. Data analyses reveal that symbolic gestures such as crossing the arms, placing the hands on the hips, or touching one's neck most effectively influence dominance perception. Regarding perceived cooperativity, expressivity has the most pronounced effect.
12:00-15:30 Box Lunch and Optional Tour of Sony Pictures Studios
15:30-16:00 Coffee Break
Elena Corina Grigore, Andre Pereira, Ian Zhou, David Wang, and Brian Scassellati
The ability of social agents, be it virtually-embodied avatars or physically-embodied robots, to display social behavior and interact with their users in a natural way represents an important factor in how effective such agents are during interactions. In particular, endowing the agent with effective communicative abilities, well-suited for the target application or task, can make a significant difference in how users perceive the agent, especially when the agent needs to interact in complex social environments. In this work, we consider how two core input communication modalities present in human-robot interaction — speech recognition and touch-based selection — shape users' perceptions of the agent. We design a short interaction in order to gauge adolescents' reaction to the input communication modality employed by a robot intended as a long-term companion for motivating them to engage in daily physical activity. A study with n = 52 participants shows that adolescents perceive the robot as more of a friend and more socially present in the speech recognition condition than in the touch-based selection one. Our results highlight the advantages of using speech recognition as an input communication modality even when this represents the less robust choice, and the importance of investigating how to best do so.
Natasha Jaques, Daniel McDuff, Yoo Lim Kim, and Rosalind Picard
This paper investigates how an intelligent agent could be designed to both predict whether it is bonding with its user, and convey appropriate facial expression and body language responses to foster bonding. Video and Kinect recordings are collected from a series of naturalistic conversations, and a reliable measure of bonding is adapted and verified. We then train a deep neural network classifier using one-minute segments of facial expression and body language data, and show that it is able to accurately predict bonding in novel conversations. A qualitative and quantitative analysis is conducted to determine the non-verbal cues that characterize both high and low bonding conversations.
Philipp Kulms and Stefan Kopp
Success in human-agent interactions, especially when they entail long-lasting interactions, depends largely on the ability of the system to cooperate with humans over repeated tasks. However, it is not yet clear how interacting with autonomous agents is interlinked with the attribution of social qualities like trust. Since cooperation with social agents does not necessarily require trust if coordination is stable, it is all the more important to predict when it should build on trust or trustworthiness. To help explain this, we report findings from a human-agent experiment designed to measure trust in task-related suggestions by non-embodied versus embodied human-like agents. Our results show that trust in a human-like agent's suggestions was higher only at the beginning of the interaction. Over time, the most consistent source of trust was the quality of suggestions. Embodiment also led to more requested suggestions. This effect lasted longer than trust in the suggestions.
Andry Chowanda, Martin Flintham, Peter Blanchfield, and Michel Valstar
This paper presents the findings of an empirical study that explores player game experience by implementing the ERiSA Framework in games. A study with an action role-playing game (RPG) was designed to evaluate player interactions with game companions, who were imbued with social and emotional skills by the ERiSA Framework. Players had to complete a quest in the Skyrim game, in which they had to use social and emotional skills to obtain a sword. The results clearly show that game companions who are capable of perceiving and exhibiting emotions are perceived to have personality and can forge relationships with the players, enhancing the player experience during the game.
Lazlo Ring, Dina Utami, Stefan Olafsson, and Timothy Bickmore
We describe a series of algorithms which automatically control camera position in a virtual environment while a user is engaged in a simulated face-to-face dialog with a single virtual agent. The common objective of the algorithms is to increase user engagement with the interaction. In our work, we describe three different automated camera control systems that: (1) control the camera's position based on topic changes in dialogue; (2) use sentiment analysis to control the camera-to-agent distance; and (3) adjust the camera's depth-of-field based on “important” parts within the dialogue. Evaluation studies of each method are described. We find that changing camera position based on topic shifts results in significant increases in a self-reported measure of engagement, while the other methods seem to actually decrease user engagement. Interpretations and ramifications of the results are discussed.
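As a rough illustration of the second system described above (sentiment-driven camera-to-agent distance), the following is a minimal sketch only. The abstract does not give the actual mapping, so the function name `camera_distance`, the sentiment range of [-1, 1], the `near` and `far` distances, and the choice that positive sentiment moves the camera closer are all assumptions made for this example.

```python
def camera_distance(sentiment, near=1.0, far=3.0):
    """Map a sentiment score to a camera-to-agent distance.

    Hypothetical mapping (not taken from the paper): the score is
    clamped to [-1, 1] and interpolated linearly, so the most positive
    utterances place the camera at `near` and the most negative at `far`.
    """
    s = max(-1.0, min(1.0, sentiment))   # clamp out-of-range scores
    t = (s + 1.0) / 2.0                  # 0.0 = most negative, 1.0 = most positive
    return far - t * (far - near)


if __name__ == "__main__":
    for score in (-1.0, 0.0, 1.0):
        print(score, camera_distance(score))
```

A linear mapping is simply the easiest choice to reason about; a real system would likely also smooth the distance over time to avoid abrupt camera jumps between utterances.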
17:30-19:00 Welcome Reception
Poster & Demo (Early Session)
19:00-20:30 Poster & Demo (Late Session)
Thursday, September 22, 2016
8:30-9:00 Continental Breakfast
Frank Kaptein, Joost Broekens, Koen Hindriks, and Mark Neerincx
Cognitive agent programming frameworks facilitate the development of intelligent virtual agents. By adding a computational model of emotion to such a framework, one can program agents capable of using and reasoning over emotions. Computational models of emotion are generally based on cognitive appraisal theory; however, these theories introduce a large set of appraisal processes, which are not specified in enough detail for unambiguous implementation in cognitive agent programming frameworks. We present CAAF (Cognitive Affective Agent programming Framework), a framework based on the belief-desire theory of emotions (BDTE), that enables the computation of emotions for cognitive agents (i.e., making them cognitive affective agents). In this paper we bridge the remaining gap between BDTE and cognitive agent programming frameworks. We conclude that CAAF models consistent, domain-independent emotions for cognitive agent programming.
Jill Fain Lehman, Nikolas Wolfe, and Andre Pereira
Existing speech technology tends to be poorly suited for young children at play, both because of their age-specific pronunciation and because they tend to play together, making overlapping speech and side discussions about the play itself ubiquitous. We report the performance of an autonomous, multi-keyword spotter that has been trained and tested on data from a multiplayer game designed to focus on these issues. In Mole Madness, children laugh, yell, speak at the same time, make side comments and even invent their own forms of the basic keywords in order to control a virtual on-screen character. Within this challenging language environment, the system is able to achieve 94% overall recall and 85% overall accuracy, providing child-child and child-robot pairs with responsive play in a rapid-paced game. This technology can enable others to create novel multiparty interactions for entertainment where a limited number of keywords has to be recognized.
Joanna Taoum, Bilal Nakhal, Elisabetta Bevacqua, and Ronan Querrec
This paper introduces a new research work that aims to improve embodied conversational agents with tutor behavior by endowing them with the capability to generate feedback in pedagogical interactions with learners. The virtual agent feedback and the interpretation of the user's feedback are based on the knowledge of the environment (informed virtual environment), the interaction and the pedagogical strategies structured around classical intelligent tutoring system models. We present our first steps to implement our proposed architecture based on a model of informed virtual environment. We also describe the ideas that will guide the design of the Tutor Behavior. The planned evaluation method and a first application are also presented.
10:00-10:30 Coffee Break
10:30-11:00 Workshop Reports
Setareh Gilani, Kraig Sheetz, Gale M. Lucas, and David Traum
Telling stories is an important aspect of virtual agents designed to interact with people socially over time. We describe an experiment designed to investigate the impact of the identity, presentation form, and perspective of a virtual storyteller on a human user who engages in a story-swapping activity with two virtual characters. For each interaction, the user was given 10 "ice-breaker" questions to ask a virtual character and respond to the character's reciprocal request. Participants also filled out a post-interaction survey, measuring rapport with the character and impressions of the character's personality. Results generally show that participants prefer characters who tell first-person stories; however, there were some interactions with presentation order. No significant preferences were established for the form or identity variables.
Zev Battad and Mei Si
Storytelling has always been an effective and intuitive method of exchanging information. In today’s world of large, open, structured data, storytelling can benefit the ways in which people explore and consume such information. In this work, we investigate this potential. In particular, methods for creating multiple interweaving storylines are explored for tying together possibly disparate veins of exploration in such large networks of information and helping maintain audience interest. This paper presents the algorithms for automatically generating interweaving storylines, followed by examples and discussions for future work.
Timothy Bickmore, Ha Trinh, Michael Hoppmann, and Reza Asadi
The design of a conversational virtual agent that assists professors and students in giving in-class oral presentations is described, along with preliminary evaluation results. The life-sized agent is integrated with PowerPoint presentation software and can deliver scripted presentations in conjunction with one or more human presenters using appropriate verbal and nonverbal behavior. Results from evaluation studies in two very different courses—business and professional speaking, and computer science research methods—indicate that the agent is widely accepted in the classroom by students, and can serve to increase engagement in presentations given both by professors and students. An agenda of future work is also presented.
Mathieu Chollet, Nithin Chandrashekhar, Ari Shapiro, Louis-Philippe Morency, and Stefan Scherer
Virtual audiences are used for training public speaking and mitigating anxiety related to it. However, research has been scarce on studying how virtual audiences are perceived and which non-verbal behaviors should be used to make such an audience appear in particular states, such as boredom or engagement. Recently, crowdsourcing methods have been proposed for collecting data for building virtual agents' behavior models. In this paper, we use crowdsourcing for creating and evaluating a nonverbal behavior generation model for virtual audiences. We show that our model successfully expresses relevant audience states (i.e. low to high arousal, negative to positive valence), and that the overall impression exhibited by the virtual audience can be controlled by manipulating the number of individual audience members that display a congruent state.
Thomas Janssoone, Chloé Clavel, Kévin Bailly, and Gaël Richard
In the field of Embodied Conversational Agents (ECAs), one of the main challenges is to generate socially believable agents. The long-run objective of the present study is to infer rules for the multimodal generation of agents' socio-emotional behaviour. In this paper, we propose to learn the rules from the analysis of a multimodal corpus composed of audio-video recordings of human-human interactions. The proposed methodology consists in applying a Sequence Mining algorithm using automatically extracted Social Signals such as prosody and facial muscle activation as an input. This allows us to infer Temporal Association Rules for the behaviour generation. We show that this method can automatically compute Temporal Association Rules coherent with prior results found in the literature, especially in the psychology and sociology fields. The results of a perceptive evaluation confirm the ability of a Temporal Association Rules based agent to express a specific stance.
Blaise Potard, Matthew Aylett, and David Braude
Emotional expression is a key requirement for intelligent virtual agents. In order for an agent to produce dynamic spoken content, speech synthesis is required. However, despite substantial work with pre-recorded prompts, very little work has explored the combined effect of high quality emotional speech synthesis and facial expression. In this paper we offer a baseline evaluation of the naturalness and emotional range available by combining the freely available SmartBody component of the Virtual Human Toolkit (VHTK) with the CereVoice text-to-speech (TTS) system. Results echo previous work using pre-recorded prompts: the visual modality is dominant and the modalities do not interact. This allows the speech synthesis to add gradual changes to the perceived emotion both in terms of valence and activation. The naturalness reported is good, 3.54 on a 5-point MOS scale.
Kathrin Haag and Hiroshi Shimodaira
Previous work in speech-driven head motion synthesis is centered around Hidden Markov Model (HMM) based methods and data that does not show a large variability of expressiveness in both speech and motion. When using expressive data, these systems often fail to produce satisfactory results. Recent studies have shown that using deep neural networks (DNNs) results in a better synthesis of head motion, in particular when employing bidirectional long short-term memory (BLSTM). We present a novel approach which makes use of DNNs with stacked bottleneck features combined with a BLSTM architecture to model context and expressive variability. Our work is based on The University of Edinburgh Speaker Personality and MoCap Dataset which contains spontaneous and expressive dialogues. Our proposed DNN architecture outperforms conventional feed-forward DNNs and simple BLSTM networks in an objective evaluation. Results from a subjective evaluation show a significant improvement of the bottleneck architecture over feed-forward DNNs.
Cliceres Mack Dal Bianco, Adriana Braun, Soraia Musse, Claudio Jung, and Norman Badler
The processing time to simulate crowds for games or simulations is a real challenge. While the increasing power of processing capacity is a reality in the hardware industry, it also means that more agents, better rendering and more sophisticated Artificial Intelligence (AI) methods can be used, so again the computational time is an issue. Despite the processing cost, in many cases the most interesting period of time in a game or simulation is far from the beginning or in a specific known period, but it is still necessary to simulate the whole time (spending time and processing capacity) to reach the desired period of time. It would be useful to fast forward the time in order to see a specific period of time where the simulation results could be more meaningful for analysis. This paper presents a method to provide time travel in Crowd Simulation. Based on crowd features, we compute the expected variation in velocities and apply that for time travel in crowd simulation.
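The fast-forward idea above can be pictured with a small sketch. This is not the authors' method: the abstract does not say how the expected variation in velocities is computed, so the sketch simply assumes it is available as a single scaling factor (`expected_speed_factor`, a hypothetical parameter) and moves each agent along a straight line toward its goal. The agent dictionary layout and the function name `fast_forward` are likewise assumptions made for illustration.

```python
import math


def fast_forward(agents, expected_speed_factor, jump_seconds):
    """Jump a crowd ahead in time without simulating every step.

    agents: list of dicts with 'pos' and 'goal' as (x, y) tuples and
            'speed' as the desired speed in metres per second.
    expected_speed_factor: assumed ratio of expected to desired speed
            (for example, below 1.0 in a dense crowd).
    jump_seconds: length of the time jump.
    """
    jumped = []
    for agent in agents:
        (px, py), (gx, gy) = agent["pos"], agent["goal"]
        dx, dy = gx - px, gy - py
        dist = math.hypot(dx, dy)
        travel = agent["speed"] * expected_speed_factor * jump_seconds
        if dist == 0 or travel >= dist:
            new_pos = (gx, gy)  # agent reaches its goal within the jump
        else:
            new_pos = (px + dx / dist * travel, py + dy / dist * travel)
        jumped.append({**agent, "pos": new_pos})
    return jumped


if __name__ == "__main__":
    crowd = [{"pos": (0.0, 0.0), "goal": (10.0, 0.0), "speed": 1.0}]
    print(fast_forward(crowd, expected_speed_factor=0.5, jump_seconds=4.0))
```

The point of the sketch is the cost trade-off: one arithmetic update per agent replaces many simulation steps, at the price of ignoring collisions and path changes during the skipped interval.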
15:15-15:45 Coffee Break
Ran Zhao, Tanmay Sinha, Alan W. Black, and Justine Cassell
This work focuses on data-driven discovery of the temporally co-occurring and contingent behavioral patterns that signal high and low interpersonal rapport. We mined a reciprocal peer tutoring corpus reliably annotated for nonverbals like eye gaze and smiles, conversational strategies like self-disclosure and social norm violation, and for rapport (in 30-second thin slices). We then performed a fine-grained investigation of how the temporal profiles of sequences of interlocutor behaviors predict increases and decreases of rapport, and how this rapport management manifests differently in friends and strangers. We validated the discovered behavioral patterns by predicting rapport against our ground truth via a forecasting model involving two-step fusion of learned temporal associated rules. Our framework performs significantly better than a baseline linear regression method that does not encode temporal information among behavioral features. Implications for the understanding of human behavior and social agent design are discussed.
Nesrine Fourati, Adeline Richard, Sylvain Caillou, Nicolas Sabouret, Jean-Claude Martin, Emilie Chanoni, and Celine Clavel
In this paper, we present a framework for an expressive virtual storyteller for children. Our virtual storyteller displays facial expressions of appraisals related to story events. The facial expressions are animated jointly with deictic gestures towards graphical elements representing story events. We describe a preliminary study that we conducted with 23 children. We discuss the impact of facial expressions of appraisals on children's memorization of story events, their perception of characters' appraisals, their subjective perception of the virtual storyteller and more generally how emotion combines with joint attention.
Florian Pecune, Angelo Cafaro, Magalie Ochs, and Catherine Pelachaud
In this paper we evaluate a model of social decision-making for virtual agents. The model computes the social attitude of a virtual agent given its social role during the interaction and its social relation toward the interactant. The resulting attitude influences the agent's social goals and therefore determines the decisions made by the agent in terms of actions and communicative intentions to accomplish. We conducted an empirical study in the context of virtual tutor-child interaction where participants evaluated the tutor's perceived social attitude towards the child while the tutor's social role and relation were manipulated by our model. Results showed that both role and social relation have an influence on the agent's perceived social attitude.
Ameneh Shamekhi, Mary Czerwinski, Gloria Mark, Margeigh Novotny, and Gregory A. Bennett
Designing virtual personal assistants that are able to engage users in an interaction has been a challenge for HCI researchers for the past 20 years. In this work we investigated how a set of vocal characteristics known as “conversational style” (CS) could play a role in engaging users in an interaction with a virtual agent. We also examined whether the similarity attraction principle influences how people orient towards agents with different CSs. A within-subject experiment was conducted to explore the best vocal characteristics for a virtual agent based on the users' own style in order to optimize users' engagement and perceived quality of the interaction. Results of this experiment on 102 subjects revealed that users exhibited similarity attraction toward computer agents, and preferred the agent whose conversational style matched their own. The study results contribute to our understanding of how the design of intelligent agents' conversational style influences users' engagement and perceptions of the agent, compared to known human-to-human interaction.
17:00-17:30 Video Gala & Voting
18:30-19:00 Transfer to Banquet
19:00-21:30 Banquet at the Jamaica Bay Inn's "Beachside Restaurant"
Friday, September 23, 2016
8:30-9:00 Continental Breakfast
Mark Walsh, Pixar Animation Studios & Motional Entertainment
10:00-10:30 Coffee Break
Jonathan Gratch, David DeVault, and Gale M. Lucas
This article examines the potential for teaching negotiation skills with virtual humans. Many people find negotiations to be aversive. We conjecture that people may be more comfortable practicing negotiation skills with an agent than with another person. We test this conjecture using the Conflict Resolution Agent, a semi-automated virtual human that negotiates with people via natural language. In a between-participants design, we independently manipulated two pedagogically-relevant factors while participants engaged in repeated negotiations with the agent: perceived agency (participants either believed they were negotiating with a computer program or another person) and pedagogical feedback (participants received instructional advice or no advice between negotiations). Findings show that people found it less aversive to practice with a computer program and negotiated more forcefully following instructional feedback. These findings lend support to the notion of using virtual humans to teach interpersonal skills.
Sébastien Lallé, Nicholas Mudrick, Michelle Taub, Joseph Grafsgaard, Cristina Conati, and Roger Azevedo
Students' emotions are known to influence learning and motivation while working with agent-based learning environments (ABLEs). However, there is limited understanding of how Pedagogical Agents (PAs) impact different students' emotions, what those emotions are, and whether this is modulated by students' individual differences (e.g., personality, goal orientation). Such understanding could be used to devise intelligent PAs that can recognize and adapt to students' relevant individual differences in order to enhance their experience with learning environments. In this paper, we investigate the relationship between individual differences and students' affective reactions to four intelligent PAs available in MetaTutor, a hypermedia-based intelligent tutoring system. We show that achievement goals and personality traits can significantly modulate students' affective reactions to the PAs. These findings suggest that students may benefit from personalized PAs that adapt to their motivational goals and personality.
Astrid Rosenthal-von der Pütten, Carolin Strassmann, and Nicole Krämer
When designing an artificial tutor, the question arises: should we opt for a virtual or a physically embodied conversational agent? With this work we contribute to the ongoing debate of whether, when and how virtual agents or robots provide more benefits to the user. We conducted an experimental study on linguistic alignment processes in HCI in the context of second language acquisition. In our study (n=130 non-native speakers) we investigated the influence of embodiment (virtual agent vs. robot vs. speech-based interaction) and system voice (text-to-speech vs. pre-recorded speech) on participants' perception of the system, their motivation, their lexical and syntactical alignment during interaction and their learning effect after the interaction. The variation of system characteristics had no influence on the evaluation of the system or participants' alignment behavior.
Ulysses Bernardet, Mathieu Chollet, Steve DiPaola, and Stefan Scherer
In this paper, we present a reflexive behavior architecture geared towards controlling the non-verbal behavior of the virtual humans in a public speaking training system. The model is organized along the distinction between behavior triggers that are internal (endogenous) to the agent, and those that originate in the environment (exogenous). The endogenous subsystem controls gaze behavior, triggers self-adaptors, and shifts between different postures, while the exogenous system controls the reaction towards auditory stimuli with different temporal and valence characteristics. We evaluate the different components empirically by letting participants compare the output of the proposed system to valid alternative variations.
Ha Trinh, Darren Edge, Lazlo Ring, and Timothy Bickmore
Oral presentations are central to scientific communication, yet the quality of many scientific presentations is poor. To improve presentation quality, scientists need to invest greater effort in the creative design of presentation content. We present AceTalk, a presentation planning system supported by a virtual assistant. This assistant motivates and collaborates with users in a structured brainstorming process to explore engaging presentation structures and content types. Our study of AceTalk demonstrates the potential of human-agent collaboration to facilitate the design of audience-centered presentations, while highlighting the need for rich modelling of audiences, presenters and talk contexts.
Farina Freigang and Stefan Kopp
In natural communication, humans enrich their utterances with pragmatic information indicating, e.g., what is important to them or what they are not certain about. We investigate whether and how virtual humans (VH) can employ this kind of meta-communication. In an empirical study we have identified three modifying functions that humans produce and perceive in multimodal utterances, one being to create or attenuate focus. In this paper we test whether such modifying functions are also observed in speech and/or gesture of a VH, and whether this changes the perception of a VH overall. Results suggest that, although the VH's behaviour is judged rather neutral overall, focusing is distinctively recognised, leads to better recall, and affects perceived competence. These effects are strongest if focus is created jointly by speech and gesture.
Ben Fielding, Philip Kinghorn, Kamlesh Mistry, and Li Zhang
In this paper, we present an Embodied Conversational Agent (ECA) enriched with automatic image understanding, using vision data derived from state-of-the-art machine learning techniques for the advancement of autonomous interaction with the elderly or infirm. The agent is developed to conduct health and emotional well-being monitoring for the elderly. It is not only able to conduct question-answering via speech-based interaction, but also able to provide analysis of the user's surroundings, company, emotional states, hazards and fall actions via visual data using deep learning techniques. The agent is accessible from a web browser and can be communicated with via voice, with a webcam required for the visual analysis functionality. The system has been evaluated with diverse real-life images to prove its efficiency.
Everlyne Kimani, Timothy Bickmore, Ha Trinh, Lazlo Ring, Michael Paasche-Orlow, and Jared Magnani
When deployed on smartphones, virtual agents have the potential to deliver life-saving advice regarding emergency medical conditions, as well as provide a convenient channel for health education to help improve the safety and efficacy of pharmacotherapy. This paper describes the use of a smartphone-based virtual agent that provides counseling to patients with Atrial Fibrillation, along with the results from a pilot acceptance study among patients with the condition. Atrial Fibrillation is a highly prevalent heart rhythm disorder and is known to significantly increase the risk of stroke, heart failure and death. In this study, a virtual agent is deployed in conjunction with a smartphone-based heart rhythm monitor that lets patients obtain real-time diagnostic information on the status of their atrial fibrillation and determine whether immediate action may be needed. The results of the study indicate that participants are satisfied with receiving information about Atrial Fibrillation via the virtual agent.
15:00-15:15 Closing Remarks
15:15-18:00 Optional ICT Demo Tour