Gesture and speech multimodal conversational interaction in monocular video

D. McNeill¹ and F. Quek²

¹Departments of Psychology and Linguistics, University of Chicago, Chicago, IL, U.S.A.
²Department of Electrical Engineering and Computer Science, Wright State University, Dayton, OH, U.S.A.

We present our work on determining cues for discourse segmentation in the free-form gesticulation that accompanies speech in natural conversation. Our approach integrates the analysis of gesture and speech. The basis for this integration is the psycholinguistic concept that gesture and speech are generated co-equally from the same semantic intent. We use the psycholinguistic device known as the 'catchment' as the locus around which this integration proceeds. We present a detailed case study of a gesture and speech elicitation experiment in which a subject describes her living space to an interlocutor. We perform two independent sets of analyses on the video and audio data:

  1. We process the video data to obtain the motion traces of both of the subject's hands using the Vector Coherence Mapping algorithm, which applies spatial, momentum and skin color constraints in parallel using a fuzzy image processing approach. We extract the voiced units from the audio signal as F0 signal groups.
  2. We perform expert transcription of the speech and gesture data by micro-analyzing the videotape using a frame-accurate video player to correlate the speech with the gestural entities. We also perform a higher level analysis using the transcribed text alone.

We compare the results of the psycholinguistic analyses against the computed features to identify the cues accessible in the gestural and audio data that correlate well with the expert psycholinguistic analysis. Our analysis shows that the feature of 'handedness' and the kind of symmetry in two-handed gestures provide effective cues for discourse segmentation.

We also present observations on how the gesture traces provide cues to segment the gesture stream, indicate high level discourse repair, and serve as super-segmental cues for discourse grouping.
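
As an illustration only, the grouping of voiced speech into F0 units mentioned in the first analysis can be pictured as finding runs of frames with a non-zero pitch estimate. The sketch below is not the authors' implementation. The function name `voiced_units`, the `min_frames` parameter, the convention that unvoiced frames carry an F0 of zero, and the sample values are all assumptions made for this example.

```python
def voiced_units(f0, min_frames=3):
    """Return (start, end) frame index pairs for runs of voiced frames.

    Assumption: a frame is voiced when its F0 estimate is greater than
    zero. `end` is exclusive. Runs shorter than `min_frames` (a
    hypothetical smoothing parameter) are discarded.
    """
    units = []
    start = None  # index where the current voiced run began, if any
    for i, value in enumerate(f0):
        if value > 0 and start is None:
            start = i  # a voiced run begins
        elif value <= 0 and start is not None:
            if i - start >= min_frames:
                units.append((start, i))  # keep runs that are long enough
            start = None  # the run has ended
    # Close a run that extends to the end of the signal.
    if start is not None and len(f0) - start >= min_frames:
        units.append((start, len(f0)))
    return units


# Made-up F0 track in Hz, one value per frame; 0 marks an unvoiced frame.
f0_track = [0, 0, 120, 125, 130, 0, 0, 110, 0, 140, 138, 135, 132]
print(voiced_units(f0_track))  # [(2, 5), (9, 13)]
```

Each resulting frame range could then be aligned with the hand motion traces over the same frames, which is the kind of comparison the abstract describes.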


Paper presented at Measuring Behavior 2000, 3rd International Conference on Methods and Techniques in Behavioral Research, 15-18 August 2000, Nijmegen, The Netherlands

© 2000 Noldus Information Technology b.v.