EUDICO: an annotation and exploitation tool for multimedia corpora

H. Russel, D. Brugman, P. Broeder and P. Wittenburg

Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands

The EUDICO project started in 1997 at the Max Planck Institute for Psycholinguistics as an effort to make multimedia-related resources with linguistic content available to as large a scientific audience as possible. This has resulted in a powerful set of platform-independent software tools that operate over the Internet and thereby give researchers all over the world access to important linguistic resources with multimedia content. The essential design decisions that currently make EUDICO a unique multimedia annotation and exploitation tool are: (1) distributed and Internet-capable operation; (2) resource format independent operation; (3) different fully synchronized viewers presenting the data to the user; (4) freely definable annotation structure; (5) central and local data storage; (6) central software management.

The resources that we support in the EUDICO project consist of media tracks and annotation layers that contain linguistically relevant descriptions of events that occur within the media tracks. These annotation layers are freely definable, but they all share one feature: each annotation can be precisely located on the media time axis. To support as many existing corpora as possible, an Abstract Corpus Model (ACM) was designed that has the expressive power to describe all the linguistic annotation schemes that are currently in use. The ACM is implemented as Java objects and the software tools are designed to work with this ACM. Dealing with a new corpus within the EUDICO framework simply means defining a mapping from the specific corpus format to the ACM. All existing EUDICO tools are thereafter instantly available for the new corpus. Currently three important formats are supported, i.e. Childes, Gesture and Tipster.
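The idea of mapping a corpus format onto a common model can be illustrated with a minimal sketch. All names here (Annotation, Tier, CorpusMapper, TabSeparatedMapper) and the tab-separated input format are assumptions made for illustration; they are not the actual ACM classes or any of the supported corpus formats.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; this is not the actual EUDICO ACM API.
final class Annotation {
    final long beginMs;   // start on the media time axis, in milliseconds
    final long endMs;     // end on the media time axis, in milliseconds
    final String value;   // the annotation content

    Annotation(long beginMs, long endMs, String value) {
        if (endMs < beginMs) {
            throw new IllegalArgumentException("end before begin");
        }
        this.beginMs = beginMs;
        this.endMs = endMs;
        this.value = value;
    }
}

// One freely named annotation layer holding time-aligned annotations.
final class Tier {
    final String name;
    final List<Annotation> annotations = new ArrayList<>();

    Tier(String name) {
        this.name = name;
    }
}

// Supporting a new corpus format means providing one implementation of this.
interface CorpusMapper {
    List<Tier> map(List<String> lines);
}

// Assumed toy format: "tier<TAB>beginMs<TAB>endMs<TAB>value", one per line.
final class TabSeparatedMapper implements CorpusMapper {
    @Override
    public List<Tier> map(List<String> lines) {
        // LinkedHashMap keeps tiers in order of first appearance.
        Map<String, Tier> tiers = new LinkedHashMap<>();
        for (String line : lines) {
            String[] f = line.split("\t", 4);
            Tier tier = tiers.computeIfAbsent(f[0], Tier::new);
            tier.annotations.add(
                new Annotation(Long.parseLong(f[1]), Long.parseLong(f[2]), f[3]));
        }
        return new ArrayList<>(tiers.values());
    }
}
```

In this sketch, any tool written against Tier and Annotation works unchanged for every format that has a mapper, which is the property the ACM is described as providing.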

The tools that are available to work with the multimedia corpora make it possible to analyze their content and to add new annotations to them. An EUDICO client can choose a subset of the corpus data he is interested in by browsing through a corpus tree or by giving some formal selection criterion. For such a data selection he can start viewer tools to show the multimedia data and annotation content. The audio data can be played along with a signal waveform.

Figure 1. The Video window is displayed simultaneously with the Viewer selector. In this case an MPEG-1 movie is shown. For audio-only media data, the video component is simply absent from this window. The controls at the bottom of the window are, from left to right: a running time code counter that can also be used to jump to a specific time, the standard JMF video control bar, and a button that plays only the currently selected time interval. The panels under the video image are subtitle viewers.

The video data can be played with the standard functionality found on video recorders (Figure 1). The annotation data can be visualized in several special-purpose viewers that are fully synchronized with the media players. One view shows the annotation data as dynamic subtitles to the media data. Another view shows the annotation data as ordered lists. When the media data is being played, the corresponding annotation data is highlighted within the list. By clicking an annotation in such a list, the user can set the media time to the start time of the annotation segment. When visual information regarding the relative position of annotations to each other is needed, we offer a partitur-like view where the content of the selected annotation tracks is indicated along the time line. This makes it easy, for instance, to detect overlap of segments from different annotation layers.
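The three behaviours described above (highlighting the annotation at the current media time, seeking when an annotation is clicked, and detecting overlap between layers) can be sketched as follows. The types Segment and ViewerSync are hypothetical and only illustrate the logic; segments are assumed to be half-open intervals in milliseconds.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types for illustration; not the actual EUDICO classes.
final class Segment {
    final long beginMs;
    final long endMs;
    final String value;

    Segment(long beginMs, long endMs, String value) {
        this.beginMs = beginMs;
        this.endMs = endMs;
        this.value = value;
    }

    // True while the media time lies inside [beginMs, endMs).
    boolean contains(long mediaTimeMs) {
        return beginMs <= mediaTimeMs && mediaTimeMs < endMs;
    }

    // Two half-open intervals overlap when each starts before the other ends.
    boolean overlaps(Segment other) {
        return beginMs < other.endMs && other.beginMs < endMs;
    }
}

final class ViewerSync {
    // Index of the list entry to highlight at the given media time, or -1.
    static int highlightedIndex(List<Segment> layer, long mediaTimeMs) {
        for (int i = 0; i < layer.size(); i++) {
            if (layer.get(i).contains(mediaTimeMs)) {
                return i;
            }
        }
        return -1;
    }

    // Media time to seek to when the user clicks entry `index` in the list.
    static long seekTimeForClick(List<Segment> layer, int index) {
        return layer.get(index).beginMs;
    }

    // All pairs (one segment from each layer) that overlap in time.
    static List<Segment[]> overlappingPairs(List<Segment> a, List<Segment> b) {
        List<Segment[]> pairs = new ArrayList<>();
        for (Segment x : a) {
            for (Segment y : b) {
                if (x.overlaps(y)) {
                    pairs.add(new Segment[] {x, y});
                }
            }
        }
        return pairs;
    }
}
```

The nested loop is quadratic in the number of segments; for long recordings a sweep over segments sorted by begin time would be the usual refinement.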

Figure 2. Time line view is a bit different from other viewers in the sense that it represents tags from multiple tiers in one view. Each tag is shown as a bar, where the horizontal position and length of the bar indicate the tag's begin time and duration. A tag's value is represented only by the beginning of the value of its first field. Pressing the 'popup button' (the right mouse button on most systems) displays a tag panel with all fields. The vertical order of the tiers can be changed simply by dragging the tier's name label to the proper position. The current time is represented by a red cross-hair in the middle of the view. During playback, the tag bars scroll horizontally, always reflecting the proper media time. Scrolling in this viewer using the scroll bar changes the current time in all other viewers as well.
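The geometry of such a time line view can be sketched as a linear mapping between media time and horizontal pixels, with the current time pinned to the middle of the view. The class TimeLineGeometry and the pixels-per-millisecond scale are assumptions for illustration, not details taken from the EUDICO implementation.

```java
// Hypothetical helper; the scale factor and class name are illustrative.
final class TimeLineGeometry {
    final int viewWidthPx;   // width of the visible time line area
    final double pxPerMs;    // horizontal zoom: pixels per millisecond

    TimeLineGeometry(int viewWidthPx, double pxPerMs) {
        this.viewWidthPx = viewWidthPx;
        this.pxPerMs = pxPerMs;
    }

    // Left edge of a tag bar; the current time sits at the view's centre.
    int barX(long beginMs, long currentMs) {
        return (int) Math.round(viewWidthPx / 2.0 + (beginMs - currentMs) * pxPerMs);
    }

    // Bar length is proportional to the tag's duration.
    int barWidth(long beginMs, long endMs) {
        return (int) Math.round((endMs - beginMs) * pxPerMs);
    }

    // Inverse mapping: the media time shown at horizontal position x.
    long timeAtX(int x, long currentMs) {
        return currentMs + Math.round((x - viewWidthPx / 2.0) / pxPerMs);
    }
}
```

Because only the current time changes during playback, recomputing barX for each visible tag is enough to make the bars scroll under a fixed cross-hair.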

The only requirements for working with the EUDICO framework are a computer with an Internet connection that supports MPEG-1 streams and the freely available Java runtime environment with the Java Media Framework software extension installed. All other software is sent to the client over the Internet in a completely transparent manner at the moment he starts a tool. This mechanism ensures that the client always works with the newest version of all available tools. The requested parts of the media data are streamed to the client's computer at the moment that they need to be rendered. There is no need for temporary storage of the media data on the client's machine.

EUDICO is an open software framework that will be extended continuously with many features, making it a major tool during the coming decade. Because of the highly modular and object-oriented architecture of the software, the set of tools is easily expandable when new functionality is desired.


Paper presented at Measuring Behavior 2000, 3rd International Conference on Methods and Techniques in Behavioral Research, 15-18 August 2000, Nijmegen, The Netherlands

© 2000 Noldus Information Technology b.v.