EUDICO:
a general tool set for annotating and exploiting multimedia signals
P. Wittenburg
Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
At the MPI, a flexible tool set is being developed for annotating and exploiting multimedia signals that capture different aspects of human or animal behavior. Its nucleus is an Abstract Corpus Model (ACM), which can cope with many complex hierarchical annotation structures, including cross-references between annotations and dependencies between annotation tiers. An open XML-based interchange format has been defined for persistent output. The tool set includes an annotation tool that lets the user specify an annotation structure or select one from an existing repository of tier set-ups. To allow annotations in various languages, the EUDICO tool set fully supports Unicode and provides input methods for writing systems such as Chinese, Arabic, Cyrillic, IPA and Hebrew. The annotation tool allows easy time alignment and is currently being extended to support the visualization of hierarchical encodings, such as interlinearized texts.
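To make the idea of time-aligned annotations on dependent tiers more concrete, the following is a minimal sketch in Java. It is not the ACM itself: the class names (TierSketch, Tier, Annotation), the millisecond time unit and the rule that a child annotation must lie within its parent's time span are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names and rules are illustrative, not the EUDICO ACM API.
public class TierSketch {

    /** A time-aligned annotation; times are assumed to be in milliseconds. */
    static final class Annotation {
        final long beginMs;
        final long endMs;
        final String value;
        final Annotation parent; // null for annotations on an independent tier

        Annotation(long beginMs, long endMs, String value, Annotation parent) {
            if (endMs < beginMs) {
                throw new IllegalArgumentException("end time precedes begin time");
            }
            this.beginMs = beginMs;
            this.endMs = endMs;
            this.value = value;
            this.parent = parent;
        }
    }

    /** A named tier; a dependent tier refers to the tier it depends on. */
    static final class Tier {
        final String name;
        final Tier parentTier; // null for an independent tier
        final List<Annotation> annotations = new ArrayList<>();

        Tier(String name, Tier parentTier) {
            this.name = name;
            this.parentTier = parentTier;
        }

        /** Adds an annotation and enforces the (assumed) dependency rule. */
        Annotation add(long beginMs, long endMs, String value, Annotation parent) {
            if (parentTier == null) {
                if (parent != null) {
                    throw new IllegalArgumentException("independent tier takes no parent");
                }
            } else {
                if (parent == null || !parentTier.annotations.contains(parent)) {
                    throw new IllegalArgumentException(
                            "parent must be an annotation on tier " + parentTier.name);
                }
                if (beginMs < parent.beginMs || endMs > parent.endMs) {
                    throw new IllegalArgumentException("child exceeds parent time span");
                }
            }
            Annotation a = new Annotation(beginMs, endMs, value, parent);
            annotations.add(a);
            return a;
        }
    }

    public static void main(String[] args) {
        Tier utterances = new Tier("utterance", null);
        Tier words = new Tier("word", utterances); // depends on the utterance tier

        Annotation u = utterances.add(0, 1500, "hello world", null);
        words.add(0, 700, "hello", u);
        words.add(700, 1500, "world", u);

        for (Annotation w : words.annotations) {
            System.out.println(w.value + " [" + w.beginMs + ", " + w.endMs
                    + "] within \"" + w.parent.value + "\"");
        }
    }
}
```

In this sketch the dependency between tiers is checked when an annotation is added, so a word annotation that falls outside its utterance is rejected rather than stored.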
Various views of the data are supported, covering different types and numbers of media streams, such as video and audio channels, and an unlimited number of textual tracks. Support for time series data, such as those from eye-tracking and gesture-recording equipment, is planned. A flexible search tool allows the user to specify various combinations of patterns and distances between such patterns. The generated ‘hit list’ can then be used to jump directly to the matching fragments in the corpus. In combination with the browsable corpus tools, a search can be extended to whole corpora or parts of corpora. In summary, the current version of EUDICO offers a number of new features.
The EUDICO tool set supports a distributed corpus scenario, in which media and textual data can reside on different hosts on the web. It also supports working over the internet, in which case only the relevant media fragments are transferred. EUDICO is written in Java and is already being used in international projects. It can be downloaded and launched easily via the Java Web Start mechanism from Sun, and a central, web-accessible bug report database is intended to support interaction with users.
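The search described above, which combines patterns with distances between them and produces a hit list, can be illustrated with a small Java sketch. It does not reproduce the EUDICO search tool: the class and method names are hypothetical, patterns are assumed to be regular expressions, and distance is assumed to mean the gap in milliseconds between the end of the first match and the beginning of the second.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: not the EUDICO search API.
public class DistanceSearchSketch {

    /** A time-aligned annotation; times are assumed to be in milliseconds. */
    static final class Annotation {
        final long beginMs;
        final long endMs;
        final String value;

        Annotation(long beginMs, long endMs, String value) {
            this.beginMs = beginMs;
            this.endMs = endMs;
            this.value = value;
        }
    }

    /** One entry of the hit list: a pair of matching annotations. */
    static final class Hit {
        final Annotation first;
        final Annotation second;

        Hit(Annotation first, Annotation second) {
            this.first = first;
            this.second = second;
        }
    }

    /**
     * Returns all pairs (a, b) where a matches firstPattern, b matches
     * secondPattern, and b begins between 0 and maxGapMs after a ends.
     */
    static List<Hit> search(List<Annotation> firstTier, Pattern firstPattern,
                            List<Annotation> secondTier, Pattern secondPattern,
                            long maxGapMs) {
        List<Hit> hits = new ArrayList<>();
        for (Annotation a : firstTier) {
            if (!firstPattern.matcher(a.value).find()) {
                continue;
            }
            for (Annotation b : secondTier) {
                long gap = b.beginMs - a.endMs;
                if (gap >= 0 && gap <= maxGapMs && secondPattern.matcher(b.value).find()) {
                    hits.add(new Hit(a, b));
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Annotation> speech = new ArrayList<>();
        speech.add(new Annotation(0, 1200, "look at that"));
        speech.add(new Annotation(5000, 5400, "okay"));

        List<Annotation> gesture = new ArrayList<>();
        gesture.add(new Annotation(1300, 1900, "point"));
        gesture.add(new Annotation(9000, 9500, "point"));

        List<Hit> hits = search(speech, Pattern.compile("look"),
                                gesture, Pattern.compile("point"), 500);
        for (Hit h : hits) {
            // The time span of a hit is what a player would need to jump to the fragment.
            System.out.println("hit: " + h.first.beginMs + " to " + h.second.endMs + " ms");
        }
    }
}
```

Each hit keeps the begin and end times of both annotations, which is the information a viewer would need to return to the corresponding media fragment. The nested loop is quadratic in the number of annotations; a real tool working on whole corpora would presumably use an index instead.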
Paper presented at Measuring Behavior 2002, 4th International Conference on Methods and Techniques in Behavioral Research, 27-30 August 2002, Amsterdam, The Netherlands.