EUDICO:
a general tool set for annotating and exploiting multimedia signals

P. Wittenburg

Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands

 

At the MPI, a flexible tool set is being developed for annotating and exploiting multimedia signals that capture different aspects of human or animal behavior. Its nucleus is an Abstract Corpus Model (ACM), which can cope with many complex hierarchical annotation structures (including, for example, cross-references between annotations and dependencies between annotation tiers). An open XML-based interchange format has been defined for persistent output. The tool set includes an annotation tool that lets the user specify an annotation structure or select one from an existing repository of tier set-ups. To allow annotations in various languages, the EUDICO tool set fully supports Unicode and provides input methods for writing systems such as Chinese, Arabic, Cyrillic, IPA and Hebrew. The annotation tool allows easy time alignment and is currently being extended to support the visualization of hierarchical encodings, such as interlinearized texts.
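To make the idea of dependent annotation tiers concrete, the following is a minimal sketch of time-aligned annotations on tiers, where a dependent tier may only hold annotations that lie within the interval of an annotation on its parent tier. All class, field and method names here (TierSketch, Tier, Annotation, add) are hypothetical and chosen for illustration; they are not the actual ACM interfaces, and the containment rule is only one possible kind of tier dependency.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a tier-based annotation model; not the actual EUDICO ACM API. */
final class TierSketch {

    /** A time-aligned annotation; times are milliseconds on the media timeline. */
    static final class Annotation {
        final long beginMs;
        final long endMs;
        final String value;      // any Unicode text, e.g. IPA or Chinese
        final Annotation parent; // null for annotations on an independent tier

        Annotation(long beginMs, long endMs, String value, Annotation parent) {
            if (endMs < beginMs) {
                throw new IllegalArgumentException("end before begin");
            }
            this.beginMs = beginMs;
            this.endMs = endMs;
            this.value = value;
            this.parent = parent;
        }
    }

    /** A named tier; a non-null parentTier makes this a dependent tier. */
    static final class Tier {
        final String name;
        final Tier parentTier;
        final List<Annotation> annotations = new ArrayList<>();

        Tier(String name, Tier parentTier) {
            this.name = name;
            this.parentTier = parentTier;
        }

        /** Adds an annotation, enforcing the (assumed) containment dependency. */
        Annotation add(long beginMs, long endMs, String value, Annotation parent) {
            if (parentTier == null) {
                if (parent != null) {
                    throw new IllegalArgumentException("independent tier takes no parent");
                }
            } else {
                if (parent == null || !parentTier.annotations.contains(parent)) {
                    throw new IllegalArgumentException("parent must be on tier " + parentTier.name);
                }
                if (beginMs < parent.beginMs || endMs > parent.endMs) {
                    throw new IllegalArgumentException("outside the parent's interval");
                }
            }
            Annotation a = new Annotation(beginMs, endMs, value, parent);
            annotations.add(a);
            return a;
        }
    }
}
```

In this sketch, a gloss tier created with new TierSketch.Tier("gloss", utteranceTier) accepts an annotation only if it names an existing utterance annotation as its parent and stays inside that utterance's time interval; anything else is rejected with an IllegalArgumentException.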

Various views of the data are supported, covering different types and numbers of media streams, such as video and audio channels, and an unlimited number of textual tracks. It is intended to extend this to time series data, such as those from eye-tracking and gesture-recording equipment. A flexible search tool allows the user to specify various combinations of patterns and distances between such patterns. The generated ‘hit list’ can then be used to return immediately to the matching fragments in the corpus. In combination with the corpus browsing tools, the search can be extended to whole corpora or parts of corpora. In summary, the current version of EUDICO offers a number of new features.

The EUDICO tool set supports a distributed corpus scenario (i.e. media and textual data can reside on different hosts on the web). It also supports working over the internet, in which case only the relevant media fragments are transferred. EUDICO is written in Java and is already being used in international projects. It can be downloaded and launched easily via the Java Web Start mechanism from Sun, and a central, web-accessible bug report database should help with user interaction.
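The search over combinations of patterns and the distances between them, described above, can be illustrated with a small sketch. It assumes that a pattern is a regular expression over annotation values and that distance is the time gap between the end of one annotation and the start of the next; both are assumptions made for the example, as are the names DistanceSearchSketch, Span, Hit and search. It is not the actual EUDICO search implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch of a two-pattern search with a time-distance constraint. */
final class DistanceSearchSketch {

    /** A time-aligned annotation value; times are milliseconds on the media timeline. */
    static final class Span {
        final long beginMs;
        final long endMs;
        final String value;

        Span(long beginMs, long endMs, String value) {
            this.beginMs = beginMs;
            this.endMs = endMs;
            this.value = value;
        }
    }

    /** One entry of the hit list: a pair of annotations satisfying the query. */
    static final class Hit {
        final Span first;
        final Span second;

        Hit(Span first, Span second) {
            this.first = first;
            this.second = second;
        }

        /** Media position a player could jump to in order to review this hit. */
        long beginMs() {
            return Math.min(first.beginMs, second.beginMs);
        }
    }

    /**
     * Finds pairs (x, y) where x on tierA matches a, y on tierB matches b,
     * and y starts between 0 and maxGapMs milliseconds after x ends.
     */
    static List<Hit> search(List<Span> tierA, Pattern a,
                            List<Span> tierB, Pattern b, long maxGapMs) {
        List<Hit> hits = new ArrayList<>();
        for (Span x : tierA) {
            if (!a.matcher(x.value).find()) {
                continue;
            }
            for (Span y : tierB) {
                long gap = y.beginMs - x.endMs;
                if (gap >= 0 && gap <= maxGapMs && b.matcher(y.value).find()) {
                    hits.add(new Hit(x, y));
                }
            }
        }
        return hits;
    }
}
```

Each Hit keeps references to the matching annotations, so a viewer could use beginMs() to return directly to the corresponding media fragment. The nested loop is quadratic in the number of annotations; for whole corpora an index sorted by begin time would be the more likely choice.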


Paper presented at Measuring Behavior 2002, 4th International Conference on Methods and Techniques in Behavioral Research, 27-30 August 2002, Amsterdam, The Netherlands

© 2002 Noldus Information Technology bv