Measuring Behavior 2000: Thomann

Assessing agreement among markings of behavioral events

B. Thomann

Department of Clinical Psychology, University of Zurich, Zurich, Switzerland

A number of clinical raters are shown a four-minute sequence from a videotaped dialogue between a psychotherapist and her patient and are given the task of searching for conspicuous events (verbal and nonverbal) and marking them. The assumption is that no prior information exists, either about marking preferences (characteristics of the rater) or about the occurrence of conspicuous events worth marking (characteristics of the material), which leads to very spontaneous and subjective marking. To avoid distracting the raters or disturbing their imagination and spontaneity, an untypically rigid, 'menu-less', extremely simple touch-screen interface for setting and working with events had to be designed.

The freedom of marking means that superimposing all the raters' markings produces complicated configurations of mutually overlapping intervals (Figure 1). A new definition of the concept 'marking agreement' is therefore required, based only on relational, not on metric, considerations. The intervals of a subset of a marking configuration are declared as agreeing if (1) each interval overlaps with every other interval of the subset and (2) the intervals cannot be distinguished from one another by their overlap relations to the remaining intervals of the marking configuration. Such subsets are called 'nuclei'.

Figure 1. Superposition of events. The circled markings constitute 'nuclei'.
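
The two conditions for a nucleus can be read in graph terms: two markings agree when they overlap each other and overlap exactly the same set of other markings. The following is a minimal sketch of that reading, not the implementation used in the study. It assumes that each marking is a closed interval `(start, end)` in time units, that only maximal subsets of two or more markings count as nuclei, and that the identity of the rater plays no role; the names `overlaps`, `find_nuclei` and `agreement` are hypothetical.

```python
from collections import defaultdict


def overlaps(a, b):
    """True if the closed intervals a = (start, end) and b share at least one unit."""
    return a[0] <= b[1] and b[0] <= a[1]


def find_nuclei(intervals):
    """Return lists of interval indices that form maximal nuclei.

    Assumption: two markings belong to the same nucleus when they have the
    same 'closed neighbourhood', i.e. the same set of markings they overlap,
    themselves included. This implies condition (1), mutual overlap, and
    condition (2), indistinguishable overlap with all remaining markings.
    """
    n = len(intervals)
    groups = defaultdict(list)
    for i in range(n):
        neighbourhood = frozenset(
            j for j in range(n) if j == i or overlaps(intervals[i], intervals[j])
        )
        groups[neighbourhood].append(i)
    # A marking that shares its neighbourhood with no other marking is not in a nucleus.
    return [members for members in groups.values() if len(members) >= 2]


def agreement(intervals):
    """Share of markings that belong to some nucleus (0.0 for an empty configuration)."""
    if not intervals:
        return 0.0
    in_nuclei = sum(len(members) for members in find_nuclei(intervals))
    return in_nuclei / len(intervals)


# Hypothetical configuration of six markings.
markings = [(0, 10), (2, 8), (9, 20), (30, 40), (31, 39), (50, 60)]
print(find_nuclei(markings))  # [[3, 4]]
print(agreement(markings))    # 2 of 6 markings lie in a nucleus
```

In this example the first two markings overlap each other but do not form a nucleus, because only the first of them also overlaps the third marking; the overlap relation to the remaining markings distinguishes them.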

The number of 'nuclei-markings' in the whole marking configuration of the raters indicates the degree of agreement. To compare different configurations in terms of statistical significance, this number is standardized against random configurations generated by Monte Carlo simulation.
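
The standardization step could be sketched as follows. The abstract does not specify the random model, so this example makes its own assumption: every marking keeps its length and is placed at a uniformly random position within the sequence. The observed number of nuclei-markings is then compared with the simulated distribution through a z-score and an empirical p-value. All function names and parameters here are hypothetical, and a nucleus is taken to be a group of two or more markings with identical overlap relations.

```python
import random
from collections import Counter


def count_nucleus_markings(intervals):
    """Number of markings that share their overlap set with at least one other marking."""
    n = len(intervals)
    keys = [
        frozenset(
            j for j in range(n)
            if intervals[i][0] <= intervals[j][1] and intervals[j][0] <= intervals[i][1]
        )
        for i in range(n)
    ]
    sizes = Counter(keys)
    return sum(size for size in sizes.values() if size >= 2)


def random_configuration(intervals, n_units, rng):
    """Assumed null model: keep each marking's length, draw its start uniformly."""
    shuffled = []
    for start, end in intervals:
        length = end - start
        new_start = rng.randint(0, n_units - 1 - length)  # both bounds inclusive
        shuffled.append((new_start, new_start + length))
    return shuffled


def monte_carlo_standardize(intervals, n_units, n_sim=1000, seed=0):
    """Return (observed, simulated mean, z-score, empirical p-value)."""
    rng = random.Random(seed)
    observed = count_nucleus_markings(intervals)
    simulated = [
        count_nucleus_markings(random_configuration(intervals, n_units, rng))
        for _ in range(n_sim)
    ]
    mean = sum(simulated) / n_sim
    sd = (sum((s - mean) ** 2 for s in simulated) / (n_sim - 1)) ** 0.5
    z = (observed - mean) / sd if sd > 0 else float("nan")
    # One-sided p-value with the usual +1 correction for Monte Carlo tests.
    p = (1 + sum(s >= observed for s in simulated)) / (n_sim + 1)
    return observed, mean, z, p


# Hypothetical configuration on a sequence of 61 time units (indices 0..60).
markings = [(0, 10), (2, 8), (9, 20), (30, 40), (31, 39), (50, 60)]
observed, mean, z, p = monte_carlo_standardize(markings, n_units=61)
print(observed, round(mean, 2), round(z, 2), round(p, 3))
```

Other null models, for example shuffling markings only within each rater, would be equally plausible and could give different standardized values.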

The temporal aspect of marking is another topic of the study. The same material is presented repeatedly to the raters in several sessions, and the focus is on the changes in each rater's marking. Accumulating the sessions on the one hand and the raters on the other produces marking configurations that grow in two dimensions. The analysis of this growth process attempts to determine the variance attributable to the characteristics of the raters and of the material, respectively.

Figure 2. The taped sequence is divided into 1150 units of 0.2 s (x-axis). The 17 raters set a total of 409 nonverbal markings. Of these, 181 markings are organized in nuclei, according to the rule illustrated in Figure 1, giving an overall agreement of approx. 44%.

The statistical analysis of about 400 nonverbal (Figure 2) and 500 verbal markings, given by 17 raters in 6 sessions, yielded the following main results. There are distinct differences between the marking of verbal and nonverbal behavior: agreement in verbal markings is significantly higher. A majority of the raters vary their comments markedly in the course of the sessions. Effects of convergence (raters mark the same events in different sequences) and effects of divergence (raters tend to mark more individually at the end) occur at the same time. Raters who mark many more events than others do not mark more individually than others, but concur with a proportionally larger number of other raters. The higher the agreement regarding single events (strong nuclei), the more convergent the given comments. Rating of verbal behavior is mainly guided by observable events, whereas rating of nonverbal behavior is influenced by rater stereotypes. Comparing 'many raters × few sessions' with 'few raters × many sessions', the first product leads to much higher agreement than the second.

Paper presented at Measuring Behavior 2000, 3rd International Conference on Methods and Techniques in Behavioral Research, 15-18 August 2000, Nijmegen, The Netherlands