Reliability analysis on continuously measured behavioral data
R.G. Jansen, R.G.M. Elbers, E.S. Meyer and L.F. Wiertz
Noldus Information Technology bv, Wageningen, The Netherlands
Background
Reliability analysis theory has been developed for assessing, on a nominal scale,
a series of cases by two independent raters. For each case, the assessment results
are tallied in a confusion matrix: a contingency table in which agreements
and disagreements in the classifications of both raters are presented (Figure
1). Reliability statistics, such as percentage of agreement, Cohen's kappa
(agreement corrected for chance agreement) and Pearson's correlation coefficient
rho, can be computed from the confusion matrix.
| Nominal value | Rater 2: A | Rater 2: B | Rater 2: C |
| Rater 1: A    | 2          | 3          | 0          |
| Rater 1: B    | 0          | 4          | 1          |
| Rater 1: C    | 1          | 1          | 3          |
Figure 1. Example of a contingency table in which agreements and disagreements have been tallied.
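As a worked illustration, the percentage of agreement and Cohen's kappa can be computed directly from the counts in Figure 1. The following is a minimal sketch, not the implementation used in The Observer; the function name `agreement_stats` and the assumption that rows are Rater 1 and columns are Rater 2 are ours.

```python
def agreement_stats(matrix):
    """Return (observed agreement, Cohen's kappa) for a square confusion matrix.

    Assumes rows are Rater 1's classifications and columns are Rater 2's.
    """
    total = sum(sum(row) for row in matrix)
    # Observed agreement: share of cases on the diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    # Chance agreement: product of the raters' marginal proportions per category.
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Counts from Figure 1 (categories A, B, C).
figure1 = [
    [2, 3, 0],
    [0, 4, 1],
    [1, 1, 3],
]

observed, kappa = agreement_stats(figure1)
print(f"agreement = {observed:.2f}, kappa = {kappa:.2f}")
```

For these counts, 9 of the 15 cases lie on the diagonal (60% agreement), chance agreement is 1/3, and kappa is 0.40.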
Reliability analysis and continuously recorded behavioral data
In behavioral research, reliability analysis must deal with two sets of time-structured data ('observations'). This is not a problem if the two observations being compared involve behaviors sampled at fixed intervals, since each sample then corresponds to a case. But if they involve behaviors that have been recorded continuously, so that the start and end of each case are determined subjectively, the following problems arise in finding a basis for comparing the data sets:
| Case | Observation 1: Behavior | Observation 1: time | Observation 2: Behavior | Observation 2: time |
| 1    | Walk                    | 0                   | Walk                    | 0                   |
| 2    | Jog                     | 5                   | -                       | -                   |
| 3    | -                       | -                   | Run                     | 7                   |
| 4    | Hold                    | 12                  | Hold                    | 11                  |

| Case | Observation 1: Behavior | Observation 1: time | Observation 2: Behavior | Observation 2: time |
| 1    | Walk                    | 0                   | Walk                    | 0                   |
| 2    | Jog                     | 5                   | Run                     | 7                   |
| 3    | Hold                    | 12                  | Hold                    | 11                  |
Figure 2. Example showing the difficulty of objectively determining the number and type of cases.
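One way to sidestep the ambiguity shown in Figure 2 is to reduce both continuous records to samples at fixed intervals, so that each sample is a case, as noted above for interval-sampled data. The sketch below illustrates that idea only; it is not claimed to be one of The Observer's comparison methods. The helper names, the one-unit sampling step and the observation end time of 15 are assumptions made for illustration, since the source gives no end time or time unit.

```python
from collections import Counter


def behavior_at(events, t):
    """Return the behavior in progress at time t.

    `events` is a list of (start_time, behavior) pairs sorted by start time;
    each behavior is assumed to last until the next one starts.
    """
    current = None
    for start, behavior in events:
        if start > t:
            break
        current = behavior
    return current


def tally_by_interval(obs1, obs2, end_time, step=1):
    """Count (observation 1, observation 2) behavior pairs at each sample point."""
    counts = Counter()
    t = 0
    while t < end_time:
        counts[(behavior_at(obs1, t), behavior_at(obs2, t))] += 1
        t += step
    return counts


# Start times and behaviors taken from Figure 2.
obs1 = [(0, "Walk"), (5, "Jog"), (12, "Hold")]
obs2 = [(0, "Walk"), (7, "Run"), (11, "Hold")]
END_TIME = 15  # hypothetical: the source does not state when the observations end

tally = tally_by_interval(obs1, obs2, END_TIME)
for pair, count in sorted(tally.items()):
    print(pair, count)
```

Each key of `tally` is one cell of a confusion matrix, so the result can be fed into the usual agreement statistics. The outcome depends on the chosen sampling step, which is itself a judgment call.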
Reliability analysis using The Observer software
The Observer 4.1 offers a new set of reliability functions capable of taking
the time-structured nature of the data into account. Reliability analyses are
based on four different methods of comparing continuously recorded data sets,
which differ in the way the three problems mentioned earlier are dealt with:
Furthermore, The Observer presents outcomes at different levels:
Figure 3. Example of a case-by-case comparison in The Observer 4.1.
Figure 4. Example of a confusion matrix in The Observer 4.1.
Figure 5. Example of reliability measures in The Observer 4.1.
Paper presented at Measuring Behavior 2002, 4th International Conference on Methods and Techniques in Behavioral Research, 27-30 August 2002, Amsterdam, The Netherlands.
© 2002 Noldus Information Technology bv


