Reliability of observational data: obtaining different results with different estimation techniques
L.D. Goodwin
Department of Research & Evaluation Methodology, School of Education, University of Colorado, Denver, CO, U.S.A.
 
Inter-rater reliability of observational data can be estimated by various methods, including: simple percentages of agreement; kappa and weighted kappa; Pearson correlations; comparisons of means (using t-tests or analyses of variance); and generalizability (G) theory techniques [1]. The "method of choice" in a given situation depends on a number of factors (e.g., type of measurement scale used); nevertheless, the researcher often faces two crucial dilemmas: 1) how to design an effective reliability study, and 2) how to analyze the data in order to maximize the information yield.
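To make the contrast among these estimation methods concrete, the following minimal sketch computes three of the listed indices (simple percentage of agreement, unweighted kappa, and the Pearson correlation) for the same pair of raters. The ratings are invented for illustration and are not the data sets used in the presentation; the function names are likewise hypothetical. Rater B is assumed to score every subject exactly one point higher than rater A.

```python
from collections import Counter
from math import sqrt


def percent_agreement(a, b):
    """Proportion of subjects given exactly the same rating by both raters."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa: observed agreement corrected for chance."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement from the product of the raters' marginal proportions.
    p_expected = sum(counts_a[k] * counts_b[k]
                     for k in set(a) | set(b)) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


def pearson_r(a, b):
    """Pearson product-moment correlation between the two sets of ratings."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


# Hypothetical ratings (illustrative assumption, not data from the paper):
# rater B is consistently one scale point more lenient than rater A.
rater_a = [1, 2, 3, 4, 1, 2, 3, 4]
rater_b = [2, 3, 4, 5, 2, 3, 4, 5]

print(f"percent agreement: {percent_agreement(rater_a, rater_b):.2f}")
print(f"Cohen's kappa:     {cohen_kappa(rater_a, rater_b):.2f}")
print(f"Pearson r:         {pearson_r(rater_a, rater_b):.2f}")
```

With these invented ratings the raters never agree exactly (agreement of 0, kappa of about -0.23), yet the Pearson correlation is 1.0, because a correlation is insensitive to a constant difference between raters. The same data can therefore look wholly unreliable or perfectly reliable depending on the index chosen.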
In this presentation, two sets of data--one "hypothetical" and one "real"--will be used to illustrate the different results that can ensue from the various techniques available for estimating inter-rater reliability. Special attention will be given to G-theory techniques, which represent a very comprehensive approach to reliability estimation. G-theory techniques allow the researcher to acknowledge and measure the multiple sources of error that exist simultaneously in any set of measurement data. The relative amount of variance attributable to a variety of factors--such as differences among raters, among items, or across time or occasions--can be estimated separately. Further, variance due to the interactions among those factors can also be estimated.
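The separation of variance sources described above can be sketched for the simplest case: a one-facet design in which every person is scored by every rater (persons crossed with raters, one score per cell). This is an assumed, simplified design chosen for illustration; the presentation's own analyses may involve additional facets such as items or occasions. Variance components are estimated from the usual ANOVA expected mean squares, and negative estimates are set to zero, which is a common convention rather than something stated in the source. The function name and the scores are hypothetical.

```python
def g_study_persons_by_raters(scores):
    """Estimate variance components for a crossed persons x raters design.

    `scores` is a list of rows, one row per person, one column per rater.
    """
    n_p, n_r = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_r)
    person_means = [sum(row) / n_r for row in scores]
    rater_means = [sum(row[j] for row in scores) / n_p for j in range(n_r)]

    # Sums of squares for persons, raters, and the residual
    # (person-by-rater interaction confounded with random error).
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_p = n_r * sum((m - grand) ** 2 for m in person_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in rater_means)
    ss_res = ss_total - ss_p - ss_r

    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))

    # Solve the expected-mean-square equations; clamp negatives to zero
    # (an assumed convention for this sketch).
    var_res = max(0.0, ms_res)
    var_p = max(0.0, (ms_p - ms_res) / n_r)
    var_r = max(0.0, (ms_r - ms_res) / n_p)

    # Coefficients for the mean over n_r raters.
    g_relative = var_p / (var_p + var_res / n_r)
    phi_absolute = var_p / (var_p + (var_r + var_res) / n_r)
    return {"var_p": var_p, "var_r": var_r, "var_res": var_res,
            "g_relative": g_relative, "phi_absolute": phi_absolute}


# Hypothetical scores (illustrative assumption): four persons, two raters,
# with the second rater one point more lenient throughout.
scores = [[1, 2], [2, 3], [3, 4], [4, 5]]
result = g_study_persons_by_raters(scores)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

For these invented scores the rater variance component is 0.5 and the residual is 0, so the coefficient for relative decisions is 1.0 while the coefficient for absolute decisions is about 0.87. The difference arises because the rater main effect counts as error only when the absolute level of the scores matters, which is the kind of distinction a single agreement index or correlation cannot express.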
The importance of making informed choices--choices based on full knowledge of what any specific estimation technique will yield in terms of information about the reliability of the measurement procedure--will be emphasized throughout the presentation. Discussion at the end will focus on the ways in which researchers can determine optimal types of reliability studies (and estimation approaches) for their particular types of data.
Paper presented at Measuring Behavior '98, 2nd International Conference on Methods and Techniques in Behavioral Research, 18-21 August 1998, Groningen, The Netherlands
© 1998 Noldus Information Technology b.v.