Measurement issues in taxonomic reliability

Ross, Dr Alastair and Wallace, Dr Brendan and Davies, Professor John (2004) Measurement issues in taxonomic reliability. [Journal (Paginated)]


Work in safety management often involves classifying events using coding schemes or "taxonomies". Such schemes contain separate categories, and users must reliably choose which codes apply to the events in question. The usefulness of any such scheme is limited by the reliability with which it can be applied, that is, the consensus that users can reach on the application of codes. This technical note addresses practical and theoretical issues in defining and measuring such reliability. Three problem areas are covered: the use of correlational measures, the calculation and reporting of indices of concordance, and the use of correction coefficients.
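
To make the terms in the abstract concrete, the following is a minimal Python sketch of one index of concordance (raw proportion of agreement) and one chance-correction coefficient (Cohen's kappa; Cohen, 1960) for two coders. It is an illustration only, not the authors' procedure: the function names, the category labels and the ten coded events are hypothetical.

```python
from collections import Counter


def raw_agreement(codes_a, codes_b):
    """Proportion of events to which both coders assigned the same code."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must code the same non-empty set of events")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


def cohen_kappa(codes_a, codes_b):
    """Cohen's (1960) kappa: raw agreement corrected for chance agreement."""
    n = len(codes_a)
    p_observed = raw_agreement(codes_a, codes_b)
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    # Chance agreement expected from each coder's marginal code frequencies.
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both coders use a single code")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical data: ten events coded by two coders with a three-code scheme.
coder_a = ["slip", "slip", "slip", "slip", "lapse",
           "lapse", "lapse", "lapse", "violation", "violation"]
coder_b = ["slip", "slip", "slip", "lapse", "lapse",
           "lapse", "lapse", "violation", "violation", "violation"]

print(raw_agreement(coder_a, coder_b))          # 0.8
print(round(cohen_kappa(coder_a, coder_b), 3))  # 0.697
```

In this made-up example the coders agree on 8 of 10 events, but the marginal frequencies imply 0.34 agreement by chance, so the corrected coefficient is lower than the raw index. Both functions compare the codes assigned to each individual event, which is a different question from whether the coders' overall code frequencies correlate.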

Item Type: Journal (Paginated)
Keywords: Taxonomies; Classification; Reliability; Inter-rater consensus
Subjects: Psychology > Behavioral Analysis
ID Code: 4609
Deposited By: Wallace, Dr Brendan
Deposited On: 12 Nov 2005
Last Modified: 11 Mar 2011 08:56

References in Article

Baber, C., Stanton, N.A., 1996. Human error identification techniques applied to public technology: predictions compared with observed use. Applied Ergonomics 27 (2), 119–131.

Borg, W., Gall, M., 1989. Educational Research. Longman, London.

Carey, G., Gottesman, I.I., 1978. Reliability and validity in binary ratings: areas of common misunderstanding in diagnosis and symptom ratings. Archives of General Psychiatry 35, 1454–1459.

Carlin, B.P., Louis, T.A., 2000. Bayes and Empirical Bayes Methods for Data Analysis. Chapman & Hall, New York.

Caro, T.M., Roper, R., Young, M., Dank, G.R., 1979. Inter-observer reliability. Behaviour 69 (3–4), 303–


Cicchetti, D.V., Feinstein, A.R., 1990. High agreement but low kappa II: resolving the paradoxes. Journal of Clinical Epidemiology 43 (6), 551–558.

Cohen, J., 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20, 37–46.

Cohen, J., 1968. Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin 70, 213–220.

Davies, J.B., Ross, A.J., Wallace, B., Wright, L., 2003. Safety Management: A Qualitative Systems Approach. Taylor and Francis, London.

Embrey, D.E., 1986. SHERPA: a systematic human error reduction and prediction approach. In: Proceedings of the International Topical Meeting on Advances in Human Factors in Nuclear Power Systems. American Nuclear Society, LaGrange Park.

Feinstein, A.R., Cicchetti, D.V., 1990. High agreement but low kappa: the problems of two paradoxes. Journal of Clinical Epidemiology 43 (6), 543–549.

Ferry, T.S., 1988. Modern Accident Investigation and Analysis. Wiley & Sons, New York.

Fleiss, J.L., 1971. Measuring nominal scale agreement among many raters. Psychological Bulletin 76, 378–


A.J. Ross et al. / Safety Science 42 (2004) 771–778 777

Groeneweg, J., 1996. Controlling the Controllable: The Management of Safety, third revised ed. DSWO Press, Leiden.

Grove, W.M., Andreasen, N.C., McDonald-Scott, P., Keller, M.B., Shapiro, R.W., 1981. Reliability studies of psychiatric diagnosis. Archives of General Psychiatry 38, 408–413.

Hendrick, K., Benner, L., 1987. Investigating Accidents with STEP. Dekker, New York.

Hollnagel, E., 1998. Cognitive Reliability and Error Analysis Method. Elsevier Science, Oxford.

Isaac, A., Shorrock, S., Kirwan, B., Kennedy, R., Anderson, H., Bove, T., 2000. Learning from the past to protect the future – the HERA approach. In: 24th European Association for Aviation Psychology Conference, Crieff.

James, L.R., Demaree, R.G., Wolf, G., 1993. rwg: an assessment of Within-Group Interrater Agreement. Journal of Applied Psychology 78 (2), 306–309.

Janes, C.L., 1979. Agreement measurement and the judgement process. Journal of Nervous and Mental Disease 167, 343–347.

Johnson, W.G., 1980. MORT Safety Assurance Systems. Marcel Dekker, New York.

Kirwan, B.A., 1988. A comparative study of five human reliability assessment techniques. In: Sawyer, B.A. (Ed.), Human Factors and Decision Making: Their Influence on Safety and Reliability. Elsevier Applied Science, London, pp. 87–109.

Kirwan, B., 1992. Human error identification in human reliability assessment. Part 2: Detailed comparison of techniques. Applied Ergonomics 23, 371–381.

Lee, M.D., Del Fabbro, P.H., 2002. A Bayesian coefficient of agreement for binary decisions. Available from <>.

Lehane, P., Stubbs, D., 2001. The perceptions of managers and accident subjects in the service industries towards slip and trip accidents. Applied Ergonomics 32, 119–126.

Leonard, T., Hsu, J.S.J., 1999. Bayesian Methods: An Analysis for Statisticians and Interdisciplinary Researchers. Cambridge University Press, New York.

Martin, P., Bateson, P., 1993. Measuring Behaviour: An Introductory Guide. CUP, Cambridge.

Maxwell, A.E., 1977. Coefficients of agreement between observers and their interpretation. British Journal of Psychiatry 130, 79–83.

Munton, A.G., Silvester, J., Stratton, P., Hanks, H., 1999. Attributions in Action: A Practical Approach to Coding Qualitative Data. Wiley, Chichester.

Posner, K.L., Sampson, P.D., Caplan, R.A., Ward, R.J., Cheney, F.W., 1990. Measuring interrater reliability among multiple raters: an example of methods for nominal data. Statistics in Medicine 9,


Rasmussen, J., Pedersen, O.M., Mancini, G., Carnino, A., Griffon, M., Gagnolet, P., 1981. Classification System for Reporting Events Involving Human Malfunctions. Risø National Laboratory, Roskilde.

Reason, J., 1990. Human Error. CUP, Cambridge.

Silvester, J., Anderson, N.R., Patterson, F., 1999. Organizational culture change: an inter-group attributional analysis. Journal of Occupational and Organisational Psychology 72, 1–23.

Sivia, D.S., 1996. Data Analysis: A Bayesian Tutorial. Clarendon Press, Oxford.

Spitznagel, E.L., Helzer, J.E., 1985. A proposed solution to the base rate problem in the kappa statistic. Archives of General Psychiatry 42, 725–728.

Stanton, N.A., Stevenage, S.V., 1998. Learning to predict human error: issues of acceptability, reliability and validity. Ergonomics 41 (11), 1737–1756.

Stratton, P., Munton, A.G., Hanks, H., Heard, D.H., Davidson, C., 1988. Leeds Attributional Coding System (LACS) Manual. LFTRC, Leeds.

Thompson, W.D., Walter, S.D., 1988. A reappraisal of the kappa coefficient: kappa and the concept of independent errors. Journal of Clinical Epidemiology 41, 949–958, 969–970.

Wagenaar, A., van der Schrier, J., 1997. Accident analysis: the goal, and how to get there. Safety Science 26 (1), 25–33.

Wallace, B., Ross, A., Davies, J.B., Wright, L., 2002. The creation of a new minor event coding system. Cognition Technology and Work 4, 1–8.

Yule, G.U., 1912. On the methods of measuring association between two attributes. Journal of the Royal Statistical Society 75, 581–642.
