--- abstract: | Work in safety management often involves classification of events using coding schemes or "taxonomies". Such schemes contain separate categories, and users have to reliably choose which codes apply to the events in question. The usefulness of any system is limited by the reliability with which it can be employed, that is the consensus that can be reached on application of codes. This technical note is concerned with practical and theoretical issues in defining and measuring such reliability. Three problem areas are covered: the use of correlational measures, the reporting and calculating of indices of concordance and the use of correction coefficients. altloc: [] chapter: ~ commentary: ~ commref: ~ confdates: ~ conference: ~ confloc: ~ contact_email: ~ creators_id: [] creators_name: - family: Ross given: Alastair honourific: Dr lineage: '' - family: Wallace given: Brendan honourific: Dr lineage: '' - family: Davies given: John honourific: Professor lineage: '' date: 2004-10 date_type: published datestamp: 2005-11-12 department: ~ dir: disk0/00/00/46/09 edit_lock_since: ~ edit_lock_until: ~ edit_lock_user: ~ editors_id: [] editors_name: [] eprint_status: archive eprintid: 4609 fileinfo: /style/images/fileicons/application_pdf.png;/4609/1/http___www.sciencedirect.com_science__ob=MImg%26_imagekey=B6VF9%2D4B42BK0%2D1%2DC%26_cdi=6005%26_user=121723%26_orig=browse%26_coverDate=10%252F31%252F2004%26_sk=999579991%26view=c%26wchp=dGLbVtb%2DzSkWA%26md5=df41b3347189169dce2ab7a13c1707e6%26ie=_sdarticle.pdf full_text_status: public importid: ~ institution: ~ isbn: ~ ispublished: pub issn: ~ item_issues_comment: [] item_issues_count: 0 item_issues_description: [] item_issues_id: [] item_issues_reported_by: [] item_issues_resolved_by: [] item_issues_status: [] item_issues_timestamp: [] item_issues_type: [] keywords: Taxonomies; Classification; Reliability; Inter-rater consensus lastmod: 2011-03-11 08:56:13 latitude: ~ longitude: ~ metadata_visibility: show note: ~ 
number: 8 pagerange: 771-778 pubdom: FALSE publication: Safety Science publisher: Elsevier refereed: TRUE referencetext: |- Baber, C., Stanton, N.A., 1996. Human error identification techniques applied to public technology: predictions compared with observed use. Applied Ergonomics 27 (2), 119–131. Borg, W., Gall, M., 1989. Educational Research. Longman, London. Carey, G., Gottesman, I.I., 1978. Reliability and validity in binary ratings: areas of common misunderstanding in diagnosis and symptom ratings. Archives of General Psychiatry 35, 1454–1459. Carlin, B.P., Louis, T.A., 2000. Bayes and Empirical Bayes Methods for Data Analysis. Chapman & Hall, New York. Caro, T.M., Roper, R., Young, M., Dank, G.R., 1979. Inter-observer reliability. Behaviour 69 (3–4), 303–315. Cicchetti, D.V., Feinstein, A.R., 1990. High agreement but low kappa II: resolving the paradoxes. Journal of Clinical Epidemiology 43 (6), 551–558. Cohen, J., 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20, 37–46. Cohen, J., 1968. Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin 70, 213–220. Davies, J.B., Ross, A.J., Wallace, B., Wright, L., 2003. Safety Management: A Qualitative Systems Approach. Taylor and Francis, London. Embrey, D.E., 1986. SHERPA: a systematic human error reduction and prediction approach. In: Proceedings of the International Topical Meeting on Advances in Human Factors in Nuclear Power Systems. American Nuclear Society, LaGrange Park. Feinstein, A.R., Cicchetti, D.V., 1990. High agreement but low kappa: the problems of two paradoxes. Journal of Clinical Epidemiology 43 (6), 543–549. Ferry, T.S., 1988. Modern Accident Investigation and Analysis. Wiley & Sons, New York. Fleiss, J.L., 1971. Measuring nominal scale agreement among many raters. Psychological Bulletin 76, 378–381. A.J. Ross et al. / Safety Science 42 (2004) 771–778 777 Groeneweg, J., 1996. 
Controlling the Controllable: The Management of Safety, third revised ed. DSWO Press, Leiden. Grove, W.M., Andreasen, N.C., McDonald-Scott, P., Keller, M.B., Shapiro, R.W., 1981. Reliability studies of psychiatric diagnosis. Archives of General Psychiatry 38, 408–413. Hendrick, K., Benner, L., 1987. Investigating Accidents with STEP. Dekker, New York. Hollnagel, E., 1998. Cognitive Reliability and Error Analysis Method. Elsevier Science, Oxford. Isaac, A., Shorrock, S., Kirwan, B., Kennedy, R., Anderson, H., Bove, T., 2000. Learning from the past to protect the future - the HERA approach. In: 24th European Association for Aviation Psychology Conference, Crieff. James, L.R., Demaree, R.G., Wolf, G., 1993. rwg: an assessment of Within-Group Interrater Agreement. Journal of Applied Psychology 78 (2), 306–309. Janes, C.L., 1979. Agreement measurement and the judgement process. Journal of Nervous and Mental Disease 167, 343–347. Johnson, W.G., 1980. MORT Safety Assurance Systems. Marcel Dekker, New York. Kirwan, B.A., 1988. A comparative study of five human reliability assessment techniques. In: Sawyer, B.A. (Ed.), Human Factors and Decision Making: Their Influence on Safety and Reliability. Elsevier Applied Science, London, pp. 87–109. Kirwan, B., 1992. Human error identification in human reliability assessment. Part 2: Detailed comparison of techniques. Applied Ergonomics 23, 371–381. Lee, M.D., Del Fabbro, P.H., 2002. A Bayesian coefficient of agreement for binary decisions. Available from . Lehane, P., Stubbs, D., 2001. The perceptions of managers and accident subjects in the service industries towards slip and trip accidents. Applied Ergonomics 32, 119–126. Leonard, T., Hsu, J.S.J., 1999. Bayesian Methods: An Analysis for Statisticians and Interdisciplinary Researchers. Cambridge University Press, New York. Martin, P., Bateson, P., 1993. Measuring Behaviour: An Introductory Guide. CUP, Cambridge. Maxwell, A.E., 1977. 
Coefficients of agreement between observers and their interpretation. British Journal of Psychiatry 130, 79–83. Munton, A.G., Silvester, J., Stratton, P., Hanks, H., 1999. Attributions in Action: A Practical Approach to Coding Qualitative Data. Wiley, Chichester. Posner, K.L., Sampson, P.D., Caplan, R.A., Ward, R.J., Cheney, F.W., 1990. Measuring interrater reliability among multiple raters: an example of methods for nominal data. Statistics in Medicine 9, 1103–1115. Rasmussen, J., Pedersen, O.M., Mancini, G., Carnino, A., Griffon, M., Gagnolet, P., 1981. Classification System for Reporting Events Involving Human Malfunctions. Risø National Laboratory, Roskilde. Reason, J., 1990. Human Error. CUP, Cambridge. Silvester, J., Anderson, N.R., Patterson, F., 1999. Organizational culture change: an inter-group attributional analysis. Journal of Occupational and Organisational Psychology 72, 1–23. Sivia, D.S., 1996. Data Analysis: A Bayesian Tutorial. Clarendon Press, Oxford. Spitznagel, E.L., Helzer, J.E., 1985. A proposed solution to the base rate problem in the kappa statistic. Archives of General Psychiatry 42, 725–728. Stanton, N.A., Stevenage, S.V., 1998. Learning to predict human error: issues of acceptability, reliability and validity. Ergonomics 41 (11), 1737–1756. Stratton, P., Munton, A.G., Hanks, H., Heard, D.H., Davidson, C., 1988. Leeds Attributional Coding System (LACS) Manual. LFTRC, Leeds. Thompson, W.D., Walter, S.D., 1988. A reappraisal of the kappa coefficient: kappa and the concept of independent errors. Journal of Clinical Epidemiology 41, 949–958, 969–970. Wagenaar, A., van der Schrier, J., 1997. Accident analysis: the goal, and how to get there. Safety Science 26 (1), 25–33. Wallace, B., Ross, A., Davies, J.B., Wright, L., 2002. The creation of a new minor event coding system. Cognition Technology and Work 4, 1–8. Yule, G.U., 1912. On the methods of measuring association between two attributes. Journal of the Royal Statistical Society 75, 581–642.
relation_type: [] relation_uri: [] reportno: ~ rev_number: 12 series: ~ source: ~ status_changed: 2007-09-12 17:01:13 subjects: - behanal succeeds: ~ suggestions: ~ sword_depositor: ~ sword_slug: ~ thesistype: ~ title: Measurement issues in taxonomic reliability type: journalp userid: 4216 volume: 42