---
abstract: |
Work in safety management often involves classification of events using coding schemes or
"taxonomies". Such schemes contain separate categories, and users have to reliably choose
which codes apply to the events in question. The usefulness of any such system is limited by the
reliability with which it can be employed, that is, the consensus that can be reached on the
application of codes. This technical note is concerned with practical and theoretical issues in
defining and measuring such reliability. Three problem areas are covered: the use of
correlational measures, the reporting and calculation of indices of concordance, and the use of
correction coefficients.
altloc: []
chapter: ~
commentary: ~
commref: ~
confdates: ~
conference: ~
confloc: ~
contact_email: ~
creators_id: []
creators_name:
- family: Ross
given: Alastair
honourific: Dr
lineage: ''
- family: Wallace
given: Brendan
honourific: Dr
lineage: ''
- family: Davies
given: John
honourific: Professor
lineage: ''
date: 2004-10
date_type: published
datestamp: 2005-11-12
department: ~
dir: disk0/00/00/46/09
edit_lock_since: ~
edit_lock_until: ~
edit_lock_user: ~
editors_id: []
editors_name: []
eprint_status: archive
eprintid: 4609
fileinfo: /style/images/fileicons/application_pdf.png;/4609/1/http___www.sciencedirect.com_science__ob=MImg%26_imagekey=B6VF9%2D4B42BK0%2D1%2DC%26_cdi=6005%26_user=121723%26_orig=browse%26_coverDate=10%252F31%252F2004%26_sk=999579991%26view=c%26wchp=dGLbVtb%2DzSkWA%26md5=df41b3347189169dce2ab7a13c1707e6%26ie=_sdarticle.pdf
full_text_status: public
importid: ~
institution: ~
isbn: ~
ispublished: pub
issn: ~
item_issues_comment: []
item_issues_count: 0
item_issues_description: []
item_issues_id: []
item_issues_reported_by: []
item_issues_resolved_by: []
item_issues_status: []
item_issues_timestamp: []
item_issues_type: []
keywords: Taxonomies; Classification; Reliability; Inter-rater consensus
lastmod: 2011-03-11 08:56:13
latitude: ~
longitude: ~
metadata_visibility: show
note: ~
number: 8
pagerange: 771-778
pubdom: FALSE
publication: Safety Science
publisher: Elsevier
refereed: TRUE
referencetext: |-
Baber, C., Stanton, N.A., 1996. Human error identification techniques applied to public technology:
predictions compared with observed use. Applied Ergonomics 27 (2), 119–131.
Borg, W., Gall, M., 1989. Educational Research. Longman, London.
Carey, G., Gottesman, I.I., 1978. Reliability and validity in binary ratings: areas of common
misunderstanding in diagnosis and symptom ratings. Archives of General Psychiatry 35, 1454–1459.
Carlin, B.P., Louis, T.A., 2000. Bayes and Empirical Bayes Methods for Data Analysis. Chapman & Hall,
New York.
Caro, T.M., Roper, R., Young, M., Dank, G.R., 1979. Inter-observer reliability. Behaviour 69 (3–4), 303–
315.
Cicchetti, D.V., Feinstein, A.R., 1990. High agreement but low kappa II: resolving the paradoxes. Journal
of Clinical Epidemiology 43 (6), 551–558.
Cohen, J., 1960. A coefficient of agreement for nominal scales. Educational and Psychological
Measurement 20, 37–46.
Cohen, J., 1968. Weighted kappa: nominal scale agreement with provision for scaled disagreement or
partial credit. Psychological Bulletin 70, 213–220.
Davies, J.B., Ross, A.J., Wallace, B., Wright, L., 2003. Safety management: A Qualitative Systems
Approach. Taylor and Francis, London.
Embrey, D.E., 1986. SHERPA: a systematic human error reduction and prediction approach. In:
Proceedings of the International Topical Meeting on Advances in Human Factors in Nuclear Power
Systems. American Nuclear Society, LaGrange Park.
Feinstein, A.R., Cicchetti, D.V., 1990. High agreement but low kappa: the problems of two paradoxes.
Journal of Clinical Epidemiology 43 (6), 543–549.
Ferry, T.S., 1988. Modern Accident Investigation and Analysis. Wiley & Sons, New York.
Fleiss, J.L., 1971. Measuring nominal scale agreement among many raters. Psychological Bulletin 76, 378–
382.
Groeneweg, J., 1996. Controlling the Controllable: The Management of Safety, third revised ed. DSWO
Press, Leiden.
Grove, W.M., Andreasen, N.C., McDonald-Scott, P., Keller, M.B., Shapiro, R.W., 1981. Reliability
studies of psychiatric diagnosis. Archives of General Psychiatry 38, 408–413.
Hendrick, K., Benner, L., 1987. Investigating Accidents with STEP. Dekker, New York.
Hollnagel, E., 1998. Cognitive Reliability and Error Analysis Method. Elsevier Science, Oxford.
Isaac, A., Shorrock, S., Kirwan, B., Kennedy, R., Anderson, H., Bove, T., 2000. Learning from the past to
protect the future––the HERA approach. In: 24th European Association for Aviation Psychology
Conference, Crieff.
James, L.R., Demaree, R.G., Wolf, G., 1993. rwg: an assessment of within-group interrater agreement.
Journal of Applied Psychology 78 (2), 306–309.
Janes, C.L., 1979. Agreement measurement and the judgement process. Journal of Nervous and Mental
Disease 167, 343–347.
Johnson, W.G., 1980. MORT Safety Assurance Systems. Marcel Dekker, New York.
Kirwan, B.A., 1988. A comparative study of five human reliability assessment techniques. In: Sayers, B.A.
(Ed.), Human Factors and Decision Making: Their Influence on Safety and Reliability. Elsevier
Applied Science, London, pp. 87–109.
Kirwan, B., 1992. Human error identification in human reliability assessment. Part 2: Detailed comparison
of techniques. Applied Ergonomics 23, 371–381.
Lee, M.D., Del Fabbro, P.H., 2002. A Bayesian coefficient of agreement for binary decisions. Available
from