---
abstract: |-
This paper begins with a general theory of error in cross-validation testing of algorithms
for supervised learning from examples. It is assumed that the examples are described by
attribute-value pairs, where the values are symbolic. Cross-validation requires a set of
training examples and a set of testing examples. The value of the attribute that is to be
predicted is known to the learner in the training set, but unknown in the testing set. The
theory demonstrates that cross-validation error has two components: error on the training
set (inaccuracy) and sensitivity to noise (instability).
This general theory is then applied to voting in instance-based learning. Given an
example in the testing set, a typical instance-based learning algorithm predicts the designated
attribute by voting among the testing example's k nearest neighbors (the k most similar
examples) in the training set. Voting is intended to increase the stability (resistance
to noise) of instance-based learning, but a theoretical analysis shows that there are
circumstances in which voting can be destabilizing. The theory suggests ways to minimize
cross-validation error by ensuring that voting is stable and does not adversely affect
accuracy.
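As an illustrative aside, the voting scheme described in the abstract can be sketched roughly as follows. This is a minimal sketch, assuming a simple attribute-overlap similarity over symbolic attribute values; the similarity measure, the choice of k, and the toy data are hypothetical assumptions for illustration, not taken from the paper.

```python
# Minimal sketch of k-nearest-neighbor voting over symbolic
# attribute-value examples (illustrative only; the similarity measure,
# k, and the data below are assumptions, not the paper's formal model).
from collections import Counter

def similarity(a, b):
    """Number of attributes on which two symbolic examples agree."""
    return sum(1 for x, y in zip(a, b) if x == y)

def knn_vote(training, test_example, k=3):
    """Predict the designated attribute of test_example by majority vote
    among its k most similar examples in the training set.

    training     : list of (attribute_tuple, class_value) pairs
    test_example : tuple of symbolic attribute values (class unknown)
    """
    neighbors = sorted(training,
                       key=lambda ex: similarity(ex[0], test_example),
                       reverse=True)[:k]
    votes = Counter(cls for _, cls in neighbors)
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    train = [
        (("red",    "round",  "small"), "apple"),
        (("red",    "round",  "large"), "apple"),
        (("yellow", "curved", "small"), "banana"),
        (("yellow", "curved", "large"), "banana"),
    ]
    # Majority vote among the 3 nearest training examples -> "apple"
    print(knn_vote(train, ("red", "round", "large"), k=3))
```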
altloc:
- http://extractor.iit.nrc.ca/publications/NRC-35073.pdf
chapter: ~
commentary: ~
commref: ~
confdates: ~
conference: ~
confloc: ~
contact_email: ~
creators_id: []
creators_name:
- family: Turney
given: Peter D.
honourific: ''
lineage: ''
date: 1994
date_type: published
datestamp: 2001-10-13
department: ~
dir: disk0/00/00/18/21
edit_lock_since: ~
edit_lock_until: ~
edit_lock_user: ~
editors_id: []
editors_name: []
eprint_status: archive
eprintid: 1821
fileinfo: /style/images/fileicons/application_pdf.png;/1821/3/NRC%2D35073.pdf
full_text_status: public
importid: ~
institution: ~
isbn: ~
ispublished: pub
issn: ~
item_issues_comment: []
item_issues_count: 0
item_issues_description: []
item_issues_id: []
item_issues_reported_by: []
item_issues_resolved_by: []
item_issues_status: []
item_issues_timestamp: []
item_issues_type: []
keywords: ~
lastmod: 2011-03-11 08:54:48
latitude: ~
longitude: ~
metadata_visibility: show
note: ~
number: ~
pagerange: 331-360
pubdom: FALSE
publication: Journal of Experimental and Theoretical Artificial Intelligence
publisher: ~
refereed: TRUE
referencetext: |-
Aha, D.W., Kibler, D., & Albert, M.K. (1991) Instance-based learning algorithms,
Machine Learning, 6:37-66.
Cover, T.M., & Hart, P.E. (1967) Nearest neighbor pattern classification, IEEE Transactions
on Information Theory, IT-13:21-27. Also in (Dasarathy, 1991).
Dasarathy, B.V. (1991) Nearest Neighbor Pattern Classification Techniques, Edited collection
(California: IEEE Press).
Fix, E., & Hodges, J.L. (1951) Discriminatory analysis: nonparametric discrimination:
consistency properties, Project 21-49-004, Report Number 4, USAF School of
Aviation Medicine, Randolph Field, Texas, 261-279. Also in (Dasarathy, 1991).
Fraser, D.A.S. (1976) Probability and Statistics: Theory and Applications (Massachusetts:
Duxbury Press).
Kibler, D., Aha, D.W., & Albert, M.K. (1989) Instance-based prediction of real-valued
attributes, Computational Intelligence, 5:51-57.
Langley, P. (1993) Average-case analysis of a nearest neighbor algorithm, Proceedings of
the Thirteenth International Joint Conference on Artificial Intelligence, Chambéry,
France, in press.
Sakamoto, Y., Ishiguro, M., & Kitagawa, G. (1986) Akaike Information Criterion Statistics
(Dordrecht, Holland: Kluwer Academic Publishers).
Tomek, I. (1976) A generalization of the k-NN rule, IEEE Transactions on Systems, Man,
and Cybernetics, SMC-6:121-126. Also in (Dasarathy, 1991).
Turney, P.D. (1990) The curve fitting problem: a solution, British Journal for the Philosophy
of Science, 41:509-530.
Turney, P.D. (1993) A theory of cross-validation error. Submitted to the Journal of Experimental
and Theoretical Artificial Intelligence.
Weiss, S.M., & Kulikowski, C.A. (1991) Computer Systems that Learn: Classification and
Prediction Methods from Statistics, Neural Nets, Machine Learning, and Expert
Systems (California: Morgan Kaufmann).
relation_type: []
relation_uri: []
reportno: ~
rev_number: 12
series: ~
source: ~
status_changed: 2007-09-12 16:41:13
subjects:
- comp-sci-art-intel
- comp-sci-mach-learn
- comp-sci-stat-model
succeeds: ~
suggestions: ~
sword_depositor: ~
sword_slug: ~
thesistype: ~
title: Theoretical analyses of cross-validation error and voting in instance-based learning
type: journalp
userid: 2175
volume: 6