creators_name: Turney, Peter D.
type: journalp
datestamp: 2001-10-13
lastmod: 2011-03-11 08:54:48
metadata_visibility: show
title: A theory of cross-validation error
ispublished: pub
subjects: comp-sci-art-intel
subjects: comp-sci-mach-learn
subjects: comp-sci-stat-model
full_text_status: public
abstract: This paper presents a theory of error in cross-validation testing of algorithms for predicting real-valued attributes. The theory justifies the claim that predicting real-valued attributes requires balancing the conflicting demands of simplicity and accuracy. Furthermore, the theory indicates precisely how these conflicting demands must be balanced in order to minimize cross-validation error. A general theory is presented and then developed in detail for linear regression and instance-based learning.
date: 1994
date_type: published
publication: Journal of Experimental and Theoretical Artificial Intelligence
volume: 6
pagerange: 361-391
refereed: TRUE
referencetext: Aha, D.W., & Kibler, D. (1989) Noise-tolerant instance-based learning algorithms, Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, 794-799.
Aha, D.W., Kibler, D., & Albert, M.K. (1991) Instance-based learning algorithms, Machine Learning, 6:37-66.
Akaike, H. (1970) Statistical predictor identification, Annals of the Institute of Statistical Mathematics, 22:203-217.
Akaike, H. (1973) Information theory and an extension of the maximum likelihood principle, Second International Symposium on Information Theory, edited by B.N. Petrov and F. Csaki (Budapest: Akademiai Kiado).
Akaike, H. (1974) A new look at the statistical model identification, IEEE Transactions on Automatic Control, AC-19:716-723.
Barron, A.R. (1984) Predicted squared error: a criterion for automatic model selection, in Self-organizing Methods in Modeling: GMDH Type Algorithms, edited by S.J. Farlow (New York: Marcel Dekker).
Dasarathy, B.V. (1991) Nearest Neighbor Pattern Classification Techniques, edited collection (California: IEEE Press).
Draper, N.R., & Smith, H. (1981) Applied Regression Analysis, Second Edition (New York: John Wiley & Sons).
Ein-Dor, P., & Feldmesser, J. (1987) Attributes of the performance of central processing units: a relative performance prediction model, Communications of the ACM, 30:308-317.
Eubank, R.L. (1988) Spline Smoothing and Nonparametric Regression (New York: Marcel Dekker).
Fraser, D.A.S. (1976) Probability and Statistics: Theory and Applications (Massachusetts: Duxbury Press).
Geman, S., Bienenstock, E., & Doursat, R. (1992) Neural networks and the bias/variance dilemma, Neural Computation, 4:1-58.
Kibler, D., Aha, D.W., & Albert, M.K. (1989) Instance-based prediction of real-valued attributes, Computational Intelligence, 5:51-57.
Moody, J.E. (1991) Note on generalization, regularization, and architecture selection in nonlinear learning systems, First IEEE-SP Workshop on Neural Networks for Signal Processing (California: IEEE Press).
Moody, J.E. (1992) The effective number of parameters: an analysis of generalization and regularization in nonlinear learning systems, in Advances in Neural Information Processing Systems 4, edited by J.E. Moody, S.J. Hanson, and R.P. Lippmann (California: Morgan Kaufmann).
Sakamoto, Y., Ishiguro, M., & Kitagawa, G. (1986) Akaike Information Criterion Statistics (Dordrecht, Holland: Kluwer Academic Publishers).
Strang, G. (1976) Linear Algebra and Its Applications (New York: Academic Press).
Turney, P.D. (1990) The curve fitting problem: a solution, British Journal for the Philosophy of Science, 41:509-530.
citation: Turney, Peter D. (1994) A theory of cross-validation error. [Journal (Paginated)]
document_url: http://cogprints.org/1820/3/NRC-35072.pdf