creators_name: Turney, Peter D.
type: journalp
datestamp: 2001-10-13
lastmod: 2011-03-11 08:54:48
metadata_visibility: show
title: A theory of cross-validation error
ispublished: pub
subjects: comp-sci-art-intel
subjects: comp-sci-mach-learn
subjects: comp-sci-stat-model
full_text_status: public
abstract: This paper presents a theory of error in cross-validation testing of algorithms for predicting real-valued attributes. The theory justifies the claim that predicting real-valued attributes requires balancing the conflicting demands of simplicity and accuracy. Furthermore, the theory indicates precisely how these conflicting demands must be balanced in order to minimize cross-validation error. A general theory is presented and then developed in detail for linear regression and instance-based learning.
date: 1994
date_type: published
publication: Journal of Experimental and Theoretical Artificial Intelligence
volume: 6
pagerange: 361-391
refereed: TRUE
referencetext: Aha, D.W., & Kibler, D. (1989) Noise-tolerant instance-based learning algorithms, Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, 794-799.
Aha, D.W., Kibler, D., & Albert, M.K. (1991) Instance-based learning algorithms, Machine Learning, 6:37-66.
Akaike, H. (1970) Statistical predictor identification, Annals of the Institute of Statistical Mathematics, 22:203-217.
Akaike, H. (1973) Information theory and an extension of the maximum likelihood principle, Second International Symposium on Information Theory, edited by B.N. Petrov and F. Csaki (Budapest: Akademiai Kiado).
Akaike, H. (1974) A new look at the statistical model identification, IEEE Transactions on Automatic Control, AC-19:716-723.
Barron, A.R. (1984) Predicted squared error: a criterion for automatic model selection, in Self-organizing Methods in Modeling: GMDH Type Algorithms, edited by S.J. Farlow (New York: Marcel Dekker).
Dasarathy, B.V. (1991) Nearest Neighbor Pattern Classification Techniques, edited collection (California: IEEE Press).
Draper, N.R., & Smith, H. (1981) Applied Regression Analysis, Second Edition (New York: John Wiley & Sons).
Ein-Dor, P., & Feldmesser, J. (1987) Attributes of the performance of central processing units: a relative performance prediction model, Communications of the ACM, 30:308-317.
Eubank, R.L. (1988) Spline Smoothing and Nonparametric Regression (New York: Marcel Dekker).
Fraser, D.A.S. (1976) Probability and Statistics: Theory and Applications (Massachusetts: Duxbury Press).
Geman, S., Bienenstock, E., & Doursat, R. (1992) Neural networks and the bias/variance dilemma, Neural Computation, 4:1-58.
Kibler, D., Aha, D.W., & Albert, M.K. (1989) Instance-based prediction of real-valued attributes, Computational Intelligence, 5:51-57.
Moody, J.E. (1991) Note on generalization, regularization, and architecture selection in nonlinear learning systems, First IEEE-SP Workshop on Neural Networks for Signal Processing (California: IEEE Press).
Moody, J.E. (1992) The effective number of parameters: an analysis of generalization and regularization in nonlinear learning systems, in Advances in Neural Information Processing Systems 4, edited by J.E. Moody, S.J. Hanson, and R.P. Lippmann (California: Morgan Kaufmann).
Sakamoto, Y., Ishiguro, M., & Kitagawa, G. (1986) Akaike Information Criterion Statistics (Dordrecht, Holland: Kluwer Academic Publishers).
Strang, G. (1976) Linear Algebra and Its Applications (New York: Academic Press).
Turney, P.D. (1990) The curve fitting problem: a solution, British Journal for the Philosophy of Science, 41:509-530.
citation: Turney, Peter D. (1994) A theory of cross-validation error. [Journal (Paginated)]
document_url: http://cogprints.org/1820/3/NRC-35072.pdf