---
abstract: |-
An inductive learning algorithm takes a set of data as input and generates a hypothesis as
output. A set of data is typically consistent with an infinite number of hypotheses;
therefore, there must be factors other than the data that determine the output of the
learning algorithm. In machine learning, these other factors are called the bias of the
learner. Classical learning algorithms have a fixed bias, implicit in their design. Recently
developed learning algorithms dynamically adjust their bias as they search for a
hypothesis. Algorithms that shift bias in this manner are not as well understood as
classical algorithms. In this paper, we show that the Baldwin effect has implications for
the design and analysis of bias shifting algorithms. The Baldwin effect was proposed in
1896 to explain how phenomena that might appear to require Lamarckian evolution
(inheritance of acquired characteristics) can arise from purely Darwinian evolution.
Hinton and Nowlan presented a computational model of the Baldwin effect in 1987. We
explore a variation on their model, which we constructed explicitly to illustrate the lessons
that the Baldwin effect has for research in bias shifting algorithms. The main lesson is that
a good strategy for shift of bias in a learning algorithm appears to be to begin with a weak
bias and gradually shift to a strong bias.
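[Editorial sketch, not part of the published abstract.] A minimal Python sketch of the Hinton and Nowlan model may help make the setup concrete; it uses the commonly cited parameters of their 1987 paper (20 genes, 1000 individuals, 1000 learning trials, fitness 1 + 19n/1000, initial allele mix of 50% '?', 25% '1', 25% '0') and does not reproduce the variation studied in this article.

    import random

    GENOME_LEN = 20   # number of "switches"; the target configuration is all 1s
    POP_SIZE = 1000
    TRIALS = 1000     # learning trials available to each individual per generation

    def random_genome():
        # alleles: '1' (correct, fixed), '0' (incorrect, fixed), '?' (learnable)
        return [random.choice(['?', '?', '1', '0']) for _ in range(GENOME_LEN)]

    def fitness(genome):
        if '0' in genome:
            return 1.0                    # a fixed wrong allele can never be learned away
        p = 0.5 ** genome.count('?')      # chance of guessing all '?' alleles correctly in one trial
        for trial in range(TRIALS):
            if random.random() < p:       # learning found the target on this trial
                return 1.0 + 19.0 * (TRIALS - trial) / TRIALS
        return 1.0                        # never found the target within the trial budget

    def next_generation(population):
        scores = [fitness(g) for g in population]
        children = []
        for _ in range(POP_SIZE):
            a, b = random.choices(population, weights=scores, k=2)  # fitness-proportional selection
            cut = random.randrange(1, GENOME_LEN)                   # single-point crossover
            children.append(a[:cut] + b[cut:])
        return children

    if __name__ == '__main__':
        population = [random_genome() for _ in range(POP_SIZE)]
        for gen in range(50):
            population = next_generation(population)
            ones = sum(g.count('1') for g in population) / (POP_SIZE * GENOME_LEN)
            unknowns = sum(g.count('?') for g in population) / (POP_SIZE * GENOME_LEN)
            print(f"generation {gen:2d}: '1' alleles {ones:.2f}, '?' alleles {unknowns:.2f}")

Under these settings the fixed wrong alleles are selected away and the proportion of learnable '?' alleles then declines only slowly, which is the Baldwin effect the abstract refers to: learning initially rescues weakly specified genotypes, and evolution gradually replaces learned settings with innate ones.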
altloc: []
chapter: ~
commentary: ~
commref: ~
confdates: ~
conference: ~
confloc: ~
contact_email: ~
creators_id: []
creators_name:
- family: Turney
given: Peter D.
honourific: ''
lineage: ''
date: 1996
date_type: published
datestamp: 2001-10-11
department: ~
dir: disk0/00/00/18/18
edit_lock_since: ~
edit_lock_until: ~
edit_lock_user: ~
editors_id: []
editors_name: []
eprint_status: archive
eprintid: 1818
fileinfo: /style/images/fileicons/application_postscript.png;/1818/1/Baldwin.ps|/style/images/fileicons/application_pdf.png;/1818/5/Baldwin.pdf
full_text_status: public
importid: ~
institution: ~
isbn: ~
ispublished: pub
issn: ~
item_issues_comment: []
item_issues_count: 0
item_issues_description: []
item_issues_id: []
item_issues_reported_by: []
item_issues_resolved_by: []
item_issues_status: []
item_issues_timestamp: []
item_issues_type: []
keywords: 'bias, instinct, bias shift, Baldwin effect, concept learning, induction.'
lastmod: 2011-03-11 08:54:48
latitude: ~
longitude: ~
metadata_visibility: show
note: ~
number: 3
pagerange: 271-295
pubdom: FALSE
publication: Evolutionary Computation
publisher: ~
refereed: TRUE
referencetext: |
[1] Ackley, D., and Littman, M. (1991). Interactions between learning and evolution.
In Proceedings of the Second Conference on Artificial Life, C. Langton, C.
Taylor, D. Farmer, and S. Rasmussen, editors. California: Addison-Wesley.
[2] Anderson, R.W. (1995). Learning and evolution: A quantitative genetics
approach. Journal of Theoretical Biology, 175, 89-101.
[3] Bala, J., Huang, J., Vafaie, H., DeJong, K., and Wechsler, H. (1995). Hybrid
learning using genetic algorithms and decision trees for pattern classification.
Proceedings of the 14th International Joint Conference on Artificial Intelligence,
IJCAI-95, Montreal, Canada, pp. 719-724.
[4] Balakrishnan, K., and Honavar, V. (1995). Evolutionary design of neural architectures:
A preliminary taxonomy and guide to literature. Artificial Intelligence
Research Group, Department of Computer Science, Iowa State University,
Technical Report CS TR #95-01.
[5] Baldwin, J.M. (1896). A new factor in evolution. American Naturalist, 30, 441-
451.
[6] Barkow, J.H., Cosmides, L., and Tooby, J. (1992). Editors, The Adapted Mind:
Evolutionary Psychology and the Generation of Culture, New York: Oxford
University Press.
[7] Belew, R.K., and Mitchell, M. (1996). Editors, Adaptive Individuals in Evolving
Populations: Models and Algorithms. Massachusetts: Addison-Wesley.
[8] Geman, S., Bienenstock, E., and Doursat, R. (1992). Neural networks and the
bias/variance dilemma. Neural Computation, 4, 1-58.
[9] Glover, F. (1989). Tabu search — part i. ORSA (Operations Research Society of
America) Journal on Computing, 1, 190-260.
[10] Glover, F. (1990). Tabu search — part ii. ORSA (Operations Research Society
of America) Journal on Computing, 2, 4-32.
[11] Gordon, D.F., and desJardins, M. (1995). Evaluation and selection of biases in
machine learning. Machine Learning, 20, 5-22.
[12] Grefenstette, J.J. (1983). A user’s guide to GENESIS. Technical Report
CS-83-11, Computer Science Department, Vanderbilt University.
[13] Grefenstette, J.J. (1986). Optimization of control parameters for genetic algorithms.
IEEE Transactions on Systems, Man, and Cybernetics, 16, 122-128.
[14] Harvey, I. (1993). The puzzle of the persistent question marks: A case study of
genetic drift. In S. Forrest (editor) Proceedings of the Fifth International Conference
on Genetic Algorithms, ICGA-93, California: Morgan Kaufmann.
[15] Haussler, D. (1988). Quantifying inductive bias: AI learning algorithms and
Valiant’s learning framework. Artificial Intelligence, 36, 177-221.
[16] Hinton, G.E., and Nowlan, S.J. (1987). How learning can guide evolution.
Complex Systems, 1, 495-502.
[17] Hinton, G.E. (1986). Learning distributed representations of concepts. Proceedings
of the Eighth Annual Conference of the Cognitive Science Society, 1-12,
Hillsdale: Erlbaum.
[18] Lawrence, D. (1987). Genetic Algorithms and Simulated Annealing. California:
Morgan Kaufmann.
[19] Maynard Smith, J. (1987). When learning guides evolution. Nature, 329, 761-
762.
[20] Morgan, C.L. (1896). On modification and variation. Science, 4, 733-740.
[21] Nolfi, S., Elman, J., and Parisi, D. (1994). Learning and evolution in neural networks.
Adaptive Behavior, 3, 5-28.
[22] Nowlan, S.J., and Hinton, G.E. (1992). Simplifying neural networks by soft
weight-sharing. Neural Computation, 4, 473-493.
[23] Osborn, H.F. (1896). Ontogenic and phylogenic variation. Science, 4, 786-789.
[24] Pinker, S. (1994). The Language Instinct: How the Mind Creates Language.
New York: William Morrow and Co.
[25] Provost, F.J., and Buchanan, B.G. (1995). Inductive policy: The pragmatics of
bias selection. Machine Learning, 20, 35-61.
[26] Rendell, L. (1986). A general framework for induction and a study of selective
induction. Machine Learning, 1, 177-226.
[27] Schaffer, C. (1993). Selecting a classification method by cross-validation.
Machine Learning, 13, 135-143.
[28] Schaffer, C. (1994). A conservation law for generalization performance. Proceedings
of the Eleventh International Machine Learning Conference, ML-94.
California: Morgan Kaufmann.
[29] Tcheng, D., Lambert, B., Lu, S., Rendell, L. (1989). Building robust learning
systems by combining induction and optimization. Proceedings of the Eleventh
International Joint Conference on Artificial Intelligence, IJCAI-89, pp. 806-
812. Detroit, Michigan.
[30] Turney, P.D. (1995). Cost-sensitive classification: Empirical evaluation of a
hybrid genetic decision tree induction algorithm. Journal of Artificial Intelligence Research, 2,
369-409.
[31] Utgoff, P., and Mitchell, T. (1982). Acquisition of appropriate bias for inductive
concept learning. Proceedings of the National Conference on Artificial Intelligence,
AAAI-82, Pittsburgh, pp. 414-417.
[32] Utgoff, P. (1986). Shift of bias for inductive concept learning. In Machine
Learning: An Artificial Intelligence Approach, Volume II. Edited by R.S.
Michalski, J.G. Carbonell, and T.M. Mitchell. California: Morgan Kaufmann.
[33] Waddington, C.H. (1942). Canalization of development and the inheritance of
acquired characters. Nature, 150, 563-565.
[34] Wcislo, W.T. (1989). Behavioral environments and evolutionary change.
Annual Review of Ecology and Systematics, 20, 137-169.
[35] Weigend, A.S., Rumelhart, D.E., and Huberman, B.A. (1990). Predicting the
future: A connectionist approach. In T.J. Sejnowski, G.E. Hinton, and D.S.
Touretzky, editors, Proceedings of the 1990 Connectionist Models Summer
School, San Mateo, CA, Morgan Kaufmann.
[36] Weigend, A.S., Rumelhart, D.E., and Huberman, B.A. (1991). Generalization
by weight elimination with application to forecasting. In R.P. Lippman, J.E.
Moody, and D.S. Touretzky, editors, Advances in Neural Information Processing
Systems 3 (NIPS 3), pp. 875-882. San Mateo, CA, Morgan Kaufmann.
[37] Weigend, A.S., and Rumelhart, D.E. (1994). Weight-elimination and effective
network size. In S.J. Hanson, G.A. Drastal, and R.L. Rivest, editors, Computational
Learning Theory and Natural Learning Systems, pp. 457-476. Cambridge,
MA: MIT Press.
[38] Whitley, D., and Gruau, F. (1993). Adding learning to the cellular development
of neural networks: Evolution and the Baldwin effect. Evolutionary Computation,
1, 213-233.
[39] Whitley, D., Gordon, S., and Mathias, K. (1994). Lamarckian evolution, the
Baldwin effect and function optimization. Parallel Problem Solving from
Nature — PPSN III. Y. Davidor, H.P. Schwefel, and R. Manner, editors, pp. 6-
15. Berlin: Springer-Verlag.
[40] Wolpert, D. (1992). On the connection between in-sample testing and generalization
error. Complex Systems, 6, 47-94.
[41] Wolpert, D. (1994). Off-training set error and a priori distinctions between
learning algorithms. Technical Report SFI-TR-95-01-003, Santa Fe Institute.
relation_type: []
relation_uri: []
reportno: ~
rev_number: 14
series: ~
source: ~
status_changed: 2007-09-12 16:41:02
subjects:
- bio-evo
- comp-sci-mach-learn
- comp-sci-stat-model
succeeds: ~
suggestions: ~
sword_depositor: ~
sword_slug: ~
thesistype: ~
title: 'How to shift bias: Lessons from the Baldwin effect'
type: journalp
userid: 2175
volume: 4