---
abstract: |-
An inductive learning algorithm takes a set of data as input and generates a hypothesis as
output. A set of data is typically consistent with an infinite number of hypotheses;
therefore, there must be factors other than the data that determine the output of the
learning algorithm. In machine learning, these other factors are called the bias of the
learner. Classical learning algorithms have a fixed bias, implicit in their design. Recently
developed learning algorithms dynamically adjust their bias as they search for a
hypothesis. Algorithms that shift bias in this manner are not as well understood as
classical algorithms. In this paper, we show that the Baldwin effect has implications for
the design and analysis of bias shifting algorithms. The Baldwin effect was proposed in
1896 to explain how phenomena that might appear to require Lamarckian evolution
(inheritance of acquired characteristics) can arise from purely Darwinian evolution.
Hinton and Nowlan presented a computational model of the Baldwin effect in 1987. We
explore a variation on their model, which we constructed explicitly to illustrate the lessons
that the Baldwin effect has for research in bias shifting algorithms. The main lesson is that
a good strategy for shift of bias in a learning algorithm appears to be to begin with a weak
bias and gradually shift to a strong bias.
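[Editorial sketch, not part of the published abstract.] A minimal Python sketch of the Hinton and Nowlan model may help make the setup concrete; it uses the commonly cited parameters of their 1987 paper (20 genes, 1000 individuals, 1000 learning trials, fitness 1 + 19n/1000, initial allele mix of 50% '?', 25% '1', 25% '0') and does not reproduce the variation studied in this article.

    import random

    GENOME_LEN = 20   # number of "switches"; the target configuration is all 1s
    POP_SIZE = 1000
    TRIALS = 1000     # learning trials available to each individual per generation

    def random_genome():
        # alleles: '1' (correct, fixed), '0' (incorrect, fixed), '?' (learnable)
        return [random.choice(['?', '?', '1', '0']) for _ in range(GENOME_LEN)]

    def fitness(genome):
        if '0' in genome:
            return 1.0                    # a fixed wrong allele can never be learned away
        p = 0.5 ** genome.count('?')      # chance of guessing all '?' alleles correctly in one trial
        for trial in range(TRIALS):
            if random.random() < p:       # learning found the target on this trial
                return 1.0 + 19.0 * (TRIALS - trial) / TRIALS
        return 1.0                        # never found the target within the trial budget

    def next_generation(population):
        scores = [fitness(g) for g in population]
        children = []
        for _ in range(POP_SIZE):
            a, b = random.choices(population, weights=scores, k=2)  # fitness-proportional selection
            cut = random.randrange(1, GENOME_LEN)                   # single-point crossover
            children.append(a[:cut] + b[cut:])
        return children

    if __name__ == '__main__':
        population = [random_genome() for _ in range(POP_SIZE)]
        for gen in range(50):
            population = next_generation(population)
            ones = sum(g.count('1') for g in population) / (POP_SIZE * GENOME_LEN)
            unknowns = sum(g.count('?') for g in population) / (POP_SIZE * GENOME_LEN)
            print(f"generation {gen:2d}: '1' alleles {ones:.2f}, '?' alleles {unknowns:.2f}")

Under these settings the fixed wrong alleles are selected away and the proportion of learnable '?' alleles then declines only slowly, which is the Baldwin effect the abstract refers to: learning initially rescues weakly specified genotypes, and evolution gradually replaces learned settings with innate ones.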
altloc: []
chapter: ~
commentary: ~
commref: ~
confdates: ~
conference: ~
confloc: ~
contact_email: ~
creators_id: []
creators_name:
- family: Turney
given: Peter D.
honourific: ''
lineage: ''
date: 1996
date_type: published
datestamp: 2001-10-11
department: ~
dir: disk0/00/00/18/18
edit_lock_since: ~
edit_lock_until: ~
edit_lock_user: ~
editors_id: []
editors_name: []
eprint_status: archive
eprintid: 1818
fileinfo: /style/images/fileicons/application_postscript.png;/1818/1/Baldwin.ps|/style/images/fileicons/application_pdf.png;/1818/5/Baldwin.pdf
full_text_status: public
importid: ~
institution: ~
isbn: ~
ispublished: pub
issn: ~
item_issues_comment: []
item_issues_count: 0
item_issues_description: []
item_issues_id: []
item_issues_reported_by: []
item_issues_resolved_by: []
item_issues_status: []
item_issues_timestamp: []
item_issues_type: []
keywords: 'bias, instinct, bias shift, Baldwin effect, concept learning, induction.'
lastmod: 2011-03-11 08:54:48
latitude: ~
longitude: ~
metadata_visibility: show
note: ~
number: 3
pagerange: 271-295
pubdom: FALSE
publication: Evolutionary Computation
publisher: ~
refereed: TRUE
referencetext: |
[1] Ackley, D., and Littman, M. (1991). Interactions between learning and evolution.
In Proceedings of the Second Conference on Artificial Life, C. Langton, C.
Taylor, D. Farmer, and S. Rasmussen, editors. California: Addison-Wesley.
[2] Anderson, R.W. (1995). Learning and evolution: A quantitative genetics
approach. Journal of Theoretical Biology, 175, 89-101.
[3] Bala, J., Huang, J., Vafaie, H., DeJong, K., and Wechsler, H. (1995). Hybrid
learning using genetic algorithms and decision trees for pattern classification.
Proceedings of the 14th International Joint Conference on Artificial Intelligence,
IJCAI-95, Montreal, Canada, pp. 719-724.
[4] Balakrishnan, K., and Honavar, V. (1995). Evolutionary design of neural architectures:
A preliminary taxonomy and guide to literature. Artificial Intelligence
Research Group, Department of Computer Science, Iowa State University,
Technical Report CS TR #95-01.
[5] Baldwin, J.M. (1896). A new factor in evolution. American Naturalist, 30, 441-
451.
[6] Barkow, J.H., Cosmides, L., and Tooby, J. (1992). Editors, The Adapted Mind:
Evolutionary Psychology and the Generation of Culture, New York: Oxford
University Press.
[7] Belew, R.K., and Mitchell, M. (1996). Editors, Adaptive Individuals in Evolving
Populations: Models and Algorithms. Massachusetts: Addison-Wesley.
[8] Geman, S., Bienenstock, E., and Doursat, R. (1992). Neural networks and the
bias/variance dilemma. Neural Computation, 4, 1-58.
[9] Glover, F. (1989). Tabu search — part i. ORSA (Operations Research Society of
America) Journal on Computing, 1, 190-260.
[10] Glover, F. (1990). Tabu search — part ii. ORSA (Operations Research Society
of America) Journal on Computing, 2, 4-32.
[11] Gordon, D.F., and desJardins, M. (1995). Evaluation and selection of biases in
machine learning. Machine Learning, 20, 5-22.
[12] Grefenstette, J.J. (1983). A user’s guide to GENESIS. Technical Report
CS-83-11, Computer Science Department, Vanderbilt University.
[13] Grefenstette, J.J. (1986). Optimization of control parameters for genetic algorithms.
IEEE Transactions on Systems, Man, and Cybernetics, 16, 122-128.
[14] Harvey, I. (1993). The puzzle of the persistent question marks: A case study of
genetic drift. In S. Forrest (editor) Proceedings of the Fifth International Conference
on Genetic Algorithms, ICGA-93, California: Morgan Kaufmann.
[15] Haussler, D. (1988). Quantifying inductive bias: AI learning algorithms and
Valiant’s learning framework. Artificial Intelligence, 36, 177-221.
[16] Hinton, G.E., and Nowlan, S.J. (1987). How learning can guide evolution.
Complex Systems, 1, 495-502.
[17] Hinton, G.E. (1986). Learning distributed representations of concepts. Proceedings
of the Eighth Annual Conference of the Cognitive Science Society, 1-12,
Hillsdale: Erlbaum.
[18] Lawrence, D. (1987). Genetic Algorithms and Simulated Annealing. California:
Morgan Kaufmann.
[19] Maynard Smith, J. (1987). When learning guides evolution. Nature, 329, 761-
762.
[20] Morgan, C.L. (1896). On modification and variation. Science, 4, 733-740.
[21] Nolfi, S., Elman, J., and Parisi, D. (1994). Learning and evolution in neural networks.
Adaptive Behavior, 3, 5-28.
[22] Nowlan, S.J., and Hinton, G.E. (1992). Simplifying neural networks by soft
weight-sharing. Neural Computation, 4, 473-493.
[23] Osborn, H.F. (1896). Ontogenic and phylogenic variation. Science, 4, 786-789.
[24] Pinker, S. (1994). The Language Instinct: How the Mind Creates Language.
New York: William Morrow and Co.
[25] Provost, F.J., and Buchanan, B.G. (1995). Inductive policy: The pragmatics of
bias selection. Machine Learning, 20, 35-61.
[26] Rendell, L. (1986). A general framework for induction and a study of selective
induction. Machine Learning, 1, 177-226.
[27] Schaffer, C. (1993). Selecting a classification method by cross-validation.
Machine Learning, 13, 135-143.
[28] Schaffer, C. (1994). A conservation law for generalization performance. Proceedings
of the Eleventh International Machine Learning Conference, ML-94.
California: Morgan Kaufmann.
[29] Tcheng, D., Lambert, B., Lu, S., Rendell, L. (1989). Building robust learning
systems by combining induction and optimization. Proceedings of the Eleventh
International Joint Conference on Artificial Intelligence, IJCAI-89, pp. 806-
812. Detroit, Michigan.
[30] Turney, P.D. (1995). Cost-sensitive classification: Empirical evaluation of a
hybrid genetic decision tree induction algorithm. Journal of Artificial Intelligence Research, 2,
369-409.
[31] Utgoff, P., and Mitchell, T. (1982). Acquisition of appropriate bias for inductive
concept learning. Proceedings of the National Conference on Artificial Intelligence,
AAAI-82, Pittsburgh, pp. 414-417.
[32] Utgoff, P. (1986). Shift of bias for inductive concept learning. In Machine
Learning: An Artificial Intelligence Approach, Volume II. Edited by R.S.
Michalski, J.G. Carbonell, and T.M. Mitchell. California: Morgan Kaufmann.
[33] Waddington, C.H. (1942). Canalization of development and the inheritance of
acquired characters. Nature, 150, 563-565.
[34] Wcislo, W.T. (1989). Behavioral environments and evolutionary change.
Annual Review of Ecology and Systematics, 20, 137-169.
[35] Weigend, A.S., Rumelhart, D.E., and Huberman, B.A. (1990). Predicting the
future: A connectionist approach. In T.J. Sejnowski, G.E. Hinton, and D.S.
Touretzky, editors, Proceedings of the 1990 Connectionist Models Summer
School, San Mateo, CA, Morgan Kaufmann.
[36] Weigend, A.S., Rumelhart, D.E., and Huberman, B.A. (1991). Generalization
by weight elimination with application to forecasting. In R.P. Lippman, J.E.
Moody, and D.S. Touretzky, editors, Advances in Neural Information Processing
Systems 3 (NIPS 3), pp. 875-882. San Mateo, CA, Morgan Kaufmann.
[37] Weigend, A.S., and Rumelhart, D.E. (1994). Weight-elimination and effective
network size. In S.J. Hanson, G.A. Drastal, and R.L. Rivest, editors, Computational
Learning Theory and Natural Learning Systems, pp. 457-476. Cambridge,
MA: MIT Press.
[38] Whitley, D., and Gruau, F. (1993). Adding learning to the cellular development
of neural networks: Evolution and the Baldwin effect. Evolutionary Computation,
1, 213-233.
[39] Whitley, D., Gordon, S., and Mathias, K. (1994). Lamarckian evolution, the
Baldwin effect and function optimization. Parallel Problem Solving from
Nature — PPSN III. Y. Davidor, H.P. Schwefel, and R. Manner, editors, pp. 6-
15. Berlin: Springer-Verlag.
[40] Wolpert, D. (1992). On the connection between in-sample testing and generalization
error. Complex Systems, 6, 47-94.
[41] Wolpert, D. (1994). Off-training set error and a priori distinctions between
learning algorithms. Technical Report SFI-TR-95-01-003, Santa Fe Institute.
relation_type: []
relation_uri: []
reportno: ~
rev_number: 14
series: ~
source: ~
status_changed: 2007-09-12 16:41:02
subjects:
- bio-evo
- comp-sci-mach-learn
- comp-sci-stat-model
succeeds: ~
suggestions: ~
sword_depositor: ~
sword_slug: ~
thesistype: ~
title: 'How to shift bias: Lessons from the Baldwin effect'
type: journalp
userid: 2175
volume: 4