In 1982, Feldman and Ballard published "Connectionist models and their properties" in Cognitive Science, helping to focus attention on a family of similarly inspired research strategies just then under way, by giving the family a name: "connectionism." Now, seven years later, the connectionist nation has swelled to include such subfamilies as "PDP" and "neural net models." Since the ideological foes of connectionism are keen to wipe it out in one fell swoop aimed at its "essence", it is worth noting the diversity of not only the models but also the aspirations of the modelers. There is no good reason to suppose that they all pledge allegiance to any one principle or set of principles that could be shown to be false or incoherent. Those who think they are making exciting progress on their chosen projects have no particular reason to declare their own brand orthodox, and other brands ("rival" brands) heretical. Let a thousand flowers bloom and soon enough the sickly plants will succumb, without need of ideological condemnation.
But the ideological imperative will not be suppressed. Even the
most myopic and philistine model-builder wants to know just where his or
her own efforts fit into the grand scheme of things, and grand schemes
do tend to organize themselves into schools and movements, sects and heresies,
revolutions and reformations. If one wants to get one's bearings, one has
to indulge in ideology. My own reluctant entrance into the ideological
campaigns surrounding connectionism was commissioned by the organizers
of a conference sponsored by the Sloan Foundation on the Foundations of
Cognitive Science. I was asked to speak about "computational approaches
to psychology." The year was 1984, and the place was MIT, so I arranged
my survey around a somewhat Orwellian cartographic proposal: that the various
alternatives could be organized into a polar coordinate map centered on
the "East Pole", as MIT had been dubbed by Jerry Fodor. My message was
that there were a variety of "diametrically opposed" ways of not
residing at the East Pole, and one way or another, Bishop Berkeley was
right: "Westward the course of empire takes its way." (How many people
know that Berkeley, California, was named for George Berkeley, in honor
of this remark?)
That talk was published in truncated form in the Times Literary Supplement (Dennett, 1984a), and when the complete version appeared two years later (Dennett, 1986), it was accompanied, like a loaf of bread, by a warning and an expiry date: "Written under a deadline for the purpose of providing a glimpse of the state of the art in mid-1984, it will no doubt have a short shelf life. So read it now, or if now is later than 1986, read it as a quaint reflection on how some people thought back in 1984."
Re-reading it today, after reviewing a goodly portion of the introductions, surveys, debates, special issues, and other commentaries on connectionism that have inundated us, I find that I underestimated its shelf life, but since most of the points I was eager to make in my effort to explain and defend the appeal of connectionism have been ably hammered into familiarity by other hands, I will not repeat them. I am persuaded, however, to reiterate a few points that seem to me to be not only true and important, but insufficiently appreciated by subsequent commentators.
First, while it is widely felt (a less squishy word would not be
appropriate) that connectionist approaches are somehow much more biological
than what Fodor and Pylyshyn (1988) have hastened to dub "classical" approaches
(and what I call High Church Computationalist approaches), one of the deepest
of the grounds for biological skepticism about HCC models has still not
been properly acknowledged or digested.
Douglas Hofstadter (1983) has recently found a way of expressing this misgiving that strikes me as being on the right track. HCC systems, designed as they are "through a 100% top-down approach" (p. 284), are too efficient in their utilization of machinery. As we work our way down through the nested black boxes, "functions calling subfunctions calling subfunctions," decomposing larger homunculi into committees of smaller, dumber homunculi, we provide for no waste motion, no nonfunctional or dysfunctional clutter, no featherbedding homunculi or supernumeraries. But that is not Nature's Way; designing systems or organizations with that sort of efficiency requires genuine foresight, a detailed anticipation of the problem spaces to be encountered, the tasks the system will be called upon to perform. Another way of saying it is that such systems, by being designed all the way down, have too much intelligence implicated in their design at the lower levels.
Nature's Way of providing flexibility and good design involves
a different kind of efficiency, the sort of efficiency that can emerge
opportunistically out of prodigious amounts of "wasteful" and locally uninterpretable
activity--activity that isn't from the outset "for" anything, but is enlisted
to play some very modest role (or many roles on many different occasions)
in some highly distributed process. (Dennett, 1986, pp. 66-67)
A little amplification is in order. I myself, of course, have championed the pyramid-of-homunculi vision of functional decomposition (Dennett, 1974, 1978), and still think it has many virtues, but what Hofstadter pointed out was that there are very different ways to get homunculi to do a larger job: at one extreme is the rigid, bureaucratic, top-down design, in which job-slots are identified, parsimoniously, and then filled, Provost to Dean to Department Chair to Professor to Teaching Assistant, and at the other extreme is the opportunistic bottom-up design, which tries to press whoever shows up at the party--the more the merrier--into whatever roles seem (to the myopic organizers) to be worth doing. Since we already have ample grounds for discerning the latter design "strategy" in the process of natural selection, it should not surprise us if we see the same principle at work when we look at the bootstrapped processes which design phenotypically plastic elements (learning, pre-eminently, but also other environmentally sensitive developmental processes). That is not to say that it is a priori impossible that natural selection could have so precisely "appreciated" the detailed features of our cognitive problem-spaces that it designed us with top-down "efficiency," but if so, this would be a striking exception to the rule. Moreover, since in Nature, time (including "r and d" time) is a vastly more costly commodity than material, we should expect designs in which many elements are thrown into the breach, only some of which actually play a significant role (and hence are univocally interpretable).
This promiscuous mingling of interpretable and uninterpretable
("excess" or apparently non-functional) elements is thus given a biological
warrant, which properly counterbalances the functionalists' tendency to
demand more function (more "adaptation"--see Dennett, 1983) of the discernible
elements of any system than is warranted. And this important feature shows
up clearly in connectionist systems, where, at the computational level,
no distinction is made between symbols and nonsymbols. All are treated
exactly alike at that level. The computational mechanism "doesn't have
to know" which ones are the symbols. They are all the same. Some of them
we (looking at a higher level) can see take on a role rather like
symbols, but this is not a feature that makes a difference at the computational
level. That is a nice property. It's entirely contrary to HCC or "classical"
systems, where the distinction between symbol and non-symbol is "principled"
and makes all the computational difference in the world.
This difference comes out clearly if you consider the hidden units in a (non-local) connectionist network. If you subject those hidden units to careful statistical analysis, you can discover that a certain node is always ON whenever the subject is dogs, let us say, and never ON--or ON very weakly--when the subject is cats, whereas another node is ON for cats and not for dogs. Other nodes, however, seem to have no interpretation at all. They have no semantics; they're just there. As far as semantics is concerned, they're just noise; sometimes they are strongly active and other times weakly, but these times don't seem to match up with any category of interest. As many skeptics about connectionism have urged, the former sorts of nodes are plausibly labeled the DOG node and the CAT node and so forth, and so it is tempting to say that we have symbols after all. Connectionism turns out to be just a disguised version of good old-fashioned symbol-manipulating AI! Plausible as this is (and there must be some truth to the idea that certain nodes should be viewed as semantic specialists) there is another fact about such networks that undercuts the skeptics' claim in a most interesting way. The best reason for not calling the dog-active node the dog symbol is that you can "kill" or disable that node and the system will go right on discriminating dogs, remembering about dogs, etc, with at most a slight degradation in performance. It turns out, in other words, that at least some of those other "noisy" nodes were carrying some of the load. What is more, if you keep the "symbol" nodes alive and kill the other, merely noisy nodes, the system doesn't work.
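The lesion argument can be made concrete with a deliberately tiny sketch. It is a hand-wired toy, not a trained network and not any published model: the nine hidden units, their activation values, the threshold, and the names `dog_score` and `says_dog` are all assumptions chosen for illustration. One unit looks like a "DOG node"; the other eight are each only weakly tied to the category, so none of them would earn a label on its own.

```python
# Hand-wired toy, NOT a trained network: every number here is an
# illustrative assumption chosen to make the lesion argument visible.
SPECIALIST = 0                    # the unit an analyst would label "DOG"
BACKGROUND = list(range(1, 9))    # eight units with no obvious label

# Assumed hidden-layer activations for two stimuli.
HIDDEN = {
    "dog": [1.0] + [0.6] * 8,     # specialist fully on, others mildly raised
    "cat": [0.0] + [0.4] * 8,     # specialist off, others mildly lowered
}
THRESHOLD = 4.5                   # readout says "dog" above this total


def dog_score(stimulus, lesioned=()):
    """Sum hidden activity, treating lesioned ("killed") units as silent."""
    dead = set(lesioned)
    return sum(a for i, a in enumerate(HIDDEN[stimulus]) if i not in dead)


def says_dog(stimulus, lesioned=()):
    """Readout: the system reports 'dog' when total activity clears the threshold."""
    return dog_score(stimulus, lesioned) > THRESHOLD


# Compare the intact system with the two lesions described in the text.
for label, lesion in [("intact", []),
                      ("specialist killed", [SPECIALIST]),
                      ("background killed", BACKGROUND)]:
    print(label,
          "| dog ->", says_dog("dog", lesion),
          "| cat ->", says_dog("cat", lesion))
```

In this toy, silencing the specialist shrinks the margin by which a dog clears the threshold but leaves both answers correct, while silencing the eight background units leaves the specialist unable to carry the decision alone. That is only a cartoon of the phenomenon: in real networks the load-sharing arises from training rather than from hand-picked numbers.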
A related point, often well made by Smolensky (e.g., Smolensky, 1988), but often misunderstood (e.g., by various commentators on Smolensky, 1988), is that
The "virtual machine" that is recognizably psychological in its activity will not be a machine in the sense that its behavior is not formally specifiable (using the psychological-level vocabulary) as the computation of some high-level algorithm. Thus in this vision the low, computational level is importantly unlike a normal machine language in that there is no supposition of a direct translation or implementation relation between the high-level phenomena that do have an external-world semantics and the phenomena at the low level. If there were, the usual methodological precept of computer science would be in order: ignore the hardware since the idiosyncrasies of its particular style of implementation add nothing to the phenomenon, provided the phenomenon is rigorously described at the highest level. Implementation details do add constraints of time and space, of course, which are critical to the assessment of particular models, but these details are not normally supposed to affect what information processing is executed, which is just what makes [connectionism] a break with tradition. (Dennett, 1986, pp. 69-70).
The well-known distinction (in philosophy) between rule-following behavior and rule-described behavior is often illustrated by pointing out that the planets do not compute their orbits, even though we can, following rules that describe their motions. The "rules" of planetary motion are law-like regularities, not "followed" rules. This is true, but it ignores a variety of regularity intermediate between the regularities of planets (or ordinary cloud formations) and the regularities of rule-following (that is, rule-consulting) systems. These are the regularities that are preserved under selection pressure: the regularities dictated by principles of good design and hence homed in on by self-designing systems. That is, a "rule of thought" may be much more than a mere regularity; it may be a wise rule, a rule one would design a system by if one were a system designer, and hence a rule one would expect self-designing systems to "discover" in the course of settling into their patterns of activity. Such rules no more need be explicitly represented than do the principles of aerodynamics honored in the design of birds' wings. (Dennett, 1986, pp. 73-4)
These are (still) my grounds for enthusiasm for connectionism, and in the interim I have come to see connectionism as just one species of a larger genus of model-exploration that promises (or threatens, depending on your point of view) to supplant "classical" rule-based system-building. Among the many virtues of the recent anthology, Artificial Life, (Langton, 1989) is that it provides glimpses of still further alternatives to HCC, which only a few years ago was "the only straw floating."
One of the morals driven home by various contributions to that volume is drawn explicitly by the editor in the introduction: top-down specifications of all types of biological system must depend on a system of "global rules" which "predict" the effects on global structure of many local non-linear interactions--something which we have seen is intractable, even impossible, in the general case. . . .
Furthermore, in a system of any complexity the number of possible global states is astronomically enormous, and grows exponentially with the size of the system. Systems that attempt to supply global rules for global behavior simply cannot provide a different rule for every global state. Thus, the global states must be classified in some manner; categorized using a coarse-grained scheme according to which the global states within a category are indistinguishable. The rules of the system can only be applied at the level of resolution of these categories. There are many possible ways to implement a classification scheme, most of which will yield different partitionings of the global state-space. Any rule based system must necessarily assume that finer-grained differences don't matter, or must include a finite set of tests for "special cases," and then must assume that no other special cases are relevant. (Langton, p. 42)
It is worth noting, then, that other, still undreamt of, computational architectures could well share the virtues of existing connectionist models, so while I am deeply grateful to the pioneers for opening up these vistas, I have not yet plighted my troth to any particular school of connectionism. My grounds for caution have been expressed several times, and more or less repeat the standard list: e.g., worries about scaling up, about "getting rid of the teacher", about replacing the all-too-sentential input and output nodes with something more plausible. (See Dennett, 1987, pp. 231-2, and Dennett, forthcoming a, from which several comments here are drawn.) I don't have any particularly novel worries to add to the standard list, but I do have a remark to make about the much-belabored claim that some peculiarly human cognitive competencies are (in principle? so far?) beyond the powers of (any?) connectionist architecture that does not somehow slavishly forgo its idiosyncratic specialties and emulate a "classical" system.
Or as it is often put, a connectionist architecture can accomplish these special cognitive feats only by being a "mere implementation" of a "classical" symbol-manipulating architecture. See, e.g., Smolensky 1988, Fodor and Pylyshyn, 1988, Pinker and Prince, 1988. As many have noted, language is primarily what makes human psychology different from the psychology of other animals, and not surprisingly it is the language-enhanced (or at any rate language-infected) varieties of cognition that seem to raise the greatest difficulties for connectionism. But, if connectionist models were the right sort of fundamental models for basic, animal belief, this is just what one should expect, for language is a recent technology, grafted onto much older cognitive systems, and giving rise to a host of new cognitive phenomena, both more powerful and more risky/expensive than the underlying phenomena they may enhance but surely don't replace. The fact that we talk has the effect of adding new varieties of entities to our environment: since we talk, and write, we have all these sentences lying around--our own and other people's. We hear them, we remember them, we write them down, we speak them ourselves, and with regard to any such sentence in our language that we encounter or create, we have a problem: what to do with it. You can discard it, forget it, or you can decide to put it in the pile labeled TRUE or the pile labeled FALSE. And this, I claim, creates a rather different sort of specialized state, what in Brainstorms (Dennett, 1978) I called opinions. These are not just beliefs; these are linguistically infected states; only language users have them. Opinions are essentially bets on the truth of sentences in a language that you understand. 
My hunch is that a proper cognitive psychology is going to have to make a sharp distinction between beliefs and opinions, that the psychology of opinions is really going to be rather different from the psychology of beliefs, and that the sorts of architecture that will do very well by, say, non-linguistic perceptual beliefs (you might say animal beliefs) are going to have to be supplemented rather substantially in order to handle opinions. Who disagrees? Apparently Fodor and McLaughlin (forthcoming) do, for they make the following claim.
You don't find organisms that can think the thought that the girl loves John but can't think the thought that John loves the girl. You don't find organisms that can infer P from P&Q&R but can't infer P from P&Q. . . . For the purposes of this paper, we assume without argument:
i. that cognitive capacities are generally systematic in this sense, both in humans and in many [?] infrahuman organisms;
ii. that it is nomologically necessary (nonaccidental and hence counterfactual supporting) that this is so;
iii. that there must therefore be some psychological mechanism in virtue of the function of which cognitive capacities are systematic;
iv. and that an adequate theory of cognitive architecture should exhibit this mechanism. (Fodor and McLaughlin, forthcoming, p. 2).
They acknowledge that these assumptions may be "tendentious"; I think they are pretty obviously false. You do find organisms--vervet monkeys, for instance--that fail "inference" tests so strangely that while they do not quite pass muster as capable of thinking the thought (having the opinion) that the girl loves John, they give evidence of believing (in that animal sort of way) that the girl loves John, and if that is the sort of state they are in, it's even money whether they are capable as well of being in the state of believing (in the same animal way) that John loves the girl (Cheney and Seyfarth, forthcoming). There are organisms of whom one would say with little hesitation that they think a lion wants to eat them, but where there is no reason at all to think they could "frame the thought" that they want to eat the lion! The sort of systematicity that Fodor and McLaughlin draw our attention to is in fact a pre-eminently language-based artifact, not anything one should expect to discover governing the operations in the machine room of cognition. So there is nothing ad hoc or unmotivated about the acknowledgment that some areas of human cognition require a higher-level "symbolic" virtual architecture; after all, language, arithmetic, logic, writing, map-making--these are all brilliant inventions that dramatically multiply our capacities for cognition, and it should come as no surprise if they invoke design principles unanticipated in the cognitive systems of other animals. They are, in the laudatory sense, cognitive wheels (Dennett, 1984b).
The obvious objection to this line of thought is that what Pinker and Prince (1988) and others have shown is that language cannot be acquired without the aid of pre-existing symbolic architectures; hence even if some part of mature human cognitive competence is due to a virtual machine that depends on having language intact, we still have to explain how that state of linguistic competence is reachable, and connectionism cannot (in principle?) yield that competence. But in fact they have shown nothing of the kind; at the most they show that one cannot complete the journey to a mature linguistic competence without availing oneself, somewhere prior to mastery, of elements of symbolic architecture, which might, in fact, be simultaneously a byproduct of an immature linguistic competence as a necessary condition for advancing to a mature competence. Bootstrapping of this sort has plenty of precedents, so we still have no good reason for believing that an HCC architecture is a precondition for language acquisition. We must tread carefully here; we must not repeat the mistake recounted by Dawkins (1986):
Donald Griffin tells a story of what happened when he and his colleague Robert Galambos first reported to an astonished conference of zoologists in 1940 their new discovery of the facts of bat echolocation. One distinguished scientist was so indignantly incredulous that he seized Galambos by the shoulders and shook him while complaining that we could not possibly mean such an outrageous suggestion. Radar and sonar were still highly classified developments in military technology, and the notion that bats might do anything even remotely analogous to the latest triumphs of electronic engineering struck most people as not only implausible but emotionally repugnant. (p. 35)
It is not entirely out of the question that the lowly bat has the benefits of not only sophisticated sonar, but also of predicate calculus, quantified modal logic or situation semantics. But at the same time we should bear in mind the demonstrable ecological need that is served by bat sonar, and ask ourselves: what on earth would bats use predicate calculus or quantified modal logic for? What problems do they face that would be usefully tackled with such tools? The answer suggested by the "classical" HCC approach is that only such tools (with suitable allowances for species-specific limitations or adjustments, presumably) can do the versatile, generative representing that even bats cannot live without. A dubious claim, when brought into the open. Even the mammals nearest to us in biological space, apes and monkeys, show striking limitations in the versatility and (apparent) generativity of their cognitive competencies (Premack, 1986, Cheney and Seyfarth, forthcoming), so it is more than somewhat probable that the fine-grainedness and apparently indefinite extensibility of our mental states is a recent add-on, an enhancement that depends on our species having discovered tricks that turn our brains into rather different virtual machines (Dennett, forthcoming b).
While trying to describe the architecture of that virtual machine, we might do well to try to base it on a more plausibly mammalian cognitive architecture, one that can bear the cognitive loads of a day in the life of a hungry chimpanzee, for instance, whether or not it is up to all the subtle misapprehensions and musings of a (shortest) spycatcher or a Francophone visitor to London (Kripke, 1979).
Perhaps contrary to the expectations of the editors of this volume, I have declined to take a firm stand about the ultimate ideological or theoretical significance of connectionism. Might it turn out to be a seductive blind alley? Yes. Might it be the beginning of a revolution in the study of the mind? Yes. Let's find out which, by getting on with the building and testing of the models. For the fact is that connectionist models actually do surprising things, and if they didn't, they would not have sustained enough interest to warrant this volume. If it weren't for the actual computer implementations of connectionist systems, we would have, perhaps, the fervently expressed beliefs of a few visionaries who were sure that somehow, such an architecture could perform wonders, but visionaries with persuasive ideologies are a dime a dozen, and debates between visionaries are what lawyers rudely call pissing contests. Connectionism has dramatically shifted the mainstream of opinion in cognitive science, but only because the existing implementations actually perform non-trivial tasks in ways unanticipated by received opinion. If those of the "classical" school want to resume their hegemony, they will need more than a persuasive ideology--they have had that all along, but it has grown threadbare with reiteration--they will need some positive results of actual modeling as striking as those the connectionists have used to attract our attention.