Pinker & Prince: Connectionism

From: Hosier Adam (
Date: Fri Apr 20 2001 - 12:46:49 BST

Adam Hosier <>

CM302 Assignment 2 PINKER1

Pinker, Steven; Prince, Alan: On Language and Connectionism (1987)

Pinker & Prince (P&P) are both experts in the psychology of language.
They wrote this paper in response to an earlier paper by Rumelhart &
McClelland (R&M), 1987. The R&M paper tried to draw analogies between
the way neural networks function and the way human cognition might
work. (The details of what R&M did are given later in this assignment.)

The P&P paper is over 40,000 words long and goes into great detail
about specific features of English. I will not be analyzing these parts
of the text, as I doubt I have anywhere near the knowledge of language
theory needed to be even slightly critical. However, I will explain the
most basic language theories where necessary, in an attempt to set out
some of the reasons why I think the P&P paper is too critical of the
work performed by Rumelhart and McClelland.

>The study of language is notoriously contentious, but until recently,
>researchers who could agree on little else have all agreed on one thing:
>that linguistic knowledge is couched in the form of rules and principles.
>This conception is consistent with -- indeed, is one of the prime
>motivations for -- the "central dogma" of modern cognitive science, namely
>that intelligence is the result of processing symbolic expressions.

The lines above hint at the basic criticism P&P raise. Namely, P&P do
not believe that a network of neuron-like processors, which has no
internal 'rules' or 'representations', can represent even slightly the
way in which human cognition works. (I will explain this in more detail
later in this assignment.)

P&P use the following words to describe the Parallel Distributed
Processing system used by R&M. Today this kind of system would more
commonly be called a two-layer feed-forward linear neural network.

>In these models, collectively referred to as "Parallel Distributed
>Processing" ("PDP") or "Connectionist" models, the hardware mechanisms are
>networks consisting of large numbers of densely interconnected units, which
>correspond to concepts.

P&P later acknowledge that these networks are loosely based on the way
human brains are thought to work at the micro level, although they add
the disclaimer that such networks are not trained, and do not learn, in
the same way as the brain. From a layman's perspective I agree that
neural networks do not train or learn in quite the same way as humans;
however, I know of no other system that comes as close.
>In some respects, these models are thought to resemble neural networks in
>meaningful ways; in others, most notably the teaching and learning
>mechanisms, there is no known neurophysiological analogue, and some authors
>are completely agnostic about how the units and connections are neurally

The fundamental idea that P&P want to refute is shown below.

>A possibility is that once PDP network models are fully developed, they
>will replace symbol-processing models as explanations of cognitive

As ever, even P&P acknowledge the holy grail of AI: a system capable of
language. Although P&P never actually mention the Turing test (in which
a machine must answer typed questions indistinguishably from a human),
they touch on several areas that apply directly to it. I will highlight
these later in this assignment.

>Many observers thus feel that connectionism, as a radical restructuring of
>cognitive theory, will stand or fall depending on its ability to account
>for human language.

P&P describe the R&M system below. In outline, R&M created a neural
network that could learn the past tense of English verbs. For instance,
given the present-tense verb stem 'stand', the system would respond
'stood' as the past tense. The system was built around a set of over
500 strong and weak verbs (strong verbs having peculiar transformations
such as 'go->went', which don't follow simple patterns, and weak verbs
having regular transformations such as 'jump->jumped' and 'jog->jogged').

>One of the most influential efforts in the PDP school has been a model of
>the acquisition of the marking of the past tense in English developed by
>David Rumelhart and James McClelland. It handles both regular (walk/walked
>and irregular (feel/felt) verbs, productively yielding past forms for novel
>verbs not in its training set, and it distinguishes the variants of the
>past tense morpheme (t versus d versus @o[i-d]) conditioned by the final
>consonant of the verb (walked versus jogged versus sweated). Furthermore,
>in doing so it displays a number of behaviors reminiscent of children.
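The rule-based account of the regular ending that this passage alludes
to chooses among the three variants by looking at the verb's final
sound. The Python sketch below is only an illustration under stated
assumptions: words are given as lists of phoneme symbols in a made-up
ASCII notation ('T' for the 'th' of 'bath', 'S' for 'sh', 'tS' for 'ch',
'dZ' for 'j'), the variant written @o[i-d] in the quotation is written
'Id' here, and the function name is hypothetical, not taken from either
paper.

```python
# Voiceless consonants in a hypothetical ASCII phoneme notation
# ('T' = th in "bath", 'S' = sh, 'tS' = ch).
VOICELESS = {"p", "k", "f", "T", "s", "S", "tS"}


def regular_past_suffix(phonemes: list[str]) -> str:
    """Pick the variant of the regular past-tense ending from the stem's final sound."""
    final = phonemes[-1]
    if final in ("t", "d"):
        return "Id"  # extra syllable after t or d: sweat -> sweated
    if final in VOICELESS:
        return "t"   # voiceless ending after voiceless sounds: walk -> walked
    return "d"       # voiced ending elsewhere, including after vowels: jog -> jogged


for stem in (["w", "O", "k"], ["dZ", "A", "g"], ["s", "w", "E", "t"]):
    print("".join(stem), "->", regular_past_suffix(stem))
```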

As can be seen, P&P grudgingly acknowledge that the system does seem to
learn in a way similar to how children learn. R&M themselves describe
the main feature of their system as shown below.

>>We suggest instead that implicit knowledge of language may be stored in
>>connections among simple processing units organized into networks. While
>>the behavior of such networks may be describable (at least approximately)
>>as conforming to some system of rules, we suggest that an account of the
>>fine structure of the phenomena of language use and language acquisition
>>can best be formulated in models that make reference to the
>>characteristics of the underlying networks.

What P&P dispute is R&M's claim that their network-based system gives a
better insight into human cognition than a study of rules and how they
are formed.

The following passages highlight what P&P disagree with, and also one
of the main reasons why. It is interesting to note that although P&P
are specifically trying to discredit the R&M work, they also seem to
disagree with the entire connectionist ideology (see the end of the
next passage). However, it must also be noted that their case is helped
by the fact that the R&M paper was not self-critical.
>We will conclude that the claim that parallel distributed processing
>networks can eliminate the need for rules and for rule induction mechanisms
>in the explanation of human language is unwarranted. In particular, we
>argue that the shortcomings are in many cases due to central features of
>connectionist ideology.

>There is no unpacking of its underlying theoretical assumptions so as to
>contrast them with those of a symbolic rule-based alternative, or indeed
>any alternative. As a result, there is no apportioning of credit or blame
>for the model's performance to properties that are essential.

One of the principles of formal language structure that P&P appeal to
is shown below. This principle, usually called blocking, is the
opposite of 'blending', which will be discussed later.

>The effect is that when a general rule (like Past(x) = x + 'ed') formally
>overlaps a specific rule (like Past(go) = went), the specific rule not only
>applies but also blocks the general one from applying.
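The blocking principle is easy to state as a program. The sketch below
is a minimal illustration, assuming a small hypothetical table of stored
irregular forms and ignoring English spelling adjustments (so 'jog'
would come out as 'joged'); it is not code from either paper.

```python
# Hypothetical table of stored irregular past-tense forms (the "specific rules").
IRREGULAR_PAST = {"go": "went", "feel": "felt", "stand": "stood", "break": "broke"}


def past(verb: str) -> str:
    """Past tense with blocking: a stored irregular form pre-empts Past(x) = x + 'ed'."""
    if verb in IRREGULAR_PAST:
        return IRREGULAR_PAST[verb]  # specific rule applies and blocks the general one
    return verb + "ed"               # general rule, with no spelling adjustments


print(past("walk"))  # walked
print(past("go"))    # went, never "goed"
```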

P&P again mention the lack of internal data representation in neural
networks, and the apparent absence of any 'rules' inside them.
>These mappings are superimposed in the connection weights and node
>thresholds; no single parameter corresponds uniquely to a rule.
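To see what "superimposed in the connection weights" means in practice,
here is a toy pattern associator: one layer of modifiable weights from
binary input units to binary output units, trained with the perceptron
learning rule. It is a deterministic simplification using made-up
patterns and hypothetical function names, and it is not a
reimplementation of the RM model, which differs in its input coding,
size and output units.

```python
def step(z: float) -> int:
    """Binary threshold unit: fires only if its net input is positive."""
    return 1 if z > 0 else 0


def train(pairs, n_in, n_out, max_epochs=100):
    """Perceptron learning rule applied to every output unit."""
    weights = [[0.0] * n_in for _ in range(n_out)]
    biases = [0.0] * n_out
    for _ in range(max_epochs):
        errors = 0
        for x, target in pairs:
            for j in range(n_out):
                y = step(sum(w * xi for w, xi in zip(weights[j], x)) + biases[j])
                delta = target[j] - y  # +1, 0 or -1
                if delta:
                    errors += 1
                    for i in range(n_in):
                        weights[j][i] += delta * x[i]
                    biases[j] += delta
        if errors == 0:  # every pair is now reproduced
            break
    return weights, biases


def recall(weights, biases, x):
    return [step(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(weights, biases)]


# Four overlapping input patterns, each paired with an arbitrary output pattern.
PAIRS = [
    ([1, 1, 0, 0], [1, 0, 1]),
    ([0, 1, 1, 0], [0, 1, 1]),
    ([0, 0, 1, 1], [1, 1, 0]),
    ([0, 0, 0, 1], [0, 0, 1]),
]

weights, biases = train(PAIRS, n_in=4, n_out=3)
for x, target in PAIRS:
    print(x, "->", recall(weights, biases, x), "target", target)
```

After training every pair is recalled correctly, yet the parameters are
shared across pairs (each bias, and for example the weight on the second
input feature, takes part in recalling more than one pair), so no single
parameter stores one mapping on its own.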

Finally, P&P break down the actual language model used in the R&M
system and argue that it is not correct. They go on to suggest a better
representation that R&M could have used to give better, perhaps more
realistic, results.

>A better representation would have units referring in some way to phonetic
>features rather than to phonemes, because of the well-known fact that the
>correct dimension of generalization from old to new forms must be in terms
>of such features.

P&P do acknowledge that the R&M model was simply that: a model. Thus
they are in fact minutely analyzing what they admit is merely an
abstract and basic model of the past-tense problem.

>Although this move, [by R&M in their model], was inspired purely by
>considerations of computational economy.

For instance, P&P go so far as to dissect the R&M model's internal
representation of data in order to prove that some 'essential' features
of language are not included in the model. The point they are missing
is that, although the R&M internal representation of words
('Wickelphones') is not entirely complete and correct in terms of
language theory, it is not supposed to be. Similarly, the R&M model is
trained with only 420 present-past verb pairs, even though there are
thousands of verbs in the language. This is not because R&M could only
think of 420 verb pairs, but because the model was never meant to be the
be-all and end-all of past-tense formation; it was merely an interesting
possible solution to the problem.
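For readers unfamiliar with the term, a Wickelphone is a phoneme
together with its immediate left and right neighbours, with a boundary
symbol marking the edges of the word; R&M represent a word as the
unordered set of its Wickelphones (which are then further coded as
Wickelfeatures, not shown here). A minimal sketch, assuming one
character per phoneme and '#' as the boundary symbol:

```python
def wickelphones(word: str, boundary: str = "#") -> set[tuple[str, str, str]]:
    """Return the set of (left, centre, right) phoneme triples in a word.

    Assumes one character per phoneme and marks both word edges with `boundary`.
    """
    padded = boundary + word + boundary
    return {tuple(padded[i - 1 : i + 2]) for i in range(1, len(padded) - 1)}


print(sorted(wickelphones("kat")))
# [('#', 'k', 'a'), ('a', 't', '#'), ('k', 'a', 't')]
```

Note that the set is unordered: the original string has to be recovered
by chaining overlapping triples.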

>Their model is offered precisely as a model of internal representation; The
>learning process is understood in terms of changes in a representational
>system as it converges on the mature state. [P&P dispute] that the
>Wickelphone/Wickelfeature provides an adequate basis for phonological
>generalization, circumventing the need to deal with strings.

In fact, R&M state explicitly that their coding scheme was designed for
limited use only:

>>[*See below] All we claim for the present coding scheme is its sufficiency
>>for the task of representing the past tenses of the 500 most frequent
>>verbs in English

However, P&P seem so intent on closing off any new paths of research
opened by the R&M paper that they suggest R&M did not really mean the
previous sentence when they wrote it. P&P do not say what the
centrality of the 'Wickelfeature' is, or why it makes the R&M model a
complete solution, even though R&M explicitly state that their model is
not a complete solution. (It seems that without this 'centrality' P&P
would have nothing to write a paper about.)

>This disclaimer [see above*] is at odds with the centrality of the
>Wickelfeature in the model's design.

P&P quote a particular passage of the R&M paper that they disagree
with. It is shown below, along with their comment on it.

>>"That a reasonable account of the acquisition of the past tense can be
>>provided without recourse to the notion of a 'rule' as anything more than
>>a description of the language."

>By this they mean that rules, as mere summaries of the data, are not
>intrinsically or causally involved in internal representations. Rumelhart
>and McClelland's argument for the broader claim is based entirely on the
>behavior of their model.

It is clear, then, that P&P favor the idea that language can be
completely formalized as a set of rules, i.e. formal symbol
manipulation. They do not like the idea that, even though no rules have
been hard-wired in, a neural network can be shown examples of input and
output and then learn the 'rules' needed to produce the correct output.
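For contrast, learning from examples is not unique to networks: an
explicit rule can also be induced from input/output pairs. The sketch
below is a hypothetical, deliberately naive form of symbolic rule
induction, not taken from either paper: it takes the most frequent
stem-to-past suffix as the general rule and stores every pair that does
not fit it as an exception.

```python
from collections import Counter


def induce_rule(pairs):
    """Return (general_suffix, exceptions) induced from (stem, past) pairs.

    Assumes at least one pair whose past form starts with its stem.
    """
    suffixes = Counter(past[len(stem):] for stem, past in pairs if past.startswith(stem))
    general_suffix, _ = suffixes.most_common(1)[0]
    exceptions = {stem: past for stem, past in pairs if past != stem + general_suffix}
    return general_suffix, exceptions


PAIRS = [("walk", "walked"), ("jump", "jumped"), ("need", "needed"),
         ("go", "went"), ("feel", "felt")]
suffix, exceptions = induce_rule(PAIRS)
print(suffix)      # ed
print(exceptions)  # {'go': 'went', 'feel': 'felt'}
```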

One of the stranger ways in which P&P try to prove that a neural net
cannot work in the same way as the brain is to show that a neural net
can learn mappings that do not exist in any language. To me this is a
bit like saying that a computer cannot be any kind of calculator because
it might also play music.

>A quintessential unlinguistic map is relating a string to its mirror image
>reversal (this would relate pit to tip, brag to garb, dumb to mud, and so
>on); although neither physiology nor physics forbids it, no language uses
>such a pattern. But it is as easy to represent and learn in the RM pattern
>associator as the identity map.
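One way to see why reversal is no harder than the identity map for a
Wickelphone-style representation: the triples of a reversed word are
exactly the original triples read backwards, so reversal amounts to a
fixed, triple-by-triple substitution, just as the identity map does. A
self-contained sketch, assuming one character per phoneme (so 'dumb' is
written in a made-up phonemic spelling, 'dVm'); the function names are
hypothetical.

```python
def wickelphones(word: str) -> set[tuple[str, str, str]]:
    """Set of (left, centre, right) triples, '#' marking word edges, one character per phoneme."""
    padded = "#" + word + "#"
    return {tuple(padded[i - 1 : i + 2]) for i in range(1, len(padded) - 1)}


def mirror(triple: tuple[str, str, str]) -> tuple[str, str, str]:
    """Read a triple backwards."""
    return triple[::-1]


for word in ("pit", "brag", "dVm"):  # 'dVm' stands for the phonemes of "dumb"
    reversed_word = word[::-1]
    # The reversed word's triples are exactly the mirrored triples of the original.
    assert wickelphones(reversed_word) == {mirror(t) for t in wickelphones(word)}
    print(word, "->", reversed_word)
```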

P&P also thoroughly analyze the results of the R&M system in an attempt
to prove that its errors are so numerous and so unusual that it is
completely unlike human cognition.

>The bottom-line and most easily grasped claim of the RM model is that it
>succeeds at its assigned task: producing the correct past tense form.
>Rumelhart and McClelland are admirably open with their test data, so we can
>evaluate the model's achievement quite directly.

>Of the remaining 18 verbs for which the model did not output a single
>correct choice, 4 yielded grossly bizarre candidates: a. squat - squakt
> b. mail - membled
> c. tour - toureder
> d. mate - maded

>Taking these with the 6 no-shows, we have 20 out of the 72 test stems
>resulting in seriously wrong forms, a 28% failure rate. This is the state
>of the model after it has been trained 190-200 times on each item in a
>vocabulary of 336 regular verbs.
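The figures in the passage above can be checked directly. The few
Python lines below recompute the failure rate, and the total number of
training presentations implied by 190-200 passes over 336 verbs, which
bears on the argument in the next paragraph.

```python
failures, test_stems = 20, 72
print(f"failure rate: {failures / test_stems:.0%}")  # failure rate: 28%

regular_verbs = 336
for passes in (190, 200):
    print(passes, "passes ->", passes * regular_verbs, "training presentations")
```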

The errors shown above do show that the R&M system is not particularly
good; it certainly does not solve the past-tense problem well. However,
given that the R&M system was trained only about 200 times on a
vocabulary of 336 verbs (as stated directly above), it would seem
unreasonable to expect it to be completely error-free. Moreover, the
children that the system is compared to also make bizarre past-tense
mistakes. An example of child speech from the P&P paper itself is shown
below.
>I brekked your work.

Lisa has in this case made more than a simple mistake such as the common
child error 'run->ranned' (notice that there the stem is correct, 'ran',
but the child has added the regular but unnecessary ending '-ed'). Lisa
has also got the stem itself wrong, the correct verb-past pair being
'break->broke'.

>What we have here is not a model of the mature system.

Thus, as shown above, P&P conclude that the R&M model is not a mature
system. I believe even R&M would agree with this, and it was in fact
never in doubt. This is my main problem with the P&P paper. As they
themselves admit:

>Why subject the RM model to such painstaking analysis? Surely few models of
>any kind could withstand such scrutiny.

In my opinion, although the P&P paper is able to pick apart many facets
of the R&M model with regard to language theory, the following
statements that P&P make are incorrect.

>Rumelhart and McClelland's surprising claims -- that language can be
>described only approximately by rules, that there is no induction problem
>in their account, and that we must revise our understanding of linguistic
>information processing -- are based on the putative success of their
>existing model. Given that their existing model does not do the job it is
>said to do, the claims must be rejected.

>The third claim, that the success of their model calls for a revised
>understanding of language and language acquisition, is hardly warranted in
>light of the problems we have discussed.

I believe these statements are incorrect because, although the R&M
claims may have been a little too extravagant given that their results
were not brilliant, the claims cannot be written off altogether. (A
major error, I believe, is that P&P treat the R&M model as a complete
solution, which, as discussed earlier, R&M explicitly stated it was
not.) It is this error that allows P&P to state that all the R&M claims
can be 'rejected'.

For instance, even though P&P want to reject all the claims made by
R&M, there are several places in the P&P paper where they actually
acknowledge some of those claims as correct:

>But connectionist models are more consistent with the sloppiness found in
>children's speech and adult's speech errors, which are more 'psychological'

>The model has raised intriguing questions about the role of the family
>resemblance structure of subregularities and of their frequency of
>exemplification in overregularization. But the model does not give superior
>or radically new answers for the questions it raises.

>It is not unthinkable that many of the design flaws could be overcome,
>resulting in a connectionist network that learns more insightfully.

>...interesting successes in simple domains such as learning to add
>two-digit numbers, detecting symmetry, or learning the exclusive-`or'
>operator. But there is always the danger in such systems of converging on
>incorrect solutions defined by local minima of the "energy landscape"
>defined over the space of possible weights.

>There is no reason to predict with certainty that these models will fail to
>acquire complex abilities such as mastery of the past tense system without
>wiring in traditional theories by hand -- but there is also no reason to
>predict that they will. At the same time, they show that there is no basis
>for the belief that connectionism will dissolve the difficult puzzles of
>language, or even provide radically new solutions to them.

>These problems are exactly that, problems. They do not demonstrate that
>interesting PDP models of language are impossible in principle. The crucial
>point is that adults can speak without error and can realize that their
>errors are. And children's learning culminates in adult knowledge. These
>are empirical facts that any theory must account for.
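The danger of "local minima" mentioned in one of the passages above can
be illustrated with a one-dimensional toy "energy landscape". The
sketch below is a simplification under assumptions (a real network's
weight space has many dimensions, and the function here is made up):
plain gradient descent ends in different valleys depending only on
where it starts, and the valley reached from the right is shallower
than the one reached from the left.

```python
def f(x: float) -> float:
    """A made-up one-dimensional "energy landscape" with two valleys."""
    return x**4 - 3 * x**2 + x


def df(x: float) -> float:
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1


def descend(x: float, lr: float = 0.01, steps: int = 2000) -> float:
    """Plain gradient descent: repeatedly step downhill from x."""
    for _ in range(steps):
        x -= lr * df(x)
    return x


left = descend(-2.0)   # settles in the deeper (global) minimum
right = descend(2.0)   # settles in the shallower (local) minimum
print(f"start -2 -> x = {left:.3f}, energy = {f(left):.3f}")
print(f"start +2 -> x = {right:.3f}, energy = {f(right):.3f}")
```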

I believe P&P felt it necessary to write such a paper because R&M made
several major mistakes themselves. Firstly, they were not sufficiently
self-critical, either about the way they trained their model and how
this might affect the results, or about the results themselves and
whether they really resembled child-like cognition. From these results
R&M made claims, some of which could not be supported under detailed
scrutiny of the model they used. However, this is not to say that all
the claims were wrong, as P&P suggest. Some of the weak areas of the R&M
paper are shown below.
>And in general, Rumelhart and McClelland do not present critical tests
>between competing hypotheses embodying minimally different assumptions.

>The shift from the first to the second stage of the model's behavior, then,
>is a direct consequence of a shift in the input mixture from a
>heterogeneous collection of patterns to a collection in which the regular
>pattern occurs in the majority.

>All five predictions work against the RM model and in favor of the
>explanation based on incorrect inputs. Kuczaj (1977) reports that his
>transcripts contained no examples where the child overapplied any
>subregularity, let alone a blend of two of them or of a subregularity plus
>the regular ending.

P&P also make several points that, although correct, do not mean that
work in connectionist areas will be fruitless, as they seem to believe.
In fact, given these statements, it would seem logical for connectionist
work to be pursued further so that these statements can be proved or
disproved.

>The child who has not yet figured out the distinction between regular,
>subregular, and idiosyncratic cases will display behavior that is similar
>to a system that is incapable of making the distinction -- the RM model.

>But subsymbolism or eliminative connectionism, as a radical metatheory of
>cognitive science, will not be vindicated if the principal structures of
>such hypothetical improved models turn out to be dictated by higher-level

>But a theory that can only account for errorful or immature performance,
>with no account of why the errors are errors or how children mature into
>adults, is of limited value.

>As for the present, we have shown that the paradigm example of a PDP model
>of language can claim nothing more than a superficial fidelity to some
>first-order regularities of language.

>These constraints are facts that any theory of language acquisition must be
>able to account for; a model that can learn all possible degrees of
>correlation among a set of features is not a model of the human being.

The main point that R&M are trying to make with their work is shown
below in their own words.

>>Subsymbolic models accurately describe the microstructure of cognition,
>>while symbolic models provide an approximate description of the
>>macrostructure. We view macrotheories as approximations to the underlying
>>microstructure but in some situations it will turn out that an examination
>>of the microstructure may bring much deeper insight.

P&P are correct in saying that this point (above) has not been
demonstrated by the R&M work. However, I think it is clear, and I think
P&P must agree, that this area of research needs to be explored more
fully. The work by R&M may not be conclusive proof that connectionism is
a major part of cognition, but it certainly raises questions, and it is
definitely not proof that connectionism has nothing to do with
cognition, as P&P almost suggest.
