Harnad: The Symbol Grounding Problem

From: Henderson Ian (irh196@ecs.soton.ac.uk)
Date: Tue May 01 2001 - 20:59:51 BST

In this paper, Stevan Harnad outlines a bottom-up hybrid
connectionist-symbolic system intended to solve the symbol grounding
problem, a difficulty brought into focus by Searle's famous 'Chinese Room
Argument'.

> This concept of an autonomous symbolic level also conforms to general
> foundational principles in the theory of computation and applies to all
> the work being done in symbolic AI, the branch of science that has so far
> been the most successful in generating (hence explaining) intelligent
> behavior.

Harnad means that in generating intelligent behaviour, the scientist must
necessarily know how the system he designed works: in other words he is
capable of explaining its behaviour because he designed and built it.
Whether that explanation may help us understand human intelligence is
another matter. Many of the toy problems with which symbol systems have had
success give little insight into human thought processes: for instance the
game of chess, where what is in essence a brute force depth-first search
strategy has succeeded in producing computer players capable of rivaling the
top grandmasters. This is patently not, however, how grandmasters or humans
in general play chess. Symbolic AI has had much less success in reverse
engineering intelligence.
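
To make 'brute force depth-first search' concrete, here is a minimal sketch
of exhaustive minimax on a toy take-away game (players alternately remove 1
to 3 stones; whoever takes the last stone wins). The game and the function
names are my own assumptions for illustration; real chess programs add
pruning and evaluation heuristics that are not shown here.

```python
def minimax(stones, maximizing):
    """Exhaustive depth-first search of the toy game tree.

    Returns +1 if the maximizing player can force a win from this
    position, and -1 otherwise.
    """
    if stones == 0:
        # The previous player took the last stone, so the player
        # whose turn it is now has lost.
        return -1 if maximizing else 1
    scores = [minimax(stones - take, not maximizing)
              for take in (1, 2, 3) if take <= stones]
    return max(scores) if maximizing else min(scores)


def best_move(stones):
    """Choose the move whose subtree has the best value for the mover."""
    moves = [take for take in (1, 2, 3) if take <= stones]
    return max(moves, key=lambda take: minimax(stones - take, False))
```

The search 'plays well' only because it visits every continuation; nothing
in it resembles the selective way a human player considers a position.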

> It is not enough, for example, for a phenomenon to be interpretable as
> rule-governed, for just about anything can be interpreted as
> rule-governed. A thermostat may be interpreted as following the rule:
> Turn on the furnace if the temperature goes below 70 degrees and turn it
> off if it goes above 70 degrees, yet nowhere in the thermostat is that
> rule explicitly represented. Wittgenstein (1953) emphasized the
> difference between explicit and implicit rules: It is not the same thing
> to "follow" a rule (explicitly) and merely to behave "in accordance with"
> a rule (implicitly).[2]

Rules are represented explicitly in computer programs: the archetypal
IF...THEN...ELSE construction being the most obvious example. This applies
just as much to programs *simulating* neural networks as it does to other
programs: updating rules for neuron weights are specified *explicitly* in a
software simulation of a neural network, although (like Harnad's thermostat)
they may not be made explicit in a physical implementation. Similarly, a
program simulating the working of the thermostat would of course be
rule-governed, but then it wouldn't be a thermostat: the thermostat is
implementation-dependent. A neural network simulated in software is no more
actually a neural network than a photo of a vase is a vase. If
all neural network functionality can be modeled algorithmically, however,
symbol systems will be able to offer an empirical explanation of how neural
networks function, and thus connectionism will no longer be able to present
a rival explanation of human intelligence to symbolic AI.
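
As a minimal sketch of what 'explicitly represented' means here, the
thermostat rule and a single weight update can both be written out as
program text. The function names, the behaviour at exactly 70 degrees and
the choice of the delta rule are my own assumptions, not anything Harnad
specifies.

```python
def thermostat_rule(temperature_f, setpoint_f=70.0):
    """Explicit symbolic form of the thermostat rule quoted above."""
    # Assumption: at exactly the setpoint the furnace is off; the
    # quoted rule leaves that case unspecified.
    if temperature_f < setpoint_f:
        return "furnace_on"
    else:
        return "furnace_off"


def delta_rule_update(weights, inputs, target, learning_rate=0.1):
    """One explicitly stated weight update for a single linear unit."""
    output = sum(w * x for w, x in zip(weights, inputs))
    error = target - output
    # Each weight moves in proportion to the error and to its input.
    return [w + learning_rate * error * x
            for w, x in zip(weights, inputs)]
```

In both cases the rule is literally present in the program; in a physical
thermostat or a physical network it need not be.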

> So the mere fact that a behavior is "interpretable" as ruleful does not
> mean that it is really governed by a symbolic rule.[3]

As Chomsky points out, the human brain may not function according to
explicit rules either. Rules may appear to model human thought processes,
but whether they dictate them is another thing. This does not undermine the
usefulness of rule-based systems as an approach to the forward engineering
of intelligence: even if we were to discover how the brain functions (which
in itself is a long way off), and that it didn't work according to a set of
rules, this fact would not preclude the possibility of creating intelligence
using symbol systems.

> Connectionism will accordingly only be considered here as a cognitive
> theory. As such, it has lately challenged the symbolic approach to
> modeling the mind. According to connectionism, cognition is not symbol
> manipulation but dynamic patterns of activity in a multilayered
> network of nodes or units with weighted positive and negative
> interconnections.

The physical basis for connectionist theory lies (tenuously) in biological
neurons (nerve cells). As a result, neural networks present a more
appropriate paradigm for modeling brain function than symbol systems. The
origins of symbolic computing on the other hand lie in attempts to forward
engineer intelligence, from Babbage's analytical engine onwards: there is no
biological basis for symbolic AI.

> It is far from clear what the actual capabilities and limitations of
> either symbolic AI or connectionism are. The former seems better at
> formal and language-like tasks, the latter at sensory, motor and learning
> tasks, but there is considerable overlap and neither has gone much
> beyond the stage of "toy" tasks toward lifesize behavioral capacity.
> Moreover, there has been some disagreement as to whether or not
> connectionism itself is symbolic.

Again, although some neural network functionality may be implemented
symbolically, this doesn't make connectionism symbolic. In the same way,
the fact that some human activities such as chess playing and language
translation may be implemented symbolically does not mean that the human
thought process is symbolic. Not all connectionist functionality has been
found to have a symbolic analog as yet.

> Yet it is not clear whether connectionism should for this reason aspire
> to be symbolic, for the symbolic approach turns out to suffer from a
> severe handicap, one that may be responsible for the limited extent of
> its success to date (especially in modeling human-scale capacities) as
> well as the uninteresting and ad hoc nature of the symbolic "knowledge"
> it attributes to the "mind" of the symbol system. The handicap has
> been noticed in various forms since the advent of computing; I have
> dubbed a recent manifestation of it the "symbol grounding problem"
> (Harnad 1987b).

One of the best-known instances of the symbol grounding problem
encountered in trying to model more 'human-scale' capacities is the frame
problem. This problem occurs when the symbolic system is asked a question
outside the 'knowledge' with which it is programmed, but pertaining in some
way to that knowledge. The failure to give a reasonable answer to such
questions demonstrates that the system has not really 'understood' the
knowledge with which it has been programmed, and this is considered
symptomatic of the lack of grounding of that knowledge in real-world
experience. Such experience is acquired through learning:
learning is a central feature of connectionist systems, and thus these
systems lend themselves well to tackling the symbol grounding problem.

> Searle's simple demonstration that this cannot be so consists of
> imagining himself doing everything the computer does -- receiving the
> Chinese input symbols, manipulating them purely on the basis of their
> shape (in accordance with (1) to (8) above), and finally returning the
> Chinese output symbols. It is evident that Searle (who knows no Chinese)
> would not be understanding Chinese under those conditions -- hence
> neither could the computer.

Searle's argument works because symbol systems are implementation
independent. Thus in theory Searle can take the place of the computer and
carry out the computation 'by hand'. Obviously he could do this without
understanding Chinese (in the way humans talk about understanding

> Let us first look more closely at discrimination and identification. To
> be able to discriminate is to be able to judge whether two inputs are the
> same or different, and, if different, how different they are.
> Discrimination is a relative judgment, based on our capacity to tell
> things apart and discern their degree of similarity. To be able to
> identify is to be able to assign a unique (usually arbitrary) response --
> a "name" -- to a class of inputs, treating them all as equivalent or
> invariant in some respect. Identification is an absolute judgment, based
> on our capacity to tell whether or not a given input is a member of a
> particular category.

This requires a capacity for abstraction: the identification through a
learning process of a set of relevant features that define a class of
object. For instance, 'spoons' are a class of object, and identification is
the ability to recognise a particular spoon as an instance of that class.

> According to the model being proposed here, our ability to discriminate
> inputs depends on our forming "iconic representations" of them
> (Harnad 1987b). These are internal analog transforms of the projections
> of distal objects on our sensory surfaces (Shepard & Cooper 1982).
> In the case of horses (and vision), they would be analogs of the many
> shapes that horses cast on our retinas.[14]

For instance, the visual iconic representation of a horse is simply an image
of that horse in memory. An audio iconic representation of a horse is the
sound of the horse in memory: that is all that is meant by an 'analog
transform'; just a projection of the horse by our senses into our short term
sensory memory. Despite the overtones imparted by Harnad's use of the word
'iconic', these projections are not caricatures in any way: there is no
emphasis of the 'key' features needed to identify the input as belonging to
a specific class of object. Thus iconic representations are (only)
sufficient to discriminate between two objects: to be able to recognise that
a difference exists at a feature level (but not at the more abstract class
level) between the objects, for instance that they differ in colour or
position, but not that they differ in class; that requires identification.
Harnad should perhaps have made this distinction more apparent: by
'discrimination' he means discrimination at the instance level, not the
class level.
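
The contrast can be sketched in a few lines. This is only an illustration
under strong assumptions: icons are really analog projections, whereas here
they are reduced to hypothetical dictionaries of named features, and the
'horse' category is defined by hand rather than learned.

```python
def discriminate(icon_a, icon_b):
    """Relative judgment: which features differ between two icons?"""
    features = set(icon_a) | set(icon_b)
    return {f for f in features if icon_a.get(f) != icon_b.get(f)}


def identify(icon, categories):
    """Absolute judgment: which category names does this icon fall under?"""
    return [name for name, required in categories.items()
            if all(icon.get(f) == value for f, value in required.items())]


# Hypothetical icons of two different horses.
brown_horse = {"legs": 4, "mane": True, "stripes": False, "colour": "brown"}
grey_horse = {"legs": 4, "mane": True, "stripes": False, "colour": "grey"}

# Hypothetical class description: only the features that matter for 'horse'.
categories = {"horse": {"legs": 4, "mane": True, "stripes": False}}
```

Here `discriminate(brown_horse, grey_horse)` reports only that the two
instances differ in colour; it takes the extra class description passed to
`identify` to say that both are horses.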

> Icons of sensory projections are too unselective. For identification,
> icons must be selectively reduced to those "invariant features" of the
> sensory projection that will reliably distinguish a member of a category
> from any nonmembers with which it could be confused. Let us call the
> output of this category-specific feature detector the "categorical
> representation".

Persistent 'categorical' representations comprise the pivotal set of
features that describe a specific class of object; they are like caricatures
in that they detail the features necessary for identifying instances of the
class they represent. For instance, the categorical (class) representation
of a spoon might be an image consisting of a handle and a rounded end. A
categorical representation can be compared with an input iconic
representation to determine whether the iconic representation conforms to
the class description implicit in the categorical representation. I guess this
process of comparison would not be dissimilar to that employed during
discrimination: instead of discriminating between two input iconic
representations, identification involves discriminating between an iconic
representation and a categorical representation. The main difference between
the two processes is the result: discrimination just measures the degree of
sameness; identification measures the degree of sameness to determine
whether or not the iconic representation is an instance of the categorical

> Nor can categorical representations yet be interpreted as "meaning"
> anything. It is true that they pick out the class of objects they "name,"
> but the names do not have all the systematic properties of symbols and
> symbol systems described earlier. They are just an inert taxonomy.
> For systematicity it must be possible to combine and recombine them
> rulefully into propositions that can be semantically interpreted.
> "Horse" is so far just an arbitrary response that is reliably made in the
> presence of a certain category of objects. There is no justification for
> interpreting it holophrastically as meaning "This is a [member of the
> category] horse" when produced in the presence of a horse, because the
> other expected systematic properties of "this" and "a" and the all
> important "is" of predication are not exhibited by mere passive
> taxonomizing. What would be required to generate these other systematic
> properties? Merely that the grounded names in the category taxonomy be
> strung together into propositions about further category membership
> relations.

Harnad underestimates this problem in my view. How do humans understand the
connectives and quantifiers that provide the glue to such propositions? Is
our ability in this innate, as it is in Harnad's proposed hybrid system
where the logical connectives seem to be intrinsic to the symbol system used
to express seemingly meaningful statements about objects (grounded directly
or indirectly by the connectionist part of the system) that the symbol
system encounters? Is it not possible that concepts such as 'and', 'all',
and 'not' must also be grounded in order to be understood as humans
understand them? It seems to me that such connectives are learnt by humans,
and are not hard-wired as in Harnad's hybrid system: children seem to learn
such connectives in a similarly supervised manner to the way in which they
learn to identify object classes: by being told 'this is a horse', 'this is
not a horse' etc. If this is the case, it may be that
Harnad's system can do no more than discriminate, identify and describe (in
the sense of decomposing derived symbols into their constituent elementary
ones): it cannot understand 'zebra = horse & stripes' because the connective
symbols '=' and '&' are not grounded either directly or indirectly. It may
be able to identify a zebra, but can it really understand what a zebra is as
we do?
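
A minimal sketch makes the worry visible. The detectors below are
hand-written stand-ins (my assumption) for what the connectionist part of
the system would learn; the point is where the '&' comes from.

```python
# Hypothetical grounded detectors for two elementary symbols.
def is_horse(icon):
    return icon.get("shape") == "horse"


def has_stripes(icon):
    return bool(icon.get("stripes", False))


grounded = {"horse": is_horse, "stripes": has_stripes}


def compose_and(*names):
    """Derive a new symbol as the conjunction of already grounded ones.

    The conjunction itself is Python's built-in all(): it is supplied by
    the symbol system, not learned from or grounded in any input.
    """
    detectors = [grounded[name] for name in names]
    return lambda icon: all(detector(icon) for detector in detectors)


# 'zebra = horse & stripes'
grounded["zebra"] = compose_and("horse", "stripes")
```

The derived 'zebra' detector inherits its grounding from 'horse' and
'stripes', but the connective that glues them together is simply given.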

> Once one has the grounded set of elementary symbols provided by a
> taxonomy of names (and the iconic and categorical representations that
> give content to the names and allow them to pick out the objects they
> identify), the rest of the symbol strings of a natural language can be
> generated by symbol composition alone,[18] and they will all inherit the
> intrinsic grounding of the elementary set.[19]

It should be noted here that different individuals may possess different
elementary sets. For instance, a Kenyan might see a horse as a zebra
*without* stripes, in which case zebra and stripes are the grounded
elementary symbols. This is, however, of little importance: at the end of
the day, *an* elementary set exists, even if it differs in its membership
and extent from individual to individual. Similar differences may exist at a
societal level too. For instance a boundary effect can be found in colour
perception, where the eye learns to separate out wavelengths of the
continuous electro-magnetic spectrum into discrete categories of light, for
instance those given the names 'red', 'blue', and 'green'. These boundaries
may vary, just as the Kenyan sees a zebra and the Westerner sees a horse
with stripes: indigenous tribes living in naturally lit environments may
develop different colour boundaries to people living in technologically
advanced, urban societies. Similarly, the level of detail at which we draw
the line between elementary symbols and derived ones doesn't matter: for
instance, a horse may be derived from the elementary symbols 'leg', 'body',
'tail', 'hoof' etc. instead of being an elementary symbol itself. The fact
is that a line is drawn; how it is drawn is a problem for the sociologist
and anthropologist, not the cognitive scientist. As Harnad points out in
footnote [19], the point relevant to his argument is that such elementary
sets exist.

> Connectionism, with its general pattern learning capability, seems to be
> one natural candidate (though there may well be others): Icons, paired
> with feedback indicating their names, could be processed by a
> connectionist network that learns to identify icons correctly from the
> sample of confusable alternatives it has encountered by dynamically
> adjusting the weights of the features and feature combinations that are
> reliably associated with the names in a way that (provisionally) resolves
> the confusion, thereby reducing the icons to the invariant (confusion
> resolving) features of the category to which they are assigned.

Harnad suggests that a neural network could be used to form categorical
representations of objects linked with their (arbitrary) names. The learning
would be supervised, and the success of the network in correctly identifying
previously unseen input would be provisional, contingent on the set of training
examples to which it had been thus far exposed. In this way, neural networks
mimic the way children appear to learn, and from a reverse engineering
perspective present a good candidate method for achieving symbol grounding.
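
A rough sketch of such supervised learning, using a single perceptron
rather than the multilayer network Harnad has in mind. The feature
encoding [mane, hooves, wings], the training samples and the function
names are all my own assumptions for illustration.

```python
def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Learn weights from (features, label) pairs; label 1 means 'horse'."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in samples:
            error = label - predict(features, weights, bias)
            # Feedback about the correct name adjusts the weight of each
            # feature that was present in the misnamed icon.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias


def predict(features, weights, bias):
    """Return 1 if the unit names the input 'horse', 0 otherwise."""
    activation = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if activation > 0 else 0


# Hypothetical icons reduced to [mane, hooves, wings], paired with feedback.
samples = [
    ([1, 1, 0], 1),  # horse
    ([0, 0, 1], 0),  # bird
    ([0, 1, 0], 0),  # cow
    ([1, 0, 0], 0),  # lion
]
weights, bias = train_perceptron(samples)
```

After training, the unit names every training sample correctly; how it
treats an input it has never seen depends entirely on which confusable
alternatives were in the sample.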

> The expectation has often been voiced that "top-down" (symbolic)
> approaches to modeling cognition will somehow meet "bottom-up" (sensory)
> approaches somewhere in between. If the grounding considerations in this
> paper are valid, then this expectation is hopelessly modular and there is
> really only one viable route from sense to symbols: from the ground up.

Harnad's bottom-up approach to developing a hybrid connectionist-symbolic
system seems to be a common-sense way of modeling the mind: start with what
we know and understand (the senses) and build up the system from there.
However, as previously discussed, he seems to gloss over the problem of the
logical connectives, which is especially important if we are trying to
reverse engineer intelligence, as connectives appear to be acquired by
humans through learning rather than genetics. In doing this, Harnad appears
to make the same mistake as the symbolists when they claim that "all" their
symbol systems require is to be connected to the world 'in the right way':
they underplay the importance of the interface between the symbol system and
the real world; Harnad similarly neglects the interface between the
connectionist and symbolic parts of his hybrid system.
