Re: Kosslyn: Mental Imagery

From: Harnad, Stevan (
Date: Sun Jan 14 1996 - 18:55:24 GMT

> From: "Baden, Denise" <>
> Date: Fri, 24 Nov 1995 16:01:32 GMT
> It has been argued that imagery cannot be the sole form of internal
> representation, as an image cannot represent an object or scene without
> some interpretative function that picks out the salient features. The
> `stage directions' indicating which are the important features cannot
> also be images, or we will get into the problem of infinite regress.
> Kosslyn thinks, however, that there is no reason why imagery cannot be
> one form of representation in memory.

The view that imagery leads to an infinite regress originates from
Pylyshyn (1973) and even earlier, and the intuition it is based on is
that of the "homuncular regress": What's going on in your head can't be
just images, because a little man in the head would then have to look at
those images, and if that just gives him another image in HIS head then
it just goes on and on. This critique is valid for the view that ALL
there is to cognition and perception is internal copies of images; but
it doesn't rule out that SOME of it can be internal images. Nor do those
images need to be looked at or interpreted by a homunculus. They may
simply be pieces of the processing that results in the successful
performance of whatever the task was. In mental rotation, for example,
it may make more sense for internal analogs to be rotated than to
transform images into bit maps or descriptions and do computations on
them, if the task is like Roger Shepard's mental rotation task, matching
2-D projections of 3-D shapes to rotated versions of them to judge
whether or not they are the same shape.
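
A minimal sketch of the analog option, assuming a much simplified task:
2-D point shapes rotated in the plane, rather than Shepard's projections
of 3-D objects. The function names, the step size and the example shape
are illustrative assumptions, not anything taken from Shepard & Cooper.
The only point is that if an internal analog is turned in small
increments until it lines up with the target, the work done grows with
the angle between the two, and a mirror-image shape never lines up.

```python
import math

def rotate(points, degrees):
    """Rotate 2-D points about the origin by the given angle."""
    a = math.radians(degrees)
    c, s = math.cos(a), math.sin(a)
    return [(x * c - y * s, x * s + y * c) for x, y in points]

def matches(p, q, tol=1e-6):
    """True if two point lists coincide, point for point, within tol."""
    return all(abs(px - qx) < tol and abs(py - qy) < tol
               for (px, py), (qx, qy) in zip(p, q))

def steps_to_match(shape, target, step_deg=5, max_deg=360):
    """Turn `shape` in small fixed increments until it coincides with
    `target`. Returns the number of increments used, or None if no
    orientation matches (for example, if the target is a mirror image)."""
    current = list(shape)
    for n in range(max_deg // step_deg + 1):
        if matches(current, target):
            return n
        current = rotate(current, step_deg)
    return None

# An asymmetric four-point shape, so that only one orientation matches.
SHAPE = [(0, 0), (2, 0), (2, 1), (0, 3)]
MIRROR = [(-x, y) for x, y in SHAPE]
```

With the 5-degree step, steps_to_match(SHAPE, rotate(SHAPE, 60)) uses 12
increments and a 120-degree disparity uses 24, while
steps_to_match(SHAPE, MIRROR) returns None.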

Pylyshyn, Z. W. (1973) What the mind's eye tells the mind's brain:
A critique of mental imagery. Psychological Bulletin 80: 1-24.

Shepard, R. N. & Cooper, L. A. (1982) Mental images and their
transformations. Cambridge: MIT Press/Bradford.

> Kosslyn describes several experiments which undertake to determine
> whether images act as functional representations which have real life
> spatial characteristics, or whether they are an epiphenomenon.
> Subjects were timed on how long it took them to scan a mental map.
> The results suggested that images do represent metric distance and
> that this property affects real-time processing of images. This
> implies that images also have spatial boundaries, and this was also
> tested, by seeing if subjects could image to the point of overflow.
> Subjects' reports indicated that there was a high correlation between
> the size of the imagined object and distance. It was also found that it
> took longer for subjects to see properties on subjectively smaller
> images. These results support the claim that our experienced images are
> spatial entities and their spatial properties have real consequences
> for some forms of information processing.
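
To make the scanning claim concrete, here is a rough sketch, assuming
that the "image" is nothing more than landmark coordinates and that
attention moves between them at a fixed rate. The landmark names,
coordinates and step size are invented for illustration; this is not
Kosslyn's procedure or data. If scanning is a traversal of the
intervening space, the number of steps should grow with metric distance.

```python
import math

# Hypothetical landmarks on an imagined map (arbitrary units).
LANDMARKS = {"hut": (0.0, 0.0), "well": (3.0, 4.0), "tree": (6.0, 8.0)}

def scan_steps(start, goal, step=0.5, landmarks=LANDMARKS):
    """Move a focus point from one landmark to another in fixed-size
    steps and return how many steps the shift of attention took."""
    (x, y), (gx, gy) = landmarks[start], landmarks[goal]
    steps = 0
    while math.hypot(gx - x, gy - y) > 1e-9:
        d = math.hypot(gx - x, gy - y)
        move = min(step, d)          # do not overshoot the goal
        x += (gx - x) / d * move
        y += (gy - y) / d * move
        steps += 1
    return steps
```

Here scan_steps("hut", "well") covers a distance of 5 units in 10 steps,
and scan_steps("hut", "tree") covers 10 units in 20 steps. A model that
simply looked up the goal in a list would show no such dependence on
distance.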

There is of course always the possibility that this "congruence" between
objects and images and performance is just a little game our brains are
playing on us to make us think we're doing the real work whenever we
have to do something in our heads (whereas the real work is done by some
unconscious process, perhaps even a nonimagistic, computational one);
after all, it's not clear what's the use of consciousness itself, much less
what's the use of conscious manipulation of images.

So I'm not sure whether the emphasis should be on CONSCIOUS images at
all. On the other hand, if it can be shown that, whether for people or
machines, a particular task is more easily or economically accomplished
by using inner analogs of 2-dimensional projections of outside objects,
or even 3-D analog reconstructions of them -- rather than by digitised
images or symbolic descriptions of the objects -- then it would seem
a good idea for either the brain or the machine to do it that way,
whether consciously or unconsciously.


> Kosslyn takes the results as supporting his CRT protomodel which
> predicts that images are processed by the same sorts of classificatory
> procedures that underlie normal perceptual processing. The CRT model
> rests on the notion that visual images might be like displays produced
> on a cathode ray tube by a computer programme operating on stored data.
> Images are thus seen as temporary spatial displays in active memory
> that are generated from more abstract representations in long term
> memory, and are then interpreted and classified.

This sounds more dubious, and goes beyond what Kosslyn has shown. To say
what is actually going on inside to generate the subjects' performance,
he would have to have a machine that could do exactly the same thing the
subjects could do. Whatever went on in that machine would then be a
model of what went on inside the subject.

> Kosslyn then goes on to examine the question of whether images are
> retrieved in toto from memory, or whether they are constructed from
> parts. It was found that larger and more detailed images took more time
> to construct, which favours the 2nd view. I find these experiments a
> bit dubious, as they seem to view image construction simply as a visual
> process, and ignore the possible underlying effects of internal talking.
> For instance if I'm asked to imagine a cow, and then it is measured how
> long it takes me to imagine its component bits, eg udder, eyes, tail
> etc. I might choose to say my image is complete once I have repeated
> the instructions to myself sotto voce, rather than when a mental
> picture has formed. Kosslyn et al do admit later that descriptive
> factors also play a role in imaging, but do not seem to take this into
> account when interpreting their earlier experiments.

Your criticism is right on target, and the same one Pylyshyn would
make. It is not possible to separate, in Kosslyn's experiments, which
of the effects are really caused by the processing of the input in the
service of the task, and which are just due to conscious strategies
that are superimposed on all this, and have as little to do with what
the real processing is as your introspections about how you retrieve
the name of your 3rd grade schoolteacher have to do with how your brain
actually does retrieve it.

> Kosslyn then goes on to construct a computer simulation model that
> reflects the properties of images that have been suggested from their
> experimental evidence. The simulation contains a `surface matrix'
> representing the image itself, and long term memory files which
> represent the information used in generating images. The surface matrix
> simulates 5 properties of imagery:
> 1. the image depicts information about spatial extent, also brightness,
> contrast etc.
> 2. the degree of activation decreases with distance from the centre, as
> `overflow experiment' suggests that images fade at the periphery.
> 3. the surface has limited resolution - based on finding that smaller
> images are more difficult to inspect.
> 4. the spatial image within which images occur is of limited extent,
> and round or elliptical in shape.
> 5. the matrix corresponds to visual short term memory, and is subject to
> fading - this arises from findings that complex images are more difficult to
> maintain.
> Image generation uses 3 procedures: PICTURE, PUT & FIND, which perform
> the computations that generate the images. PICTURE takes specifications
> of size, location & orientation. PUT integrates parts into an image.
> FIND locates relevant parts of the image.
> Image classification: mainly uses FIND, but may need to call on other
> procedures such as LOOKFOR (SCAN, ZOOM, PAN, ROTATE).
> Image transformation: uses the procedures above.
> Kosslyn et al feel that the use of a computer simulation model enables
> them to counter the objection that notions of mental imagery are vague
> and logically incoherent. They list the advantages of using a computer
> simulation model as follows:
> 1. it forces them to be explicit in their assumptions.
> 2. features of computational models correspond closely to many features
> of cognitive models.
> 3. it shows whether the ideas are sufficient in principle to account
> for the data.
> 4. it enables predictions to be made on the basis of complex
> interactions among components.
> Whilst I do not argue against any of these advantages, I personally
> feel that constructing models based on data and ideas arising from
> computer simulations would tend to lead you into mistaken `imagery'
> about how the brain works. This is because of the fundamental
> differences between the two processing techniques i.e. a computer
> processes in a serial fashion while human brains operate by parallel
> distributed processing.
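
For readers who want something concrete, the five properties of the
surface matrix listed above can be sketched roughly as follows. This is
a toy stand-in and not Kosslyn's program: the class, its parameters
(grid size, decay rate, visibility threshold) and the fall-off formula
are all assumptions made for illustration, orientation is ignored, and
only a loose analogue of PICTURE is included (PUT and FIND are not
modelled).

```python
import math

class SurfaceMatrix:
    """Toy version of a 'surface matrix' (illustrative assumptions only)."""

    def __init__(self, size=21, decay=0.8):
        self.size = size
        self.centre = size // 2
        self.decay = decay      # fraction of activation kept per time step
        self.cells = {}         # (row, col) -> activation between 0 and 1

    def in_extent(self, r, c):
        """Property 4: the usable region is limited and roughly round."""
        return math.hypot(r - self.centre, c - self.centre) <= self.centre

    def picture(self, points, size=1.0, offset=(0, 0)):
        """Loosely after PICTURE: draw a part at a given size and location.
        Coordinates are rounded to whole cells (property 3: limited grain),
        so a small enough part loses detail."""
        for pr, pc in points:
            r = self.centre + offset[0] + round(pr * size)
            c = self.centre + offset[1] + round(pc * size)
            if self.in_extent(r, c):    # anything beyond the edge overflows
                dist = math.hypot(r - self.centre, c - self.centre)
                # Property 2: activation falls off toward the periphery.
                self.cells[(r, c)] = 1.0 - 0.5 * dist / self.centre

    def tick(self):
        """Property 5: the whole display fades unless it is regenerated."""
        self.cells = {k: v * self.decay for k, v in self.cells.items()}

    def visible(self, threshold=0.2):
        """Cells still active enough to be inspected."""
        return {k for k, v in self.cells.items() if v >= threshold}

# A hypothetical part: four points in a row, as (row, col) offsets.
PART = [(0, 0), (0, 1), (0, 2), (0, 3)]
```

Drawn at size 1.0, PART occupies four distinct cells. At size 0.2 its
points collapse into two cells (detail is lost at small sizes), and at
size 5 only three cells fall inside the round extent (the fourth
overflows). After eight calls to tick() nothing is left above the
default threshold.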

Yes and no. If you make a computer model that can do what the human can
do, with the same input, then at the very least you have ONE way you
can be SURE it can indeed be done. Whether it is the way WE do it
depends on a lot of other things, of which parallel processing might or
might not be one important example. I'm inclined to think that it is
less important than, say, what ELSE the model can do, for there are
countless ways to do just one simple, circumscribed little task, most
of which would not have any relevance to doing the many other things we
can do. The "right" way surely has to be able to scale up to the rest
of our capacity.

Besides that, one can simulate parallelism just as one can simulate
images. It is true that the digital computer is a serial machine, doing
one step after the other, one at a time, very fast. But that is not the
relevant part of a computer simulation. The relevant part is how it
shows that a simulation of the resources you claim the brain uses (e.g.,
parallel arrays) can actually deliver the goods. For this, the digital
computer can be the right testing tool without having to BE like the
brain, or like any of the other things it is simulating, at the hardware
level. It is in software that the simulation resides, and serial
computers CAN simulate parallel brains.
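
A small sketch of what simulating parallelism on a serial machine
amounts to, assuming a ring of simple units that all update at once from
their neighbours' previous values. The update rule and the function
names are hypothetical. The serial loop reads only from the old state
and writes only to a new one, so the result is the same as a truly
simultaneous update, whatever order the units are visited in.

```python
def majority(left, me, right):
    """Hypothetical local rule: adopt the majority value nearby."""
    return 1 if left + me + right >= 2 else 0

def parallel_step(state, rule, order=None):
    """One synchronous update of every unit, computed one unit at a time.
    All reads come from `state` and all writes go to `new`, so the
    visiting order of the serial loop cannot affect the outcome."""
    n = len(state)
    new = [None] * n
    for i in (order if order is not None else range(n)):
        new[i] = rule(state[i - 1], state[i], state[(i + 1) % n])
    return new

STATE = [1, 0, 1, 1, 0, 0, 1, 0]
```

Calling parallel_step(STATE, majority) and
parallel_step(STATE, majority, order=reversed(range(len(STATE)))) gives
the same list, [0, 1, 1, 1, 0, 0, 0, 1].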

What matters with simulations, though, is that they must not be
circular: you can easily build into a simulation both the problem and
its solution, so in a sense the simulation does not have to solve
anything at all. In that case the simulation shows next to nothing. In
cognitive simulations, the essential feature is that the computer must
be able to do what the person can do, given the same (or equivalent)
input. To the extent that Kosslyn's model can do that, it is a
demonstration that it CAN be done, with the resources he has provided.
Whether it's the way our brain does it, whether it is an economical or
efficient way, whether it will scale up to the rest of what we can do
-- all that is another matter.

> Kosslyn goes on to counter some of the objections about the validity of
> his results. These are well founded questions because the whole basis
> of his theory is very shaky unless he can convince us that the subjects'
> reports accurately mirror what is going on in their brains.
> One objection is that the subjects try to give `right' answers by
> fulfilling the experimenter's expectations in their reports. Kosslyn
> counters this rather weakly by pointing out instances of results that
> ran counter to their own expectations. However the subjects themselves
> may report based on their own expectations. For example, they know it
> takes longer to scan a wide image in real life and so assume it would
> be `correct' to take longer in image scanning. There are many
> objections centred around this general theme, and Kosslyn reports that
> they try to prevent these effects by questioning the subjects about
> what they thought the purpose of the experiments was, and whether they
> used particular strategies. It still seems to be an open question of
> whether, or by how much, the image scanning effects are contaminated by
> the demand characteristics of the experiment.

You're right, and this is why it would be much more informative and
convincing if Kosslyn dropped the considerations that have to do with
things we introspect and concentrated merely on our performance
capacity: what we can do. His model can then be claimed to be an
instance of how it can be done with "images," without leading to any
infinite regress. How realistic the model was would then have to be
assessed the hard way: by seeing what else we can do that it can scale
up to. It could also be compared to other methods for accomplishing the
same thing, to see whether it has any resource advantages, but again,
this is only interesting in the context of scaling up, not at the level
of comparing tiny arbitrary fragments of our capacity, so-called "toy"

> Kosslyn also counters Anderson's argument that many models could
> account for the same data, eg one would not be able to distinguish a
> propositional or an image based theory from the data.
> Kosslyn suggests that a large mental image should take longer to rotate
> as it will cover more `cells' than a small image, a prediction that
> would not arise from a propositional theory.
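
The contrast behind this prediction can be sketched as follows, assuming
a depictive image is a set of occupied grid cells and a propositional
one is a small record of attributes. Both representations and all the
names here are illustrative assumptions, not Kosslyn's or Anderson's
models. Rotating the cell-based image means moving every cell, so the
work grows with image size; rotating the description changes one field.

```python
def filled_square(side):
    """Depictive image: the occupied cells of a side x side square."""
    return {(r, c) for r in range(side) for c in range(side)}

def rotate_cells_90(cells):
    """Rotate a depictive image a quarter turn, cell by cell.
    Returns the rotated image and the number of cells moved."""
    rotated = {(c, -r) for r, c in cells}
    return rotated, len(cells)

def rotate_description_90(description):
    """Rotate a propositional description: one field changes,
    whatever the size of the object described."""
    new_angle = (description["orientation"] + 90) % 360
    return dict(description, orientation=new_angle), 1
```

A 3 x 3 square costs 9 cell moves and a 6 x 6 square costs 36, whereas
rotate_description_90({"shape": "square", "side": 6, "orientation": 0})
costs a single update for any side length.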

I think Anderson is just plain wrong here, because one COULD make two
models, one that does it by manipulating analogs of the sensory input
projection or even reconstructions of the objects from which they
originate, and another that does it by manipulating symbolic
descriptions of the input and/or the object. Again, the question of
which, if either, was more like what really goes on in our heads cannot
be settled at the level of the toy task, but by seeing whether and how
each of the two forms of processing scales up to the rest of what we can
do. (And the fact that both the analog model and the computational model
are simulated on a digital computer is an interesting and somewhat
ironic fact, but it too does not rule out drawing conclusions: We can
(correctly) simulate the analog motions of planets on a digital computer
too, and that does not mean it is undecidable whether the real planets
really move or merely manipulate symbolic descriptions: They really
move.)
(The brain, by the way, would come into this contest rather late in the
day, by which I mean that making a model more "brainlike" is at this
early toy stage much less relevant or significant than making it scale
up [unless of course the brain actually gives you a clue about how to
scale up -- something that to my knowledge has never happened so far].
If at the end of the day we have two different performance models that
can both do EVERYTHING that we can do, then we can look into them to
see which is, or can be made to be, more brainlike. But at the mere
toy stage, brainlikeness is not only trivial, but it is likely to
distract us from scaling up; moreover, what we actually know about the
brain -- hence about what is "brainlike" -- is itself currently at such
a toy stage that it is hardly a way of validating a performance model:
indeed, one could easily go the other way around, and say that to be a
realistic model of what is "brainlike," a model must have our
performance competence. After all, what it can do is also a property of
the brain, and surely its most important one!)

> Kosslyn also takes on
> Pylyshyn's challenge to determine which aspects of the imagery system
> are `computationally primitive' i.e. cannot be affected by
> expectations, intentions etc. They believe that the visual buffer is
> primitive - could not allow cognitive penetration. They also believe,
> due to parsimony considerations, that processes that detect patterns in
> images (which they call the mind's eye) will be the same as those that
> detect patterns in the visual system. They do allow that image
> transformation and image classification are not likely to be
> primitive.

This "cognitive impenetrability" criterion is probably a rather
arbitrary one. Pylyshyn's idea was that only symbol manipulation
(computation) is cognitive; anything else, anything before that or below
that, is just the activity of the sensory hardware or the architecture
of the computer itself (which, as we know, is irrelevant to cognition
according to computationalism, because it is the programme, the
software, that matters, not what hardware it happens to be running on).
So Pylyshyn proposed that whatever could be altered or modified or
influenced by what you KNOW is cognitive (because it's cognitively
penetrable and hence computational, i.e., occurs at the software
level), whereas what is NOT cognitively penetrable occurs below the
level of cognition.

A standard example of cognitive impenetrability is supposed to be the
Mueller-Lyer illusion of the two lines, where one looks longer than the
other because it has an arrow pointing inward at the end, the shorter
one having an arrow pointing outward: Knowing that it's an illusion,
that they are really the same length, and why, does not change
anything: One still looks longer than the other. Therefore, being
cognitively impenetrable it must be a built-in bit of our hardware,
whereas other things are not.

The trouble is that there are plenty of things that look at first to be
"cognitively impenetrable," but with some concerted training, they
prove to be penetrable after all. Kohler's famous experiments with
displacing lenses that make the world look upside down or displaced
sideways show that with time people adapt, and start to see things as
normal again, so much so that if you remove the lenses after a few
days, the right-side up world looks upside down for a while! Perhaps
such systematic adaptation training (it usually requires sensorimotor
interaction with objects, not just passive viewing) could even
penetrate the Mueller-Lyer illusion.

Moral: Don't set too much store by cognitive impenetrability. And
besides, cognitive penetrability does not necessarily imply that
something is then computational either! It is unlikely that the lens
adaptation was a change in the symbolic descriptions of objects, rather
than in their analog sensorimotor projections and interrelations...

> Kosslyn et al are aware of the difficulties in their approach, and do
> not claim to be attempting any more than a `protomodel'. They also
> admit that the model is continuously being revised to fit the data, and
> defend this by claiming that this is the essence of the model
> constructing process. I am personally not too happy about this stance,
> because, as I have already pointed out, the model is based on a
> fundamentally different processing style to that of the human brain. I
> also remain skeptical about whether the millisecond time differences
> between subject response times to various tasks can be seen as
> reflecting genuine differences in procedures, rather than individual
> differences, expectancies etc.

Constantly gerrymandering a model to make it conform to performance data
is, I agree, not very interesting. Scaling it up to more and more
performance CAPACITY is more sensible. But don't worry about the brain
or the hardware for the time being; there's still too much work to be
done in getting it to DO what we can do ANY which way...

Chrs, Stevan

Pylyshyn, Zenon W. Computation and cognition: Issues in the foundations
of cognitive science. Behavioral & Brain Sciences, 1980 Mar, v3

ABSTRACT: The computational view of mind rests on certain intuitions
regarding the fundamental similarity between computation and cognition.
Some of these intuitions are examined, and it is suggested that they
derive from the fact that computers and human organisms are both
physical systems whose behavior is described as being governed by rules
acting on symbolic representations. The paper elaborates various
conditions that need to be met if a literal view of mental activity as
computation is to serve as the basis for explanatory theories. The
coherence of such a view depends on a principled distinction between
functions whose explanation requires internal representations and those
that can appropriately be described as merely instantiating causal
physical or biological laws. Functions are said to be cognitively
impenetrable if they cannot be influenced by such purely cognitive
factors as goals, beliefs, and inferences. Several commentaries are

Pylyshyn, Zenon W. Computational models and empirical constraints.
Behavioral & Brain Sciences, 1978 Mar, v1 (n1):93-127.

ABSTRACT: Contends that the traditional distinction between artificial
intelligence and cognitive simulation amounts to little more than a
difference in style of research. Both enterprises are constrained by
empirical considerations and both are directed at understanding classes
of tasks that are defined by essentially psychological criteria. The
different ordering of priorities, however, causes them to occasionally
take different stands on such issues as the power/generality trade-off
and the relevance of the data collected in experimental psychology
laboratories. Computational systems are ways of empirically exploring
the adequacy of methods and of discovering task demands. For
psychologists, computational systems should be viewed as functional
models independent of neurophysiological systems. As model objects,
however, they present a serious problem of interpretation and
communication, since the task of extracting relevant theoretical
principles from a complex program may be formidable. Methodologies
(intermediate state, relative complexity, and extendability) for
validating computer programs as cognitive models are briefly described.
30 commentaries and the author's response are also presented.

Kosslyn, Stephen M.; Pinker, Steven; Smith, George E.; Shwartz, Steven
P. On the demystification of mental imagery. Behavioral & Brain
Sciences, 1979 Dec, v2 (n4):535-581.

ABSTRACT: Discusses the formulation of a theory of mental imagery. The
1st section outlines the general research direction taken and provides
an overview of the empirical foundations of a theory of image
representation and processing. Four issues are considered, and results
of experiments are presented. The 2nd section discusses the proper form
for a cognitive theory, and the distinction between a theory and a
model is developed. The present theory and computer simulation model
are then introduced. This theory specifies the nature of the internal
representations (data structures) and the processes that operate on
them when one generates, inspects, or transforms mental images. In the
3rd section, 3 kinds of objections to the present research program are
considered, one hinging on the possibility of experimental artifacts in
the data, and the others turning on metatheoretical commitments about
the form of a cognitive theory. 26 peer comments on the theory are
included, along with a final response by the authors.
