Paper presented at UQÀM Summer Institute in Cognitive Sciences on Categorisation 2003
Cognition is categorization
Organisms are sensorimotor systems. The things in the world come in contact with our sensory surfaces, and we interact with them based on what that sensorimotor contact "affords."
To say this is not to declare oneself a Gibsonian, whatever that means. It is merely to point out that what a sensorimotor system can do is determined by what can be extracted from its motor interactions with its sensory input. If you lack sonar sensors, then your sensorimotor system cannot do what a bat's can do, at least not without the help of instruments. Light stimulation affords color vision for those of us with the right sensory apparatus, but not for those of us who are color-blind. The geometric fact that, when we move, the "shadows" cast on our retina by nearby objects move faster than the shadows of further objects means that, for those of us with normal vision, our visual input affords depth perception. From more complicated facts of projective and solid geometry it follows that a 3-dimensional shape, such as, say, a boomerang, can be recognized as being the same shape -- and the same size -- even though the size and shape of its shadow on our retinas change as we move in relation to it or it moves in relation to us. Its shape is said to be invariant under these sensorimotor transformations, and our visual systems can detect and extract that invariance, and translate it into a visual constancy. So we keep seeing a boomerang of the same shape and size even though the shape and size of its retinal shadows keep changing.
So far, the affordances I've mentioned have depended on having either the right sensors, as in the case of sonar and color, or the right invariance-detectors, as in the case of depth perception and shape/size constancy. Having the ability to detect the stimulation or to detect the invariants in the stimulation is not trivial; this is confirmed by the fact that sensorimotor robotics and sensorimotor physiology have so far managed to duplicate and explain only a small portion of this subset of our sensorimotor capacity. But we are already squarely in the territory of categorization here, for, to put it most simply and generally, categorization is any systematic differential interaction between an autonomous, adaptive sensorimotor system and its world: Systematic, because we don't want arbitrary interactions like the effects of the wind blowing on the sand in the desert to be counted as categorization (though perhaps there are still some inherent similarities there worth noting). Neither the wind nor the sand is an autonomous sensorimotor system; they are, jointly, simply dynamical systems, systems that interact and change according to the laws of physics.
Everything in nature is a dynamical system, of course, but some things are not only dynamical systems, and categorization refers to a special kind of dynamical system. Sand also interacts "differentially" with wind: Blow it this way and it goes this way; blow it that way and it goes that way. But that is neither the right kind of systematicity nor the right kind of differentiality. It also isn't the right kind of adaptivity (though again, categorization theory probably has a lot to learn from ordinary dynamical interactions too, even though they do not count as categorization).
Dynamical systems are systems that change in time. So it is already clear that categorization too will have to have something to do with changes across time. But adaptive changes in autonomous systems are those in which internal states within the autonomous system systematically change with time, so that, to put it simply, the exact same input will not produce the exact same output across time, every time, the way it does in the interaction between wind and sand (whenever the wind blows in exactly the same direction and the sand is in exactly the same configuration). Categorization is accordingly not about exactly the same output occurring whenever there is exactly the same input. Categories are kinds, and categorization occurs when the same output occurs with the same kind of input, rather than the exact same input. And a different output occurs with a different kind of input. So that's where the "differential" comes from.
The adaptiveness comes in with the real-time history. Autonomous, adaptive sensorimotor systems categorize when they respond differentially to different kinds of input, but the way to show that they are indeed adaptive systems -- rather than just akin to very peculiar and complex configurations of sand that merely respond (and have always responded) differentially to different kinds of input in the way ordinary sand responds (and has always responded) to wind from different directions -- is to show that at one time it was not so: that it did not always respond differentially as it does now. In other words (although it is easy to see it as exactly the opposite): categorization is intimately tied to learning.
Why might we have seen it as the opposite? Because if instead of being designers and explainers of sensorimotor systems and their capacities we had simply been concerned with what kinds of things there are in the world, we might have mistaken the categorization problem as merely being the problem of identifying what exists (that sensorimotor systems can then go on to categorize). But that is the ontic side of categories, concerned with what does and does not exist, and that's probably best left to the respective specialists in the various kinds of things there are (specialists in animals, vegetables, or minerals, to put it simply). The kinds of things there are in the world are, if you like, the sum total of the world's potential affordances to sensorimotor systems like ourselves. But the categorization problem is not determining what kinds of things there are, but how it is that sensorimotor systems like ourselves manage to detect those they can and do detect: how they manage to respond differentially to them.
Now it might have turned out that we were all born with the capacity to respond differentially to all the kinds of things that we do respond to differentially, without ever having to learn to do so (and there are some, like Jerry Fodor (1975, 1981, 1998), who sometimes write as if they believe this is actually the case). Learning might all be trivial; all the invariances we can detect, we could already detect innately, without the need of any internal changes that depend on time or any more complicated differential interaction of the sort we call learning. This kind of extreme nativism about categories is usually not far away from something even more extreme than nativism, which is the view that our categories were not even "learned" through evolutionary adaptation: The capacity to categorize comes somehow prestructured in our brains in the same way that the structure of the carbon atom came prestructured from the Big Bang, without needing anything like "learning" to shape it. (Fodor's might well be dubbed a "Big Bang" theory of the origin of our categorization capacity.)
(Chomsky [e.g., 1976] has made a similar conjecture -- about a very special subset of our categorization capacity, namely, the capacity to generate and detect all and only those strings of words that are grammatical according to the Universal Grammar underlying all possible natural languages: UG-compliance is the underlying invariant in question, and, according to Chomsky, our capacity to detect and generate UG-compliant strings of words is shaped neither by learning nor by evolution; it is instead somehow inherent in the structure of our brains as a matter of structural inevitability, directly from the Big Bang. This specific theory, about UG in particular, is not to be confused with Fodor's general theory that all categories are unlearnt and unevolved; in the case of UG there is considerable "poverty-of-the-stimulus" evidence to suggest that UG is not learnable by children on the basis of the data they hear and produce within the time they take to learn their first language; in the case of most of the rest of our categories, however, there is no such evidence.)
All evidence suggests that most of our categories are learned. To get a sense of this, open a dictionary at random and pick out a half dozen "content" words (skipping function words such as "if," "not" or "the"). What you will find is nouns, verbs, adjectives and adverbs all designating categories (kinds of objects, events, states, features, actions). The question to ask yourself is: Was I born knowing what are and are not in these categories, or did I have to learn it?
You can also ask the same question about proper names, even though they don't appear in dictionaries: Proper names name individuals rather than kinds, but for a sensorimotor system, an individual is effectively just as much of a kind as the thing a content word designates: Whether it is Jerry Fodor or a boomerang, my visual system still has to be able to sort out which of its shadows are shadows of Jerry Fodor and which are shadows of a boomerang. How?
Nor is it all as easy as that case. Consider the more famous and challenging problem of sorting newborn chicks as males or females. I'm not sure whether Fodor thinks this capacity could be innate, but the grandmaster, 8th-degree black-belt chicken-sexers on this planet -- of which there are few, most of them in Japan -- say that it takes years and years of trial and error training under the supervision of masters to reach black-belt level; there are no short-cuts, and most aspirants never get past brown-belt level. (We will return to this.) Categorization, it seems, is a sensorimotor skill, though most of the weight is on the sensory part (and the output is usually categorical, i.e., discrete, rather than continuous); and like all skills, it must be learned.
So what is learning? It is easier to say what a system does when it learns than to say how it does it: Learning occurs when a system samples inputs and generates outputs in response to them on the basis of trial and error, its performance guided by corrective feedback. Things happen, we do something in response; if what we did was the right thing, there is one sort of consequence; if it was the wrong thing there is another sort of consequence. If our performance shows no improvement with time, then we are like the sand in the wind. If our performance improves -- more correct outputs, fewer errors -- then we are learning. (Note that this presupposes that there is such a thing as an error, or miscategorization: No such thing comes up in the case of the wind blowing the sand.)
This sketch of learning should remind us of BF Skinner, behaviorism, and schedules of reward and punishment. For it was Skinner who pointed out that we learn on the basis of feedback from the consequences of our behavior. But what Skinner did not provide was the internal mechanism for this sensorimotor capacity we and so many of our fellow-creatures have, just as Gibson did not provide the mechanism for picking up affordances. Both these thinkers thought that providing internal mechanisms was either not necessary or not the responsibility of their discipline. They were concerned only with describing the input and the sensorimotor interactions, not how a sensorimotor system could actually do those things. So whereas they were already beginning to scratch the surface of the "what" of our categorization capacity, in input/output terms, neither was interested in the "how."
Let us, too, set aside the "how" question for the moment, and note that so-called operant or instrumental learning -- in which, for example, a pigeon is trained to peck at one key whenever it sees a black circle and another key whenever it sees a white circle (with food as the feedback for doing the right thing and no-food as the feedback for doing the wrong thing) -- is already a primitive case of categorization. It is a systematic differential response to different kinds of input, performed by an autonomous adaptive system that responded randomly at first, but learned to adapt its responses under the guidance of error-correcting feedback (thanks, presumably, to some sort of adaptive change in its internal state). The case of black vs. white is relatively trivial, because the animal's sensory apparatus already has those two kinds of inputs well-segregated in advance, although if, after training on just black and white, we began to "morph" them gradually into one another as shades of gray, and tested those intermediate shades without feedback, the pigeon would show a smooth "generalization gradient," pecking more on the "black" key the closer the input was to black, more on the white key the closer the input was to white, and approaching a level of chance performance midway between the two. The same would be true for a human being in this situation.
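The pigeon's training and its subsequent generalization gradient can be sketched as a toy simulation. This is only an illustration under stated assumptions, not a model of the pigeon: the single logistic unit, the recoding of brightness onto the range -1 to +1, the learning rate, the number of trials, and the names `p_white` and `train` are all hypothetical choices made for the example.

```python
import math

def p_white(brightness, w):
    """Probability of pecking the 'white' key for a brightness in [0, 1].

    Assumption: brightness is recoded to [-1, +1] and passed through a
    single logistic unit with one adjustable weight w.
    """
    signal = 2.0 * brightness - 1.0
    return 1.0 / (1.0 + math.exp(-w * signal))

def train(trials=200, rate=0.5):
    """Trial-and-error training on pure black (0.0) and pure white (1.0)."""
    w = 0.0  # untrained: responds at chance to every input
    for t in range(trials):
        brightness = float(t % 2)            # alternate black, white
        target = brightness                  # 0 = 'black' key, 1 = 'white' key
        error = target - p_white(brightness, w)   # corrective feedback
        w += rate * error * (2.0 * brightness - 1.0)
    return w

w = train()

# Test on intermediate grays WITHOUT feedback: the weight is no longer changed.
grays = [i / 10 for i in range(11)]
gradient = [p_white(g, w) for g in grays]
for g, p in zip(grays, gradient):
    print(f"brightness {g:.1f}: P(peck 'white') = {p:.2f}")
```

Under these assumptions the response probability rises smoothly from near 0 at black to near 1 at white and sits at chance for the midpoint gray, which is the qualitative shape of the generalization gradient described above.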
But if the animal had color vision, and we used blue and green as our inputs, the pattern would be different. There would still be maximal confusion at the blue-green midpoint, but on either side of that boundary the correct choice of key and the amount of pressing would increase much more abruptly -- one might even say "categorically" -- than with shades of gray. The reason is that between black and white there is no innate category boundary, whereas between green and blue there is (in animals with normal green/blue color vision). The situation is rather similar to hot and cold, where there is a neutral point midway between the two poles, feeling neither cold nor hot, and then a relatively abrupt qualitative difference between the "warm" range and the "cool" range in either direction.
This effect is called "categorical perception" (CP) and in the case of color perception, the CP is innate. Light waves vary in frequency (and, inversely, in wavelength). We are blind to wavelengths longer than red (infrared, beyond about 800 nm) or shorter than violet (ultraviolet, below about 400 nm), but if we did not have color CP then the continuum from red to violet would look very much like shades of gray, with none of those qualitative "bands" separated by neutral mixtures in between that we all see in the rainbow or the spectrum. Our color categories are detected by a complicated sensory receptor mechanism, not yet fully understood, whose components include not just light frequency, but other properties of light, such as brightness and saturation, and an internal mechanism of three specialized detectors selectively tuned to certain regions of the frequency spectrum (red, green, and blue), with an "opponent-process" relation between their activities (red being opposed to green and blue being opposed to yellow). The outcome of this innate invariance-extracting mechanism is that some frequency ranges are automatically "compressed": we see them all as just varying shades of the same qualitative color. These compressed ranges are then separated from adjacent qualitative regions, also compressed, by small boundary regions that look like indefinite mixtures, neutral between the two adjacent categories. And just as there is compression within each color range there is expansion between them: Equal-sized frequency differences look much smaller and are harder to detect when they are within one color category than when they cross the boundary from one category to the other.
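Within-category compression and between-category expansion can be illustrated with a deliberately simple warping function. The sketch below is not the opponent-process mechanism; it merely assumes, for illustration, that a physical continuum scaled to [0, 1] is mapped to an internal value by a sigmoid centered on a category boundary. The function names, the boundary at 0.5, and the steepness value are hypothetical.

```python
import math

def perceived(x, boundary=0.5, steepness=12.0):
    """Map a physical value in [0, 1] to an internal value.

    Assumption: a sigmoid centered on the category boundary stands in
    for whatever mechanism actually produces categorical perception.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (x - boundary)))

def perceived_difference(x1, x2):
    """Internal (perceived) distance between two physical values."""
    return abs(perceived(x2) - perceived(x1))

step = 0.1  # the same physical difference in both comparisons

within = perceived_difference(0.10, 0.10 + step)   # both inside one category
across = perceived_difference(0.45, 0.45 + step)   # straddles the boundary

print(f"within-category difference: {within:.3f}")
print(f"cross-boundary difference:  {across:.3f}")
```

With these assumed parameters, the same physical step yields a much larger internal difference when it crosses the boundary than when it falls within a category, which is the signature of CP described above.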
Although basic color CP is inborn rather than a result of learning, it still meets our definition of categorization because the real-time trial and error process that "shaped" CP through error-corrective feedback from adaptive consequences was Darwinian evolution. Those of our ancestors who could make rapid, accurate distinctions based on color out-survived and out-reproduced those who could not. That natural selection served as the "error-correcting" feedback on the genetic trial-and-error variation. There are probably more lessons to be learned from the analogy between categories acquired through learning and through evolution as well as from the specific features of the mechanism underlying color CP -- but this brings us back to the "how" question raised earlier, to which we promised to return.
Machine learning algorithms from artificial intelligence research, genetic algorithms from artificial life research and connectionist algorithms from neural network research have all been providing candidate mechanisms for performing the "how" of categorization. There are in general two kinds of models, so-called "supervised" and "unsupervised" ones. The unsupervised models are generally designed on the assumption that the input "affordances" are already quite salient, so that the right categorization mechanism will be able to pick them up on the basis of the shape of the input from repeated exposure and internal analysis alone, with no need of external error-correcting feedback. By way of an exaggerated example, if the world of shapes consisted of nothing but boomerangs and Jerry Fodor shapes, an unsupervised learning mechanism could easily sort out their retinal shadows on the basis of their intrinsic structure alone (including their projective geometric invariants). But with the shadows of new-born chick abdomens, sorting them out as males and females would probably need the help of error-corrective feedback. Not only would the attempt to sort them on the basis of their intrinsic structural landscape alone be like looking for a needle in a haystack, but there is also the much more general problem that the very same things can often be categorized in many different ways. It would be impossible, without supervision, to determine which way was correct (in a given context, for the right categorization can vary with the context: sometimes we may want to sort baby chicks by gender, sometimes by species, or something else) (Harnad 1987).
In general, a nontrivial categorization problem will be "underdetermined." Even if there is only one correct solution, and even if it can be found by an unsupervised mechanism, it will first require a lot of exposure and processing. The figure/ground distinction might be something like this: How, in general, does our visual system manage to process the retinal shadows of real-world scenes in such a way as to sort out what is figure and what is ground? In the case of ambiguous figures such as Escher drawings there may be more than one way to do this, but in general, there is a default way to do it that works, and our visual systems usually manage to find it, quickly and reliably for most scenes. It is unlikely that they learned to do this on the basis of having had supervisory feedback on samples of all the possible combinations of scenes and their shadows.
There are both morphological and geometric invariants in the sensory shadows of objects, highlighted especially when we move relative to them or vice versa; these can be extracted by unsupervised learning mechanisms that sample the structure and the correlations (including covariance and invariance under dynamic sensorimotor transformations). Such mechanisms cluster things according to their internal similarities and dissimilarities, enhancing both the similarities and the contrasts. An example of an unsupervised contrast-enhancing and boundary-finding mechanism is "reciprocal inhibition," in which activity from one point in visual space inhibits activity from surrounding points and vice-versa. This kind of internal competition tends to bring into focus the structure inherent in and afforded by the input.
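A minimal sketch of contrast enhancement by inhibition between neighbors follows. It is a one-dimensional, single-pass simplification (real reciprocal inhibition is recurrent and two-dimensional), and the function name, the inhibition strength `k`, and the input values are assumptions chosen for illustration.

```python
def lateral_inhibition(inputs, k=0.2):
    """Each unit's response is its own input minus k times its neighbors' inputs.

    Assumption: at the ends of the array the missing neighbor is treated
    as having the same input as the unit itself.
    """
    n = len(inputs)
    response = []
    for i, x in enumerate(inputs):
        left = inputs[i - 1] if i > 0 else x
        right = inputs[i + 1] if i < n - 1 else x
        response.append(x - k * (left + right))
    return response

# A step edge: a dim region followed by a bright region.
signal = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]
response = lateral_inhibition(signal)

for s, r in zip(signal, response):
    print(f"input {s:.1f} -> response {r:.2f}")
```

In this sketch the unit on the bright side of the edge responds more strongly than units deeper inside the bright region, and the unit on the dim side responds less than its dim neighbors, so the boundary already present in the input is exaggerated without any external feedback.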
This kind of unsupervised clustering based on enhancing structural similarities and correlations will not work, however, if different ways of clustering the very same sensory shadows are correct, depending on other circumstances. To sort this out, supervision by error-corrective feedback is needed too; the sensorimotor structure and its affordances alone are not enough. We might say that supervised categories are even more underdetermined than unsupervised ones. Both kinds of category are underdetermined, because the sensory shadows of their members are made up of a high number of dimensions and features, their possible combinations yielding an infinity of potential shadows, making the subset of them that will afford correct categorization hard to find. But supervised categories have the further difficulty that there are many correct categorizations (sometimes an infinite number) for the very same set of shadows.
If you doubt this, open a dictionary again, pick any content word, say, "table," then think of an actual table, and think of all the other things you could have called it (thing, object, vegetable, handiwork, furniture, hardwood, Biedermeier, even "Charlie"). The other names you could have given it correspond to other ways you could have categorized it. Every category has both an "extension" (the set of things that are members of that category) and an "intension" (the features that make things members of that category rather than another). Not only are all things the members of an infinite number of different categories, but each of their features, and combinations of features, is a potential basis (affordance) for assigning them to still more categories. So far, this is again just ontology; but if we return to sensory inputs, and to the problem facing the theorist trying to explain how sensorimotor systems can do what they do, the point is this: sensory inputs are the shadows of a potentially infinite number of different kinds of things. Categorization is the problem of sorting them correctly, depending on the demands of the situation.
Supervised learning can help; if unsupervised learning cannot find the winning features, perhaps feedback-guided trial and error training will do it, as with the pigeon's black/white sorting and the chicken-sexing. There are some learning algorithms so powerful that they are guaranteed to find the needle in the haystack, no matter how underdetermined it is -- as long as it is just underdetermined, not indeterminate (like the exact midpoint between black and white) or NP-complete -- and as long as there is enough data and time (as, for the language-learning child, there is not, hence the "poverty of the stimulus"). Our categorization algorithms have to be able to do what we can do; so if we can categorize a set of inputs correctly, then those inputs must not only have the features that can afford correct categorization, but there must also be a way to find and use those affordances. (Figure 1 shows how a supervised neural net learns to sort a set of shapes into three categories by compressing and separating their internal representations in hidden-unit space; Tijsseling & Harnad 1997.)
Figure 1. Left: 3 sets of stimuli presented to neural net: vertical arm of L much longer, vertical and horizontal about equal, horizontal much longer. Middle: Position of the hidden-unit representations of each of the three categories after auto-association but before learning (cubes represent Ls with long vertical arms, pyramids Ls with near-equal arms, spheres Ls with long horizontal arms). Right: Within-category compression and between-category separation when the net has learned to separate the three kinds of input. (From Tijsseling & Harnad 1997.)
Fodor and others have sometimes suggested otherwise: They have suggested that one of the reasons most categories can be neither learned nor evolved (and hence must be "innate" in some deeper sense than merely being a Darwinian adaptation) is the "vanishing intersections" problem: If you go back to the dictionary again, pick some content words, and then look for the "invariance" shared by all the sensory shadows of just about any of the things designated by those words, you will find there is none: their "intersection" is empty. What do all the shadows of boomerangs or tables -- let alone Jerry Fodors or chicken-bottoms -- have in common (even allowing dynamic sensorimotor interactions with them)? And if that doesn't convince you, then what is the sensory shadow of categories like "goodness," "truth," or "beauty"?
There is no reason for invariance theorists to back down from this challenge. First, it has to be pointed out that since we do manage to categorize correctly all those things designated by our dictionaries, there is indeed a capacity of ours that needs to be accounted for. To say that these categories are "innate" in a Cartesian, Platonic, or cosmogonic sense rather than just a Darwinian sense is simply to say that they are an unexplained, unexplainable mystery. So let us reject that. Let us assume that if organisms can categorize, then there must be a sensorimotor basis for that skill of theirs, and its source must be either evolution, learning, or both. Which means that there must be enough in those shadows to afford all of our categorization capacity. Does it all have to be a matter of direct sensorimotor invariants, always? No, but the path to goodness, truth and beauty requires us to trace the chain of abstraction that takes us from categories acquired through direct sensory experience to those acquired through linguistic "hearsay":
Consider the five sensorimotor ways we can interact differentially with things: the five kinds of things we can do with things. We can see them, recognize them, manipulate them, name them or describe them. "Manipulate" in a sense already covers all five, because manipulating is something we do with things; but let us reserve the word "manipulate" for our more direct physical interactions with objects, such as touching, lifting, pushing, building, destroying, eating, mating with, and fleeing from them. Naming them and describing them is also a thing we do with them, but let us not subsume those two acts under manipulation. Seeing and recognizing are likewise things we do with things, but these too are better treated separately, rather than as forms of manipulation. And "seeing" is meant to stand in for all modes of sensory contact with things (hearing, smelling, tasting, touching), not just vision.
Recognizing is special, because it is not just a passive sensory event. When we recognize something, we see it as a kind of thing (or an individual) that we have seen before. And it is a small step from recognizing a thing as a kind or an individual to giving it a name. Seeing requires sensorimotor equipment, but recognizing requires more. It requires the capacity to abstract. To abstract is to single out some subset of the sensory input, and ignore the rest. For example, we may see many flowers in a scene, but we must abstract to recognize some of them as being primroses. Of course, seeing them as flowers is itself abstraction. Even distinguishing figure from ground is abstraction. Is any sensorimotor event not abstraction?
To answer, we have to turn to fiction. Borges, in his 1944 short story, "Funes the Memorious," describes a person who cannot abstract. One day Funes fell off a horse, and from then onward he could no longer forget anything. He had an infinite rote memory. Every successive instant of his experience was stored forever; he could mentally replay the "tapes" of his daily experience afterwards, and it would take even longer to keep re-experiencing them than it had to experience them in the first place. His memory was so good that he gave proper names or descriptions to all the numbers -- "Luis Melián Lafinur, Olimar, azufre, los bastos, la ballena, el gas, la caldera, Napoléon, Agustín de Vedía" -- from 1 all the way up to enormous numbers. Each was a unique individual for him. But, as a consequence, he could not do arithmetic; could not even grasp the concepts. The same puzzlement accompanied his everyday perception. He could not understand why we people with ordinary, frail memories insisted on calling a particular dog, at a particular moment, in a particular place, in a particular position, by the same name that we call it at another moment, a different time, place, position. For Funes, every instant was infinitely unique, and different instants were incomparable, incommensurable.
Funes's infinite rote memory was hence a handicap, not an advantage. He was unable to forget, yet forgetting, or at least ignoring, is what is required in order to recognize and name things. Strictly speaking, a true Funes could not even exist, or if he did, he could only be a passive sensorimotor system, buffeted about by its surroundings (like the sand by the wind). Borges portrayed Funes as having difficulties in grasping abstractions, yet if he had really had the infinite memory and incapacity for selective forgetting that Borges ascribed to him, Funes should have been unable to speak at all, for our words all pick out categories based on abstraction. He should not have been able to grasp the concept of a dog, let alone any particular dog, or anything else, whether an individual or a kind. He should have been unable to name numbers, even with proper names, for a numerosity (or a numeral shape) is itself an abstraction. There should be the same problem of recognizing either a numerosity or numeral as being the same numerosity (numeral) on another occasion as there was in recognizing a dog as the same dog, or as a dog at all.
Funes was a fiction, but Luria described a real person who had handicaps that went in the same direction, though not all the way to an infinite rote memory. In "The Mind of a Mnemonist" (1968) Luria describes a stage memory-artist, "S," whom he had noticed when S was a journalist because he never took notes. S did not have an infinite rote memory like Funes's, but a far more powerful and persistent rote memory than a normal person. When he performed as a memory artist he would memorize long strings of numbers heard only once, or all of the objects in the purse of an audience member. He could remember the exact details of scenes, or long sequences. He also had synaesthesia, which means that sensory events for him were richer, polysensory experiences: sounds and numbers had colors and smells; these would help him remember. But his powerful rote memory was a handicap too. He had trouble reading novels, because when a scene was described, he would visualize a corresponding scene he had once actually seen, and soon he was lost in reliving his vivid eidetic memory, unable to follow the content of the novel. And he had trouble with abstract concepts, such as numbers, or even ordinary generalizations that we all make with no difficulty.
What the stories of Funes and S show is that living in the world requires the capacity to detect recurrences, and that that in turn requires the capacity to forget or at least ignore what makes every instant infinitely unique, and hence incapable of exactly recurring. As noted earlier, Gibson's (1979) concept of an "affordance" captures the requisite capacity nicely: Objects afford certain sensorimotor interactions with them: A chair affords sitting-upon; flowers afford sorting by color, or by species. These affordances are all invariant features of the sensory input, or of the sensorimotor interaction with the input, and the organism has to be capable of detecting these invariants selectively -- of abstracting them. If all sensorimotor features are somehow on a par, and every variation is infinitely unique, then there can be no abstraction of the invariants that allow us to recognize sameness, or similarity, or identity, whether of kinds or of individuals.
Watanabe's (1985) "Ugly Duckling Theorem" captures the same insight. He describes how, considered only logically, there is no basis for saying that the "ugly duckling" -- the odd swanlet among the several ducklings in the Hans Christian Andersen fable -- can be said to be any less similar to any of the ducklings than the ducklings are to one another. The only reason it looks as if the ducklings are more similar to one another than to the swanlet is that our visual system "weights" certain features more heavily than others -- in other words, it is selective, it abstracts certain features as privileged. For if all features are given equal weight and there are, say, two ducklings and a swanlet, in the spatial position D1, S, D2, then although D1 and D2 do share the feature that they are both yellow, and S is not, it is equally true that D1 and S share the feature that they are both to the left of D2 spatially, a feature they do not share with D2. Watanabe pointed out that if we made a list of all the (physical and logical) features of D1, D2, and S, and we did not preferentially weight any of the features relative to the others, then S would share exactly as many features with D1 as D1 shared with D2 (and as D2 shared with S). This is an exact analogue of Borges's and Luria's memory effect, for the feature list is in fact infinite (it includes either/or features too, as well as negative ones, such as "not bigger than a breadbox," not double, not triple, etc.), so unless some features are arbitrarily selected and given extra weight, everything is equally (and infinitely) similar to everything else.
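The counting argument can be checked on a finite toy version of the D1, S, D2 example. The sketch below assumes that every possible feature is identified with the subset of objects it is true of, so that three objects have eight possible features, and it counts a feature as "shared" when it is true of both members of a pair. The function names and the weight of 5.0 given to "yellow" are hypothetical choices for illustration; the real feature list, as noted above, is infinite.

```python
from itertools import combinations

objects = ["D1", "S", "D2"]

def all_predicates(items):
    """Every possible feature, identified with the set of items it is true of."""
    predicates = []
    for size in range(len(items) + 1):
        predicates.extend(frozenset(c) for c in combinations(items, size))
    return predicates

def similarity(a, b, predicates, weight=lambda p: 1.0):
    """Weighted count of the features that are true of both a and b."""
    return sum(weight(p) for p in predicates if a in p and b in p)

predicates = all_predicates(objects)
pairs = list(combinations(objects, 2))

# Equal weighting: no pair is more similar than any other.
unweighted = {pair: similarity(pair[0], pair[1], predicates) for pair in pairs}

# Privileging one feature ("yellow", true of D1 and D2 only) breaks the tie.
yellow = frozenset({"D1", "D2"})

def favor_yellow(p):
    return 5.0 if p == yellow else 1.0

weighted = {pair: similarity(pair[0], pair[1], predicates, favor_yellow)
            for pair in pairs}

print("unweighted:", unweighted)
print("weighted:  ", weighted)
```

With equal weights every pair shares the same number of features; only when one feature is given extra weight do the two ducklings come out more similar to each other than either is to the swanlet.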
But of course our sensorimotor systems do not give equal weight to all features; they do not even detect all features. And among the features they do detect, some (such as shape and color) are more salient than others (such as spatial position and number of feathers). And not only are detected features finite and differentially weighted, but our memory for them is even more finite: We can see, while they are present, far more features than we can remember afterward.
The best illustration of this is the difference between relative and absolute discrimination that was pointed out by George Miller in his famous 1956 paper on our brains' information-processing limits: "The Magical Number 7+/-2". If you show someone an unfamiliar, random shape, and immediately afterward show either the same shape again or a slightly different shape, they will be able to tell you whether the two successive shapes were the same or different. That is a relative discrimination, based on a simultaneous or rapid successive pairwise comparison. But if instead one shows only one of the two shapes, in isolation, and asks which of the two it is, and if the difference between them is small enough, then the viewer will be unable to say which one it is. How small does the difference have to be? The "just-noticeable-difference" or JND is the smallest difference that we can detect in pairwise relative comparisons. But to identify a shape in isolation is to make an absolute discrimination (i.e., a categorization), and Miller showed that the limits on absolute discrimination were far narrower than those on relative discrimination.
Let us call relative discrimination "discrimination" and absolute discrimination "categorization." Differences have to be far greater for identifying what kind or individual something is than for telling it apart from something else that is simultaneously present or viewed in rapid succession. Miller pointed out that if the differences are along only one sensory dimension, such as size, then the number of JNDs we can discriminate is very large, and the size of the JND is very small, and depends on the dimension in question. In contrast, the number of values along the dimension for which we can categorize the object in isolation is approximately seven. If we try to subdivide any dimension more finely than that, categorization errors grow.
This limit on categorization capacity has its counterpart in memory too: If we are given a string of digits to remember we -- unlike Luria's S, who can remember a very large number of them -- can recall only about 7. If the string is longer, errors and interference grow.
Is there any way to increase our capacity to make categorizations? One way is to add more dimensions of variation; presumably this is one of the ways in which S's synaesthesia helped him. But even higher dimensionality has its limits, and never approaches the resolution power of the JND of sensory discrimination. Another way of increasing memory is by recoding. Miller showed that if we have to remember a string of 0's and 1's, then a string of 7 items is about our limit. But if we first learn to recode the digits into, say, triplets in binary code, using their decimal names -- so that 001 is called "one", 010 is called "two," 011 is called "three" etc., and we overlearn that code, so that we can read the strings automatically in the new code, then we can remember three times as many of the digits. The 7-limit is still there, but it is now operating on the binary triplets into which we have recoded the digits: 101 is no longer three items: it is recoded into one "chunk," "five." We have learned to see the strings in terms of bigger chunks -- and it is these new chunks that are now subject to the 7-limit, not the single binary digits.
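The recoding scheme described above can be sketched in a few lines. This is only an illustration of the arithmetic of chunking, not a model of memory; the function name `recode` and the example string are hypothetical.

```python
# Decimal names for the eight possible binary triplets (000 .. 111).
NAMES = ["zero", "one", "two", "three", "four", "five", "six", "seven"]

def recode(bits):
    """Recode a string of 0's and 1's into named chunks of three digits each."""
    if len(bits) % 3 != 0:
        raise ValueError("length must be a multiple of 3 for triplet recoding")
    return [NAMES[int(bits[i:i + 3], 2)] for i in range(0, len(bits), 3)]

# Hypothetical example: 21 binary digits, well beyond a 7-item limit ...
bits = "101001110010011100111"
chunks = recode(bits)

# ... but only 7 items once recoded into overlearned chunks.
print(len(bits), "digits ->", len(chunks), "chunks:", chunks)
```

The limit of about seven items is untouched; what changes is the size of the unit to which it applies.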
Recoding by overlearning bigger chunks is a way to enhance rote memory for sequences, but something similar operates at the level of features of objects: Although the number of features our sensory systems can detect in an object is not infinite, it is large enough so that if we see two different objects, sharing one or a few features, we will not necessarily be able to detect that they share features, hence that they are the same kind of object. (This is again a symptom of the "underdetermination" mentioned earlier, and is related to the so-called "credit assignment problem": How to find the winning feature or rule among many possibilities?) To be able to abstract the shared features, we need supervised categorization training, with trial and error and corrective feedback based on a large enough sample to allow our brains to solve the credit-assignment problem and abstract the invariants underlying the variation. The result, if the learning is successful, is that the inputs are recoded, just as they are in the digit string memorization; the features are re-weighted. The objects that are of the same kind, because they share invariant features, are consequently seen as more similar to one another; and objects of different kinds, not sharing the invariants, are seen as more different.
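A toy version of such supervised learning can be sketched with a simple perceptron. This is a stand-in chosen for brevity, not the learning mechanism proposed in this paper, and the details are assumptions: objects have four binary features, the (hypothetical) category invariant is the feature at index 2, and the learner receives corrective feedback after every wrong guess.

```python
from itertools import product

def train_perceptron(samples, labels, max_epochs=100):
    """Error-driven learning: weights change only after a miscategorization."""
    weights, bias = [0] * len(samples[0]), 0
    for _ in range(max_epochs):
        errors = 0
        for x, y in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            guess = 1 if activation > 0 else 0
            delta = y - guess  # corrective feedback: +1, 0 or -1
            if delta != 0:
                errors += 1
                weights = [w + delta * xi for w, xi in zip(weights, x)]
                bias += delta
        if errors == 0:  # every sample is now sorted correctly
            break
    return weights, bias

def categorize(weights, bias, x):
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

# All 16 objects with four binary features; which feature matters is
# not told to the learner and has to be found from feedback alone.
INVARIANT = 2  # hypothetical choice for this illustration
samples = list(product([0, 1], repeat=4))
labels = [x[INVARIANT] for x in samples]

weights, bias = train_perceptron(samples, labels)
print(weights, bias)
```

After training, the largest weight sits on the invariant feature and the others are small or zero: the inputs have in effect been reweighted, which is the sense of "recoding" intended here.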
This within-category enhancement of perceived similarity and between-category enhancement of perceived differences is again the categorical perception (CP) described earlier in the case of color. The sensory "shadows" of light frequency, intensity and saturation were recoded and reweighted by our evolved color receptors so as to selectively detect and enhance the spectral ranges that we consequently see as red, yellow, etc.
When CP is an effect of learning, it is a kind of Whorfian effect. Whorf (1956) suggested that how objects look to us depends on how we sort and name them. He cited colors as an example of how language and culture shape the way things look to us, but the evidence suggests that the qualitative color-boundaries along the visible spectrum are a result of inborn feature detectors rather than of learning to sort and name colors in particular ways. Learned CP effects do occur, but they are subtler than color CP, and can only be demonstrated in the psychophysical laboratory (Goldstone 1994, 2001; Livingston et al. 1998). Figure 2 below illustrates this for a task in which subjects learned texture categorization. For an easy categorization task, there was no difference before and after learning, but for a hard one, learning produced within-category compression and between-category separation. (From & Harnad 1997).
Yet learned CP works much the way inborn CP does: Some features are selectively enhanced, others are suppressed, thereby bringing out the commonalities underlying categories or kinds. This works like a kind of input filter, siphoning out the categories on the basis of their invariant features, and ignoring or reducing the salience of non-invariant features. The supervised and unsupervised learning mechanisms discussed earlier have been proposed as the potential mechanisms for this abstracting capacity, with sensorimotor interactions also helping us to converge on the right affordances, resolving the underdetermination and solving the credit-assignment problem.
Where does this leave the concrete/abstract distinction and the vanishing-intersections problem, then? In what sense is a primrose concrete and a prime number abstract? And how is "roundness" more abstract than "round," and "property" more abstract still? Identifying any category is always based on abstraction, as the example of Funes shows us. To recognize a wall as a wall rather than, say, a floor, requires us to abstract some of its features, of which verticality, as opposed to horizontality, is a critical one here (and sensorimotor interactions and affordances obviously help narrow the options). But in the harder, more underdetermined cases like chicken-sexing, what determines which features are critical? (We are back to the Maine joke again: "How's your wife?" "Compared to what?")
Although categorization is an absolute judgment, in that it is based on identifying an object in isolation, it is relative in another sense: What invariant features need to be selectively abstracted depends entirely on what the alternatives are. "Compared to what?" The invariance is relative to the variance. Information, as we learn from formal information theory, is something that reduces the uncertainty among alternatives. So when we learn to categorize things, we are learning to sort the alternatives that might be confused with one another. Sorting walls from floors is rather trivial, because the affordance difference is so obvious already, but sorting the sex of newborn chicks is harder, and it is even rumoured that the invariant features are ineffable in that case: They cannot be described in words. That's why the only way to learn them is through months or years of trial-and-error training, guided by feedback, under the supervision of masters.
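The information-theoretic point, that the same categorization resolves more or less uncertainty depending on what the alternatives are, can be made concrete with a minimal sketch. It assumes, purely for illustration, that the confusable alternatives are equally likely; the function name `uncertainty_bits` is hypothetical.

```python
import math

def uncertainty_bits(probabilities):
    """Shannon uncertainty, in bits, over mutually exclusive alternatives."""
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)

# Illustrative contexts (assumed equiprobable): the same object has to be
# identified against 2, 4 or 8 alternatives it could be confused with.
two_alternatives = uncertainty_bits([1 / 2] * 2)
four_alternatives = uncertainty_bits([1 / 4] * 4)
eight_alternatives = uncertainty_bits([1 / 8] * 8)

# A correct categorization leaves a single alternative: nothing left to resolve.
after_categorizing = uncertainty_bits([1.0])

print(two_alternatives, four_alternatives, eight_alternatives, after_categorizing)
```

"Compared to what?" thus has a quantitative reading: the more alternatives in the context, the more uncertainty a correct category name has to remove.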
But let us not mistake the fact that it is difficult to make them explicit verbally for the fact that there is anything invisible or mysterious about the features underlying chicken-sexing -- or any other subtle categorization. Biederman did a computer-analysis of newborn chick-abdomens and identified the winning invariants in terms of his "geon" features (Biederman & Shiffrar 1987). He was then able to teach the features and rules through explicit instruction to a sample of novices so that within a short time they were able to sex chicks at the brown-belt level, if not the black belt level. This progress should have taken them months of supervised trial-and-error training, according to the grandmasters.
So if we accept that all categorization, great and small, depends on abstracting some features and ignoring others, then all categories are abstract. Only Funes lives in the world of the concrete, and that is the world of mere passive experiential flow from one infinitely unique instant to the next (like the sand in the wind). For to do anything systematic or adaptive with the input would require abstraction, whether innate or learned: the detection of the recurrence of a thing of the same kind.
What about degrees of abstractness? (Having, with G.B. Shaw, identified the profession, we are now merely haggling about the price.) When I am sorting things as instances of a round-thing and a non-round-thing, I am sorting things. This thing is round, that thing is non-round. When I am sorting things as instances of roundness and non-roundness, I am sorting features of things. Or rather, the things I am sorting are features (also known as properties, when we are not just speaking about them in a sensorimotor sense). And features themselves are things too: roundness is a feature, an apple is not (although any thing, even an apple, can also be a part, hence a feature, of another thing).
In principle, all this sorting and naming could be applied directly to sensorimotor inputs; but much of the sorting and naming of what we consider more abstract things, such as numbers, is applied to symbols rather than to sensorimotor interactions with objects. I name or describe an object, and then I categorize it: "A number is an invariant numerosity" (ignoring the variation in the kinds or individuals involved). This simple proposition already illustrates the adaptive value of language: Language allows us to acquire new categories without having to go through the time-consuming and risky process of direct trial-and-error learning. Someone who already knows can just tell me the features of an X that will allow me to recognize it as an X. (This is rather like what Biederman did for his experimental subjects, in telling them what features to use to sex chickens, except that his method was hybrid: It was show-and-telling, not just telling, because he did not merely describe the critical features verbally, but also pointed them out and illustrated them visually. He did not pretrain his subjects on geon-naming, as Miller's subjects were pretrained on naming binary triplets.)
If Biederman had done it all with words, through pure hearsay, he would have demonstrated the full and unique category-conveying power of language: In sensorimotor learning, the abstraction usually occurs implicitly. The neural net in the learner's brain does all the hard work, and the learner is merely the beneficiary of the outcome. The evidence for this is that people who are perfectly capable of sorting and naming things correctly usually cannot tell you how they do it. They may try to tell you what features and rules they are using, but as often as not their explanation is incomplete, or even just plain wrong. This is what makes cognitive science a science; for if we could all make it explicit, merely by introspecting, how it is that we are able to do all that we can do, then our introspection would have done all of cognitive science's work for it. But we usually cannot make our implicit knowledge explicit, just as the master chicken-sexers could not. Yet what explicit knowledge we do have, we can convey to one another much more efficiently by hearsay than if we had to learn it all the hard way, through trial-and-error experience. This is what gave language the powerful adaptive advantage that it had for our species (Cangelosi & Harnad 2001; see Figure 3).
Figure 4. An artificial-life simulation of mushroom foragers. Mushroom-categories could be learned in two different ways, by sensorimotor “toil” (trial-and-error learning with feedback from the consequences of errors) or linguistic “theft” (learning from overhearing the category described; hearsay). Within a very few generations the linguistic “thieves” out-survive and out-reproduce the sensorimotor toilers. (But note that the linguistically based categories must be grounded in sensorimotor categories: it cannot be theft all the way down.) (Cangelosi & Harnad 2001.)
Where does this leave prime numbers then, relative to primroses? Pretty much on a par, really. I, for one, do not happen to know what primroses are. I am not even sure they are roses. But I am sure I could find out, either through direct trial and error experience, my guesses corrected by feedback from the masters, and my internal neural nets busily and implicitly solving the credit-assignment problem for me, converging eventually on the winning invariants; or, if the grandmasters are willing and able to make the invariants explicit for me in words, I could find out what primroses are through hearsay. It can't be hearsay all the way down, though. I will have had to learn some things the hard, sensorimotor way, if the words used by the grandmasters are to have any sense for me. The words would have to name categories I already have.
Is it any different with prime numbers? I know they are a kind of number. I will have to be told about factoring, and will probably have to try it out on some numbers to see what it affords, before recognizing that some kinds of numbers do afford factoring and others do not. The same is true for finding out what deductive proof affords, when they tell me more about further features of prime numbers. Numbers themselves I will have had to learn at first hand, supervised by feedback in absolutely discriminating numerosities, as provided by yellow-belt arithmeticians -- for here too it cannot be hearsay all the way down. (I will also need to experience counting at first hand, and especially what "adding one" to something, over and over again, affords.)
But is there any sense in which primroses or their features are "realer" than prime numbers and their features? Any more basis for doubting whether one is really "out there" than the other? The sense in which either of them is out there is that they are both absolute discriminables: Both have sensorimotor affordances that I can detect, either implicitly, through concrete trial-and-error experience, guided by corrective feedback (not necessarily from a live teacher, by the way: if, for example, primroses were edible, and all other flowers toxic, or prime numerosities were fungible, and all others worthless, feedback from the consequences of the sensorimotor interactions would be supervision enough); or explicitly, through verbal descriptions (as long as the words used are already grounded, directly or recursively, in concrete trial-and-error experience; Harnad 1990). The affordances are not imposed by me; they are "external" constraints, properties of the outside world, if you like, governing its sensorimotor interactions with me. And what I do know of the outside world is only through what it affords (to my senses, and to any sensory prostheses I can use to augment them). That 2+2 is 4 rather than 5 is hence as much of a sensorimotor constraint as that projections of nearer objects move faster along my retina than those of farther ones.
Mere cognitive scientists (sensorimotor roboticists, really) should not presume to do ontology at all, or should at least restrict their ontic claims to their own variables and terms of art -- in this case, sensorimotor systems and their inputs and outputs. By this token, whatever it is that "subtends" absolute discriminations -- whatever distal objects, events or states are the sources of the proximal projections on our sensory surfaces that afford us the capacity to see, recognize, manipulate, name and describe them -- are all on an ontological par; and subtler discriminations are unaffordable.
Where does this leave goodness, truth and beauty, and their sensorimotor invariants? Like prime numbers, these categories are acquired largely by hearsay. The ethicists, jurists and theologians (not to mention our parents) tell us explicitly what kinds of acts and people are good and what kinds are not, and why (but the words in their explicit descriptions must themselves be grounded, either directly, or recursively, in sensorimotor invariants: again, categories cannot be hearsay all the way down). We can also taste what's good and what's not good directly with our senses, of course, in sampling some of their consequences. We perhaps rely more on our own sensory tastes in the case of beauty, rather than on hearsay from aestheticians or critics, though we are no doubt influenced by them and by their theories too. The categories "true" and "false" we sample amply through direct sensory experience, but there too, how we cognize them is influenced by hearsay; and of course the formal theory of truth looks more and more like the theory of prime numbers, with both constrained by the affordances of formal consistency.
But, at bottom, all of our categories consist in ways we behave differently toward different kinds of things, whether it be the things we do or don't eat, mate with, or flee from, or the things that we describe, through our language, as prime numbers, affordances, or absolute discriminables. And isn't that all that cognition is for -- and about?
Biederman, I. & Shiffrar, M. M. (1987) Sexing day-old chicks: A case study and expert systems analysis of a difficult perceptual-learning task. Journal of Experimental Psychology: Learning, Memory, & Cognition 13: 640-645.
Borges, J.L. (1962) Funes el memorioso
Cangelosi, A. & Harnad, S. (2001) The Adaptive Advantage of Symbolic Theft Over Sensorimotor Toil: Grounding Language in Perceptual Categories. Evolution of Communication 4(1) 117-142
Chomsky, N. (1976) In Harnad, Stevan and Steklis, Horst D. and Lancaster, Jane B., Eds. Origins and Evolution of Language and Speech, page 58. Annals of the New York Academy of Sciences.
Fodor, J. A. (1975) The language of thought. New York: Thomas Y. Crowell
Fodor, J. A. (1981) RePresentations. Cambridge MA: MIT/Bradford.
Fodor, J. A. (1998). In critical condition: Polemical essays on cognitive science and the philosophy of mind. Cambridge, MA: MIT Press.
Gibson, J.J. (1979). The Ecological Approach to Visual Perception. Houghton Mifflin, Boston. (Currently published by Lawrence Erlbaum, Hillsdale, NJ) http://cognet.mit.edu/MITECS/Entry/gibson1
Goldstone, R.L. (1994) Influences of categorization on perceptual discrimination. Journal of Experimental Psychology: General 123: 178-200
Goldstone, R.L. (2001) The Sensitization and Differentiation of Dimensions During Category Learning. Journal of Experimental Psychology: General 130: 116-139
Harnad, S. (1987) Category Induction and Representation, In: Harnad, S. (ed.) (1987) Categorical Perception: The Groundwork of Cognition . New York: Cambridge University Press.
Harnad, S. (1990) The Symbol Grounding Problem. Physica D 42: 335-346.
Harnad, S. (2000) Minds, Machines, and Turing: The Indistinguishability of Indistinguishables. Journal of Logic, Language, and Information 9(4): 425-445. (special issue on "Alan Turing and Artificial Intelligence")
Harnad, S. (2001) No Easy Way Out. The Sciences 41(2) 36-42.
Harnad, S. (2003) Categorical Perception. Encyclopedia of Cognitive Science. Nature Publishing Group. Macmillan. http://www.ecs.soton.ac.uk/~harnad/Temp/catperc.html
Harnad, S. (2003) Symbol-Grounding Problem. Encyclopedia of Cognitive Science. Nature Publishing Group. Macmillan. http://www.ecs.soton.ac.uk/~harnad/Temp/symgro.htm
Livingston, Kenneth and Andrews, Janet and Harnad, Stevan (1998) Categorical Perception Effects Induced by Category Learning. Journal of Experimental Psychology: Learning, Memory and Cognition 24(3):732-753 http://eprints.ecs.soton.ac.uk/archive/00006883/
Luria, A. R. (1968) The Mind of a Mnemonist. Harvard University Press http://qsilver.queensu.ca/~phil158a/memory/luria.htm
Miller, George (1956) The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information. Psychological Review 63:81-97 http://cogprints.ecs.soton.ac.uk/archive/00000730/
Rosch, E. & Lloyd, B. B. (1978) Cognition and categorization. Hillsdale NJ: Erlbaum Associates
Steklis, Horst Dieter and Harnad, Stevan (1976) From hand to mouth: Some critical stages in the evolution of language, In: Origins and Evolution of Language and Speech (Harnad, Stevan, Steklis , Horst Dieter and Lancaster, Jane B., Eds.), 445-455. Annals of the New York Academy of Sciences 280. http://cogprints.soton.ac.uk/documents/disk0/00/00/08/66/index.html
Watanabe, S., (1985) "Theorem of the Ugly Duckling", Pattern Recognition: Human and Mechanical. Wiley
Whorf, B.L. (1956) Language, Thought and Reality. (J.B. Carroll, Ed.) Cambridge: MIT http://www.mtsu.edu/~dlavery/Whorf/blwquotes.html
Appendix 1. There is nothing wrong with the "classical theory" of categorization.
Eleanor Rosch has suggested that because we cannot state the basis on which we categorize, that basis must not exist (Rosch & Lloyd 1978). It follows that there is something wrong with the so-called "classical" theory of categorization, which is that we categorize on the basis of the features that are necessary and sufficient to afford categorization.
Not only do I think there's nothing the least bit wrong with that "classical theory," but I am pretty confident that there is no non-magic alternative to it. Rosch's alternative was to vacillate rather vaguely between the idea that we categorize on the basis of prototypes or on the basis of "family resemblances". Let's consider each of these candidate mechanisms in turn:
To categorize on the basis of prototypes would be to identify a bird as a bird because it looks more like the template for a typical bird than the template for a typical fish. This would be fine if all, many, or most of the things we categorize indeed had templates, and our internal categorization mechanism could sort their sensory shadows by seeing which template they are closest to; in other words, it would be fine if such a mechanism could actually generate our categorization capacity. Unfortunately it cannot. Template-matching is not very successful among the many candidate machine-learning models, and one of the reasons is that it is simply not the case that everything is a member of every category, to different degrees. It is not true ontologically that a bird is a fish (or a table) to a certain degree; nor is it true functionally that sensory shadows of birds can be sorted on the basis of their degree of similarity to prototype birds, fish or tables. So prototype theory is a non-starter as a mechanism for our categorization capacity. It might explain our typicality judgments -- is this a more typical bird than that? -- but being able to make a typicality judgment presupposes being able to categorize; it does not explain it: Before I can say how typical a bird this is, I first need to identify it as a bird!
So if not prototypes, what about family-resemblances, then? What are family resemblances? They are merely a cluster of either/or features: This is an X, if it has feature A or B or not C. Either/or features (disjunctive invariants) are perfectly classical (so forget about thinking of family-resemblances as alternatives to classical theories of categorization). The problem is that saying that some features are either/or features leaves us no closer to answering "how" than we were before we were informed of this. Yes, some of the affordances of sensory shadows will be either/or features, but what we need to know is what mechanism will be able to find them!
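That an either/or rule is perfectly classical can be seen from a minimal sketch of the rule just stated. The features A, B and C are the placeholder features of the text, and the function name `is_x` is hypothetical.

```python
from itertools import product

def is_x(a, b, c):
    """The disjunctive rule from the text: an X has feature A, or B, or lacks C."""
    return a or b or not c

# Every combination of presence/absence of the three features.
candidates = list(product([False, True], repeat=3))
members = [f for f in candidates if is_x(*f)]

# Membership is all-or-none, yet no single feature value is common to
# every member: the members only "resemble" one another disjunctively.
common = [
    (index, value)
    for index in range(3)
    for value in (False, True)
    if all(m[index] == value for m in members)
]

print(len(members), "of", len(candidates), "candidates are X")
print("feature values shared by all members:", common)
```

The rule sorts every candidate unambiguously, which is all the classical theory requires; what the sketch leaves out is exactly the hard part, namely how a learner would find such a rule in the first place.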
The last Roschian legacy to category theory is the "basic object" level, vs. the superordinate or subordinate level. Here too it is difficult to see what, if anything, we have learned. If you point to an object, say, a table, and ask me what it is, chances are that I will say it's a table, rather than a Biedermeier, or furniture, or "Charlie". So what? As mentioned earlier, there are many ways to categorize the same objects, depending on context. A context is simply a set of alternatives among which the object's name is meant to resolve the uncertainty (in perfectly classical information-theoretic terms). So when you point to a table and ask me what it is, I pick "table" as the uncertainty-resolver in the default context (I may imagine that the room contains one chair, one computer, one waste-basket and one table). If I imagine that it contains four tables, I might have to identify this one as the Biedermeier; and if there are four Biedermeiers, I may have to hope you know I've dubbed this one "Charlie." So much for subordinates. The same path can be taken for superordinates. It all devolves on the old Maine joke, which comes close to revealing a profound truth about categories: "How's your wife?" Reply: "Compared to what?" If we were just discussing the relative amount you should invest in furniture in your new apartment, as opposed to accessories, and you forgot what was in the adjacent room and asked what was in there (when there was just a table) I might reply "furniture." If we were discussing ontology, I might say "vegetable" (as opposed to animal or mineral). Etc.
So citing the "basic object level" does not help; that's just what one arbitrarily assumes the default context of interconfusable alternatives to be, given no further information. The only sense in which "concrete" objects, directly accessible to our senses, are somehow more basic, insofar as categorization is concerned, than more "abstract" objects, such as goodness, truth or beauty is that sensorimotor categories must be grounded in sensory experience and that the content of that experience is fairly predictable from most members of our species.
Appendix 2. Associationism begs the question of categorization.
The problem of association is the problem of rote pairing: an object with an object, a name with a name, a name with an object. Categorization is the problem of recognizing and sorting objects as kinds based on finding the invariants underlying sensorimotor interactions with their shadows. Associationism had suggested that this was just a matter of learning to associate tokens (instances, shadows) of an object-type with tokens of its type-name -- as indeed it is, if only we can first figure out which object-tokens are tokens (shadows) of the same object-type! Which is in turn the problem of categorization. Associationism simply bypassed the real problem, and reduced learning to the trivial process of rote association, governed by how often two tokens co-occurred (plus an unexplicated influence of how "similar" they were to one another).
Some associative factors are used by contemporary unsupervised learning models, where internal co-occurrence frequencies and similarities are used to cluster inputs into kinds by following and enhancing the natural landscape of their similarities and dissimilarities. But this is internal association among representational elements and patterns (e.g., units in a neural network), not external association between input tokens. And its scope is limited, as we have seen, for most of the shadows of most of the members of most of the categories in our dictionary could not be sorted into their respective categories by unsupervised association alone, because of underdetermination. Nor is supervised learning merely rote association with the added associative cue of the category-name (as provided by the supervisory feedback). The hard work in these learning models is done by the algorithm that solves the credit-assignment problem by finding the winning invariance in the haystack -- and no model can do this at human categorization-capacity scale just yet. Critics of associationism, however, drew the incorrect conclusion that because (1) we don't know what the invariance is in most cases and (2) association is ill-equipped to find it in any case, it follows that there either is no invariance, or our brains must already know it innately in some mysterious way.