Category Induction and Representation

Harnad, S. (1987) Category Induction and Representation. Chapter 18 of: Harnad, S. (ed.) (1987) Categorical Perception: The Groundwork of Cognition. New York: Cambridge University Press.

Stevan Harnad
Department of Psychology
University of Southampton
Highfield, Southampton SO17 1BJ
UNITED KINGDOM

18.1 Introduction

18.1.1 Philosophical Background. The problem of how categories are learned and represented is not only central to contemporary cognitive science; it is also closely related to several enduring problems in the philosophy of knowledge and language. Seven of the most prominent of these problems are the following:

(1) The problem of induction is the first and most important: How does one arrive at successful (i.e., correct, predictive) generalizations from finite samples of instances?

(2) The problem of mapping words onto objects: What is the relation between our words and the things they describe? How is it established?

(3) The problem of meaning holism: Is a word's meaning isolable from the meaning of the rest of our vocabulary, or can the revision of one meaning necessitate the revision of all meanings?

(4) The problem of knowledge-by-acquaintance versus knowledge-by-description: What is the difference between what we know (or can know) from direct sensory experiences and what we know (or can know) from verbal descriptions?

(5) The problem of elementary percepts: What are the units out of which our perception of objects and forms is built? Are there indivisible perceptual primitives?

(6) The problem of atomic symbols: How do the individual words we use in our definitions and descriptions get their meanings? How do they break out of the circle of dependence on prior definitions and descriptions (and prior words, etc.)?

(7) The problem of 'universals': What is the difference between an object (a particular) and a feature (a universal)? How are concrete vs. abstract properties represented?

Current research on categorization suggests a way in which some epistemic aspects of these seven problems (i.e., aspects concerned with what we can know and how) can be empirically tested and modeled. Other important aspects of the same problems, however, particularly ontic ones (concerning what there is, rather than what we can know) are not empirical and will be left untouched in this discussion:

In the case of induction (1), neither the problem of fallibility (or inductive risk, i.e., the ever-present logical possibility that today's provisional generalizations could fail to hold tomorrow) nor the problem of how to ground induction noninductively (it appears to be impossible to validate the inductive process itself in any way that does not itself already rely on induction) can be resolved by what will be discussed here.

Associated with the word/object problem (2) are a host of difficulties with the idea of reference that have been pointed out by Frege (see 1952 translation), Goodman (1959), Quine (1960), Putnam (1975) and others. For example, the meaning of a word cannot just be the object or object category to which it refers because, as Frege's morning-star/evening-star example shows, different words, with different meanings, can have the same referent (e.g., Venus), even unbeknownst to the speaker; and substituting one such word for another can change true statements into false ones. There are also problems about what the connection between a word and its referent is, and how it is established. Quine, for example, has suggested that there are difficulties associated with simply assigning a name to an object or object category because it is indeterminate what aspect or feature of the object the name is picking out. (Rabbit could refer to rabbits, undetached rabbit parts, rabbit stages, etc.; Goodman's grue could refer to green, or to green now but blue after some arbitrary future date.) The model to be presented here may cast a little light on such difficulties, but for the most part they too will be left unresolved.

The treatment of meaning holism and concept revision (3) will also be only epistemic, that is, it will only be concerned with our concepts of what there is in the world, how we get those concepts, and how we revise them as our data-base grows, rather than with the question of what there really is in the world, and how we know it (Harnad, in preparation, a).\**

[footnote start] For example, regarding the successive revisions of the meaning of the word water (as we learn through science that it is really hydrogen and oxygen, and as the meanings of hydrogen and oxygen continue to evolve with the theories and discoveries of subatomic physics) this chapter will be concerned only with the revision of our provisional concepts of the category water on the available evidence, not with what water really is. [footnote end]

Some aspects of the acquaintance/description dichotomy (4) (again, only the epistemic ones) may also be clarified by the model, but the irreducibility of experience itself (Nagel 1974, 1986) will remain as untouched as the nature of consciousness. The model's application to the elementary-percepts (5) and atomic-symbols (6) problem will also bypass the questions about irreducibility that some philosophers consider to be the fundamental ones. And of course no real ontic issues will be resolved by this treatment of our concepts of universals, i.e., our concepts of abstract objects and properties (7).

For each aspect of these longstanding problems that will be left untouched, however, there will be other aspects that can be viewed in a new way as a result of this discussion, allowing some of the attendant questions to be posed in an empirically decidable form. For example, with regard to induction, there will be a description of the kind of mechanism that would be needed to learn the regularities underlying samples of inputs that allow them to be sorted and named correctly. A three-level representational system will mediate between objects and the words we use to name and describe them. The revision of these representations will not be local but holistic, sometimes affecting the entire representational system because of the bottom-up way in which the system is grounded. The acquaintance/description dichotomy will find its counterpart in the hybrid system of representation itself: A two-level acquaintance system will serve as the locus of elementary percepts and the grounding for the atomic symbols of a third, descriptive system. The object/feature dichotomy will also find a natural counterpart in the proposed dual acquaintance system and its relation to the descriptive system. (All this should become clearer as this chapter progresses.)

This synthesis will be undertaken within the general framework of a theory of categorization, i.e., a theory of how we sort and name objects and states of affairs. To close this introductory section, here are the seven questions listed above reformulated as problems for a theory of categorization:

The problem of induction (1) is the problem of extracting reliable categories from experience, that is: On the basis of a finite sample of members and nonmembers of a set of underlying categories, together with feedback as to which are members and which are not, how do we successfully detect and encode the regularities or invariances that allow us to categorize further instances correctly? The word/object problem (2) is the problem of matching the category representation to the instances encountered so as to yield correct categorization. Direct sensory experience will be a prerequisite for forming the perceptual categories in the dual acquaintance system, whereas verbal descriptions will be the chief source of the symbolic categories of the descriptive system (4). The word/object connection will be grounded in a specific interaction among all three representational systems, with the names of the perceptual categories (5) serving as the atomic symbols for the verbal categories (6). The universals (properties) (7) for which we have concepts will be the invariant features underlying categorization that have themselves become categories. Last and most important, the correctness of categorization will turn out to be an approximate rather than an exact matter, depending on the range and the interconfusability of the instances the categorizer has sampled to date. This approximateness of categories, however, especially verbal ones, will be seen to be an advantage rather than a handicap, giving the verbal system the potential for the universal expressive power of natural language and guaranteeing (insofar as a guarantee is possible, given inductive risk) that meaning revision, though holistic and approximate (3), will always converge.

18.2 Approximationism

18.2.1 Approximation and context-driven convergence. Approximation is a key feature of the present approach. If it were to be classified as an ism the way its philosophical counterparts have been, the best descriptor for the approach would be approximationism or convergentism. All of our categories turn out to be approximate rather than exact (in a realist's sense of exact). We converge on these approximations by accumulating input data, arriving at a provisional way of sorting them into categories, and then continually updating those categories in accordance with the constraints and contingencies of subsequent input so as to yield an approximate match that is adequate for the sample of categorization problems we have (successfully) faced to date, with prior categories always subsumed as a special case (except in the rare instances when they turn out to be empty, incoherent or unlearnable). It is not clear whether this inductive process is best viewed as optimization or as satisficing (i.e., provisionally making do, Simon 1957), but what seems undeniable is that category learning is an approximate process.\**

[footnote start] For example, the surviving aspects of the old problems of induction (1) -- future-contingent risk and the ungroundedness of induction -- guarantee that our categories will always be approximate in those respects: Our inductive future could always diverge. [footnote end]

Here are two examples of the sense in which categories are approximate. The first example is abstract, the second more concrete: Consider a simple problem in machine vision: Suppose all that a visual scene categorizer had to do was to tell apart trees from animals, that is, to categorize all instances of trees as trees and all instances of animals as animals. Suppose, by way of further simplification, that trees and animals were the only patterns the analyzer ever encountered, and suppose (to simplify still further) that its input patterns were already suitably smoothed and parsed so that they only appeared in standard positions, with figure and ground, parts and whole, already appropriately sorted.\**

[footnote start] All of these preconditions would of course have to be based recursively on the successful solution of prior categorization problems of the same kind. [footnote end]

Now it is evident that if there were nothing to worry about but sorting trees and animals under these canonical conditions, a very simple rule would work quite well, for example, counting the number of legs, L, and calling an instance a tree if L was less than or equal to 1 and an animal otherwise. Obviously such a rule could only work with well smoothed and parsed inputs, free of anomalous instances (such as storks standing on one leg, trees with split trunks, or tables). As an approximation, however, the rule would successfully sort the standard cases described. As anomalies were introduced, the rule could be revised and elaborated so as to tighten the approximation in accordance with the new contingencies. It is also apparent (if this view of categorization is a representative one) that better and better approximations are all one can ever expect under these conditions. No essence of a tree or of an animal could ever be captured by a process such as this. All that one could hope for would be the extraction and encoding of those invariant properties that will reliably subserve the categorizations one must make.
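The limb-count rule and its revision can be sketched in a few lines. This is only an illustrative toy, not the model proposed in this chapter: the `Instance` record, the `has_leaves` feature and the revised rule `rule_v2` are assumptions introduced here to show how a rule that sorts the canonical sample can fail on anomalies and then be tightened while still sorting the earlier cases correctly.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instance:
    name: str
    legs: int          # the limb count L read off the smoothed, parsed input
    has_leaves: bool   # hypothetical extra feature, not taken from the text

def rule_v1(x: Instance) -> str:
    """The rule from the text: a tree if L <= 1, an animal otherwise."""
    return "tree" if x.legs <= 1 else "animal"

def rule_v2(x: Instance) -> str:
    """An assumed revision that ignores limb count once anomalies appear."""
    return "tree" if x.has_leaves else "animal"

def accuracy(rule, sample) -> float:
    """Fraction of (instance, correct_label) pairs the rule sorts correctly."""
    return sum(rule(x) == label for x, label in sample) / len(sample)

# Canonical context: well smoothed and parsed trees and animals only.
canonical = [
    (Instance("oak", 1, True), "tree"),
    (Instance("dog", 4, False), "animal"),
    (Instance("hen", 2, False), "animal"),
]

# Widened context: the anomalies mentioned in the text.
anomalies = [
    (Instance("stork on one leg", 1, False), "animal"),
    (Instance("split-trunk tree", 2, True), "tree"),
]

print(accuracy(rule_v1, canonical))              # v1 suffices here
print(accuracy(rule_v1, anomalies))              # v1 fails on both anomalies
print(accuracy(rule_v2, canonical + anomalies))  # v2 covers the wider context
```

Nothing in the sketch captures an essence of trees or animals; `rule_v2` is simply a tighter approximation that would itself need revising if the context widened again (for example, to leafless trees).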

A more concrete example of how categorization is approximate would be the sorting of (edible) mushrooms from (poisonous) toadstools (Harnad 1987XX). Russians pick wild mushrooms on a much wider scale than Americans. Through collective and cumulative trial and error, they have managed to find the features that will allow them to safely sort mushrooms from toadstools in the sample they have encountered in their native land.

For example, in one particular case, the mushroom and toadstool look virtually identical, but the toadstool has spots and the mushroom does not; absence of spots is a reliable feature for recognizing the candidate as a mushroom. Knowing what they know, Russians rightly consider themselves to have a good idea of what a mushroom (vs. a toadstool) is. However, when they emigrate to the United States, these experienced Russian mushroom pickers find out that there are subtle differences in our local varieties of mycoflora. The differences are not radical. The mushrooms and toadstools are recognizably familiar. However, their interconfusabilities have changed. For example, there is a variety of toadstool whose spots can wash off in the rain. Hence a categorization that was safe, correct and reliable in the USSR -- That's a mushroom -- turns out to be wrong and deadly in the USA for exactly the same (visual) input.

The only thing that has changed is what is confusable with what in the new sample, and which features will sort them out reliably once again. This is exactly analogous to the earlier example of the animal/tree sorter when it encountered an anomalous instance (e.g., the stork standing on one leg). Like the categorizing machine, the Russian émigré has no choice but to revise his sorting rules, now taking into account weather conditions and perhaps subtler features distinguishing the mushroom from the toadstool -- features that were not noticed or needed in the Soviet Union because they did not signal critical, confusion-resolving differences.

Yet even now, one can ask whether the Americanized mushroom-sorter has detected the essence of what a mushroom is: It seems instead that even now all he has is a better approximation -- one that has converged on features that are good enough to correctly sort the potentially confusable alternatives he has encountered so far (but who knows where he may have to emigrate tomorrow?).

18.2.2 A picture versus a thousand words. Approximation is what is at issue in the observation that a picture is worth a thousand words. In fact, in a formal sense (one that is in most cases trivial or inconsequential, yet true), a picture is always worth more than an infinite number of words, the reason being that a verbal description will always fall short of saying everything that can be said about the object it describes. Words obviously fall short when they leave out some critical feature that would be necessary in order to sort some future or potential anomalous instance; but even if one supposes that every critical feature anyone would ever care to mention has been mentioned, a description will always remain essentially incomplete in the following ways:

(a) A description cannot convey the qualitative nature of the object being described (i.e., it cannot yield knowledge by acquaintance), although it can converge on it as closely as the describer's descriptive resources and resourcefulness allow. (Critical here will be the prior repertoire of direct experiences and atomic labels on which the description can draw.)

(b) There will always remain inherent features of the object that will require further discourse to point out; an example would be a scene that one had neglected to mention was composed of a prime number of distinct colors.

(c) In the same vein, there would be all the known and yet-to-be-discovered properties of prime numbers that one could speak of -- all of them entailed by the properties of the picture, all of them candidates (albeit far-fetched ones) for further discourse about the picture.\**

[footnote start] The inexhaustibility of their list of potential entailments is of course not unique to pictures; it is shared by propositions (e.g., axioms) and is in fact mediated by propositions about pictures, i.e., descriptions. But pictures (and objects) differ from propositions in that they inherently contain (in analog form) the information to answer (in the particular) an infinity of potential questions that descriptions can raise. (I ignore here the sentence/proposition distinction.) In my view, so-called propositional models of cognition (Pylyshyn 1980, 1984; Fodor 1975, 1980) mistakenly put an unsupportable burden on descriptive representations in failing to distinguish between, on the one hand, what is and is not encoded propositionally (presumably neither of these classes is empty in practice) and, on the other hand, what can be encoded propositionally (answer: anything -- to an approximation -- in principle, but not necessarily or optimally so in practice). And to the extent that descriptions are doomed to be approximate, they of course cannot encode everything. [footnote end]

(d) Last, and most revealing, there are the inexhaustible shortcomings of words exemplified by all the iterative afterthoughts made possible by, say, negation: for example, the number of limbs is not two (three, four, ... etc.). The truth of all these potential descriptions is inherent in the picture, yet it is obvious that no exhaustive description would be possible. Hence all descriptions will only approximate a true, complete description.

18.2.3 The context of alternatives and the context-dependence of categorization.

There will be a nonstandard (but, I hope, well-motivated) use of the word context in this chapter. It applies equally to the tree/animal and mushroom/toadstool categorization problems mentioned earlier and to the (ostensibly trivial) negative/iterative case (d) just described. Every category has (at least) one context associated with it, namely, the relevant set of confusable alternatives amongst which the categorization is to be made. The context consists of the sampled members of the category itself plus the sampled members of its complement of relevant alternative categories. The context (i.e., the positive category plus its complement) is itself usually an actual or potential higher-order category. (The simplest context is a dichotomy: One positive category and one negative category.) For example, the context of the tree/animal categorization described earlier consisted of highly smoothed and parsed instances of trees and animals (in two-dimensional projection). In this context, a limb-count and comparison with the number 1 is sufficient to give rise to successful categorization. With the introduction of sleeping storks, bifurcated tree-trunks or tables, the context is widened and the approximation must be tightened so as to take into account more kinds of cases.\**

[footnote start] Since the constraints and contingencies of categorization problems are in principle open, it is not out of the question that successful performance should happen to depend on (say) the limb-count's not being equal to a prime number, or to 227, or what have you. [footnote end]

The same is true of the mushroom/toadstool problem, in the Russian versus the American context.

The point is that successful categorization depends on finding the critical features on the basis of which reliable, correct performance can occur. These will depend, not on the inherent features of any particular instance (there are an infinity of them) but on the context: the range of confusable alternatives involved, the specific contrasts that need to be made, the invariant features that will reliably subserve successful categorization. And because ranges can change (and instantiation and categorization are never-ending processes) all categories and the features on which they are based will always remain provisional and approximate.
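The dependence of the critical features on the context can be made concrete with a small search over feature subsets. The following is a hedged sketch, not a proposed learning mechanism: the function `separating_feature_sets`, the feature names (`spots`, `bulbous_base`) and the sample data are all hypothetical and carry no mycological authority. The sketch simply asks which smallest sets of features sort every sampled instance correctly in a given context of confusable alternatives.

```python
from itertools import combinations

def separating_feature_sets(context, features):
    """Return the smallest feature subsets that sort the whole sample correctly.

    `context` is a list of (feature_dict, label) pairs: sampled members of the
    category together with sampled members of its complement.
    """
    for size in range(1, len(features) + 1):
        found = []
        for subset in combinations(features, size):
            seen = {}  # feature-value pattern -> label first seen with it
            consistent = True
            for instance, label in context:
                key = tuple(instance[f] for f in subset)
                # Same pattern under two different labels means the subset
                # cannot resolve the confusion in this context.
                if seen.setdefault(key, label) != label:
                    consistent = False
                    break
            if consistent:
                found.append(subset)
        if found:
            return found
    return []  # no invariant basis for sorting in this context

features = ("spots", "bulbous_base")  # hypothetical feature names

# Narrow context: the sample encountered so far.
russian = [
    ({"spots": False, "bulbous_base": False}, "mushroom"),
    ({"spots": True, "bulbous_base": True}, "toadstool"),
]

# Widened context: a toadstool whose spots have washed off in the rain.
american = russian + [
    ({"spots": False, "bulbous_base": True}, "toadstool"),
]

print(separating_feature_sets(russian, features))   # either feature suffices
print(separating_feature_sets(american, features))  # spots alone no longer does
```

In the narrow context either feature alone sorts the sample, so the second feature need never be noticed; once the context is widened, absence of spots stops being reliable and only the other feature still sorts every sampled instance. An exhaustive subset search of this kind is combinatorial, which is one reason it is offered only as an illustration.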

18.2.4 Convergence, category revision and holism. At this point the following generalizations can be made about approximation and convergence: For the successful convergence of a category-induction problem to be possible (i.e., for the attainment of correct, reliable, all-or-none sorting performance), the instances must contain an invariant basis for correct sorting. (i) If they do not have an invariance then the problem is insoluble, indeterminate or ill-defined. (ii) Even given that there is invariance inherent in the instances, there is still the question of whether or not the induction can find it. For example, if the only means of finding the invariance is random trial-and-error, and time or capacity constraints make a combinatorial search unrealistic, convergence will again fail to occur. (iii) If, however, the invariance can be converged upon after a finite, reasonably small sample of instances has been sampled -- either because of prior simplifying constraints on the search or because of successful prior approximations and analogies -- then the categorization can converge.

Note, however, that even here convergence will be context-dependent, provisional and approximate because (a) future instances could again diverge (a widening of the context) requiring a new or revised solution (a tightening of the approximation) and because of (b) factors of underdetermination and parsimony (to be discussed below). It can be shown information-theoretically (Dretske 1983, Garner 1974, Harnad, in preparation a, Olson 1970, Sayre 1986) that the context of confusable alternatives determines the minimum quantity of information that will be required to discriminate reliably among the categories into which they are to be sorted (and complexity theory [Chaitin 1975] suggests methodological constraints for preferring a minimum).\**

[footnote start] Approximationism, context-dependence and the fact that contexts themselves are neither modular (i.e., independent and isolable from one another, pace Fodor 1985) nor incommensurable (pace Kuhn 1970) jointly entail meaning holism (Quine 1953), that is, that a change in the meaning of one symbol -- a change in the invariants determining that category, because the context of confusable alternatives has been widened -- can in principle change the meaning of all other symbols. [footnote end]
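The information-theoretic claim in 18.2.4 can be given a toy numerical form. The sketch below assumes one common reading, namely that the minimum quantity of information is the Shannon entropy of the category labels in the sampled context; whether this matches the formulations of the authors cited is not established here, and the function name `context_information_bits` is introduced only for illustration.

```python
import math
from collections import Counter

def context_information_bits(labels):
    """Shannon entropy, in bits, of the category labels in a sampled context."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# A dichotomy of equally frequent alternatives.
narrow = ["tree", "animal"]

# A widened context with four equally frequent alternatives.
wide = ["tree", "animal", "table", "stork"]

print(context_information_bits(narrow))
print(context_information_bits(wide))
```

Under this assumption, widening the context from two to four equally likely alternatives raises the information needed per categorization from one bit to two, which is the sense in which a wider context calls for a tighter approximation.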

18.2.5 Underdetermination of category invariance. Another important example of underdetermination comes from a problem in the philosophy of science: the underdetermination of theories by data. It is always true in a trivial sense and often true in a substantive and practical sense that data are compatible with more than one theory that explains them (in fact, with an infinite number of theories). In practice, there may be two or more rival theories attempting to account for the same data. These theories will differ in their generality and their parsimony (some will account for more data, some will do so with fewer parameters) as well as in their predictions. Where predictions differ, testability presumably prevails, normal science occurs, and theories can be chosen and revised on the basis of subsequent evidence. It is obvious that this too is an approximate process; existing and future data provide the context, as governed by the constraints and contingencies of the real world (as well as the limitations of our existing categories, our imaginations, and our ability or luck in extracting the features that work).\**

[footnote start] Minsky's (1961) credit assignment problem -- the problem of determining to which candidate features (from among an infinity of candidates) to assign credit in revising a pattern recognition model so as to handle new, anomalous patterns that cannot be successfully sorted using the current feature-set -- is just a special case of underdetermination, and hence cannot be expected to have a general, principled solution any more than the problem of finding the right scientific theory to account for empirical data (or the general problem of induction) can. The constraint that the context of a category must be bounded and complemented (i.e., that the complement of a category cannot be either empty or everything there is) may play a role in ensuring that category induction can converge despite underdetermination (see Harnad 1982b). Approximateness itself plays a similar role in facilitating convergence. [footnote end]

The theories can be viewed literally as descriptions of the invariants underlying the data -- as making explicit the sorting rules for correctly classifying the events, objects and properties of the world, present and future -- in other words, as attempted solutions to a categorization problem. Again, correct is problematic because the categorization is clearly provisional, not obviously optimal rather than satisficing, and, as ever, approximate. Underdetermination implies approximationism, and it is hence likely that what can be discovered about the cognitive processes underlying the formation of perceptual and verbal categories will apply to scientific theory-construction too.

18.2.6 Overdetermination, Occam and optimization. The problem of underdetermination also raises the question of overdetermination (imparsimony): It is certainly true that neither with respect to the truth of scientific theories nor with respect to the validity and veridicality of perceptual and verbal categories is there any a priori reason for Occam's razor to prevail. God never promised a parsimonious universe, and there is no logical reason that an N-parameter account should be truer than an N+1-parameter account of the same data. Similarly, there is no logical reason why, in our tree/animal categorization problem, the algorithm (the features and rules) should have been the simplest one, namely, the comparison of the limb count with unity. First of all, the most parsimonious rule could be inefficient: It could take too long to implement; or it could simply be the case that the mechanism one is attempting to model happens to use a more redundant rule, perhaps for reliability, robustness or speed (say, limb count plus a calculation of the ratio of the length of the perimeter to the area or the principal axis, yielding a second parameter).

Redundancy and efficiency are mainly optimality factors. In principle, minimal rules must be preferred if an approximate process is to converge rather than go on for ever (for example, by testing far-fetched, higher-order number-theoretic properties). But this is a methodological principle, not a logical one, and may be violated in practice. It is a logical matter, however, to note that the decidability of a question of parsimony can only rest on the availability of discriminating data. If there is no way ever to know whether a mechanism (or a universe) is better described by an N-parameter account or by an N+1-parameter account then the matter is probably better consigned to the domain of a priori questions that this discussion cannot answer.

For our purposes, as long as there is a potential widening of the empirical context that can decide between a more and a less parsimonious algorithm (say, by showing that one of them sorts more instances or does it better in some sense) then an empirical choice can be made. Otherwise, the theory that over-determines the data is to be provisionally rejected. Underdetermination of theories by data is a complicated enough fact of life without compounding it by overdetermining the data through an arbitrary, redundant proliferation of parameters.

18.2.7 Analog representations, analog/digital conversion and digital/digital transformations.

This entire chapter could easily be devoted to the analog/digital distinction (Goodman, 1968; Lewis 1971; Harnad, 1982b; Pylyshyn 1984; Haugeland 1985; and Chapter 5); however, only a few points immediately relevant to the question of approximationism will be made here. In important respects, the formation of perceptual categories is a process of analog-to-digital (A/D) conversion.\**

[footnote start] For now I will conform to the provisional definition that a physical transformation from an object X to another object Y is analog to the degree that the causal process P that generates the physical properties of Y from the physical properties of X (i.e., causally connects them) is physically invertible. In practice, this will mean that P is formally describable as a continuous, invertible function mapping isomorphic properties of X onto Y (although the properties themselves need not be continuous). By contrast, a transformation from X to Y is digital to the degree that it is not a physically invertible one between properties, but a formal one, depending on conventional rules for manipulating arbitrary symbol tokens in Y that can be interpreted as standing for properties of X. A/D conversion would be a two-stage process according to this definition, involving (1) discretization and dimensional reduction (which still preserve some physically invertible structure) and (2) symbolization (in which the vestiges of physical shape are transformed into formal code consisting of arbitrary but semantically interpretable token systems).

Note that this definition is nonstandard and controversial. Not only does it define analog and digital (a) as matters of degree and (b) as dependent on an invertible physical transformation or causal connection, but it even implies that (c) a dedicated digital computer -- one that is hard-wired to its transducer inputs and its effector outputs -- is for that reason analog to a degree (namely, the degree to which the hard-wired physical processes are invertible). The hypothesis that we are such hybrid dedicated systems, and that the hard-wiring to our afferent and efferent systems of any internal symbolic modules we may have serves to fix their symbolic interpretations -- setting their encryption/decryption relations physically -- is closely related to the grounding arguments that will be made later in this chapter and will be further elaborated in Chapter 5. [footnote end]

In such a process some information is always lost (A/D transformations are approximations).

For example, the readout of any digital watch, no matter how precise, will always quantize time approximately, be it to the closest second, millisecond, microsecond or what have you. The quantization is the digitization: a minimal atomic unit is selected that sets the grain or level of resolution of the system. Note that grain cuts both ways, however. After all, even an analog watch has limits on its resolving capacity, and its crystal pacemaker's oscillations (inasmuch as they are countable) are digital. None of this requires a foray into quantum physics or basic concerns about whether physical processes are ultimately continuous or discrete. Biology and cognition occur at levels where the granularity is (roughly speaking) known and not really at issue. There is no real continuity in the nervous system, only continuity simulated by the deliberate blurring of grain or by insufficient resolving power to make discrete discriminations in certain regions. Hence it must be immediately pointed out that even the continuous representations that we will be assigning to the analog or iconic component of our dual acquaintance system will only have simulated continuity (or pseudocontinuity) rather than real continuity. In this sense, a real object will always be worth a thousand times (or even an infinite amount) more than a retinal image or a mental one. That is, even what we are calling an analog representation will be an approximation relative to the object it represents. This kind of approximation, however, is rather different from the kind of approximation involved in verbal description, and it is in some ways much less interesting, for it is still physical shape-preserving rather than symbolic; nevertheless, it will be shown to mediate verbal representations in an important way.
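The digital-watch example can be sketched as a quantization function. This is a minimal illustration under stated assumptions: the function `quantize` and the chosen grain values are hypothetical, and rounding to the nearest multiple of the grain is only one possible quantization scheme.

```python
def quantize(t_seconds: float, grain: float) -> float:
    """Digitize a time reading to the nearest multiple of `grain` seconds."""
    ticks = round(t_seconds / grain)  # count of whole atomic units
    return ticks * grain

# Two distinct moments that fall within the same millisecond grain.
t1, t2 = 12.3451, 12.3454

same_readout = quantize(t1, 0.001) == quantize(t2, 0.001)
error = abs(t1 - quantize(t1, 0.001))

print(same_readout)  # True: the two moments get the same readout
print(error <= 0.0005 + 1e-12)  # True: error stays within half a grain
```

Because distinct inputs collapse onto the same readout, the original time cannot be recovered from the digitized one, which is the sense in which information is lost. A finer grain shrinks the error but never removes it.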

For now, let us note only that we are dealing with at least two orders of approximation: The first is the loss of whatever structural information is not faithfully preserved in an analog transformation (because of grain, resolving capacity and any other dimension or range of variation not captured in the image): this is the object-to-icon (O/I) transformation. Next there is the further approximation involved in selectively extracting invariant features (and discarding other variation that is not invariant across instances) in the service of category formation: this is the icon-to-atomic-category (I/C) conversion.\**

[footnote start] Note that this is all still analog, for it continues to preserve some of the physical shape of the input, albeit in a highly reduced and abstracted form. I/C conversion is really a transformation from macro-icons to micro-icons, the former preserving context-independent, holistic, configural properties of the proximal stimulus, the latter selectively reduced to only those properties that are invariant in a specific context of confusable alternatives. [footnote end]

Finally, there is the atomic-category-to-verbal-category and the verbal-to-verbal category conversion involved in symbolic description: These will be called the categorical-to-symbolic (C/S) and symbolic-to-symbolic (S/S) transformations. It is decidedly an understatement to call the last of these, the remarkable natural-language phenomenon of translatability (Steklis & Harnad 1976), a D/D conversion. But what the three prior stages of A/D conversion (O/I, I/C, C/S) and the fourth stage of D/D (S/S) do illustrate is the extensive degree to which category formation is indeed a process of approximation.

18.3 Categorical perception

18.3.1 Discrimination, identification and categorical perception. This extended discussion of approximationism has prepared the way for a description of the kinds of data and findings that have motivated the model to be proposed here. First, there is the psychophysical evidence concerning our relative discrimination performance: the kinds of stimuli we can tell apart (when they are presented simultaneously or in immediate succession) by making a same/different judgment, by stimulus-stimulus matching (picking which of several stimuli is most similar to the target stimulus) or by stimulus-response matching (copying, mimicking or some other analog response). Same/different judgments can provide evidence about the size of the jnds (just noticeable differences) between stimuli and stimulus-stimulus matching data can be used to derive multidimensional measures of similarity or proximity between stimuli. What must be emphasized in the case of the relative discrimination data is that they usually depend on either the simultaneous presence of the stimuli being compared or their successive presentation within a short enough time interval to allow brief iconic images to mediate the comparison. The discrimination is relative because it involves pairwise comparison, rather than being an absolute judgment about a stimulus in isolation.

Absolute discrimination, on the other hand, requires an autonomous judgment about the stimulus being presented, not a comparison with an accompanying stimulus. Absolute judgment calls for a unique discriminating response, which can either be a specially trained operant response or, more generally, a verbal label. In this chapter absolute discrimination will henceforth be referred to as identification and relative discrimination simply as discrimination. As pointed out elsewhere (Harnad 1982b), absolute and relative are in any case partially misleading descriptors, because absolute discriminations are always made relative to an implied context of alternatives (i.e., they are context-dependent in the technical sense described earlier), whereas relative discriminations (and their underlying representations) are comparatively context-independent, and in that sense more absolute.

For the purposes of categorization theory, the last three decades of research on discrimination and identification can be grounded in George Miller's synthesis in his celebrated 7±2 paper (Miller 1956; Broadbent 1975). Miller noted that whereas discriminability seems to vary with the sensory modality and dimension involved, identifiability seems to have modality-independent constraints that are governed by learning, encoding (i.e., representation) and memory. How many stimuli (and which ones) you can discriminate will depend on the sensory dimension in question, but how many you can identify (and which) depends on what the confusable alternatives are (i.e., the informational context) and how you encode them: Discrimination is modality-dependent and identification is representation-dependent.

Even before Miller's observations, however, there had been indications that there might be interactions between identification and discrimination, or at any rate that discrimination too could be affected by experience and learning (Lawrence 1950; Bruner et al. 1956). In particular, it was noted that mere exposure (E. J. Gibson 1969) could sharpen discriminability and that certain kinds of differential experience could enhance some perceived differences and diminish others. But if discriminability was modifiable then perhaps psychophysical assumptions about stable, isotropic jnd continua were too rigid. Stability and isotropy were challenged separately. Helson (1964) showed how discrimination could vary with adaptation level (see Chapter 2, section 2.5.5) and research on color perception (sections 2.5.5 and 2.6.3) and phoneme perception (sections 2.2 - 2.4) showed that uniform physical continua were not necessarily uniform or continuous perceptually.

18.3.2 Phoneme perception and the motor theory. Apart from questions about plasticity and anisotropy in sensory continua, the role of experience and learning -- and especially that of the motor system and language -- in our discriminative performance became an object of attention and speculation in perceptual theory. As described in the preceding chapter, the finding in phoneme perception had been that discriminability was greater across the boundaries between two different phoneme categories (e.g., /ba/ and /da/) than it was within the categories, even when physically equal stimulus differences were involved. Now since phonemes were assumed to be acquired, man-made categories, it was hypothesized that language somehow mediated these dramatic boundary effects (which are subjectively perceived as not merely quantitative but qualitative). In particular, according to the motor theory of speech perception (Liberman, Harris, Hoffman & Griffith 1957), the reason that /ba/ and /da/ were categorically (i.e., discontinuously) discriminated was that their perception was mediated by a motor template derived from the way the sounds had to be articulated in order to be spoken: The perceptual discontinuity was mediated by the motor discontinuity.
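
The boundary effect itself, as distinct from any account of its cause, can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration: the physical continuum runs from 0 to 1, the category boundary sits at 0.5, and the perceptual mapping is a logistic function whose steepness is arbitrary. Neither the motor theory nor the model proposed in this chapter is committed to this particular function.

```python
import math


def perceived(x, boundary=0.5, sharpness=12.0):
    """Hypothetical perceptual warp: a logistic centred on the category boundary."""
    return 1.0 / (1.0 + math.exp(-sharpness * (x - boundary)))


def discriminability(x1, x2):
    """Perceived difference between two stimuli on the physical continuum."""
    return abs(perceived(x1) - perceived(x2))


# Two stimulus pairs separated by the same physical step of 0.2.
within = discriminability(0.1, 0.3)   # both on the same side of the boundary
across = discriminability(0.4, 0.6)   # straddling the boundary

print(f"within-category difference:  {within:.3f}")
print(f"between-category difference: {across:.3f}")
```

Under these assumptions the pair that straddles the boundary is far more discriminable than the pair that does not, although the physical difference is the same in both cases. The within-category difference is reduced but does not vanish.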

The hypothesis of an analog motor template depended on a number of assumptions that did not remain unchallenged for long. One was the speech-is-special assumption, according to which perceptual discontinuities of this sort (henceforth referred to as Categorical Perception or CP) should be unique to speech. An early critique of the motor theory by Lane (1965), however, showed that CP-like effects could occur in other modalities. Another assumption of the motor theory had been that phoneme category boundaries are acquired in learning how to speak, but it was later found that prelinguistic infants exhibit CP effects in phoneme discrimination (see section 2.3.4). It had also been assumed that phoneme CP was unique to human beings, but it was subsequently observed in nonhuman species (with no apparent human-like vocalizations to mediate them -- see section 2.5).

Some attempts were made to patch up the motor theory and to hold onto the speech-is-special hypothesis by recasting it in evolutionary form, suggesting that there had been natural selection for enhanced discrimination of certain categories of audition because of its congruity with vocal categories (Liberman 1976, 1982). In addition, it was conceded that these categories may have been even more favored because they had been prepared by chance congruence with phylogenetically older auditory discontinuities (of unknown functional significance).

This uneasy synthesis is more or less the current status of the motor theory. The model to be proposed in this chapter will attempt to pick out the salient features of the motor theory and the speech-is-special theory that might be worth preserving; but first let us look briefly at the parallel developments in color CP.

18.3.3 Color perception and the Whorf Hypothesis.

Influenced by the Whorf Hypothesis (Whorf 1964) that language somehow determines our view of reality, Berlin & Kay (1969) conducted extensive cross-cultural investigations to show that color boundary effects are governed by vocabulary and that the categories of colors we can discriminate depend on the colors we name. The findings indicated that the phenomenon is ambiguous. There is some effect of color-names on color boundaries, but there also appear to be universal perceptual constraints at work, underlying not only the boundary effects but even the degrees of freedom that different languages have in their color naming. The current view (section 2.5.1) is that color categories are largely governed by innate, species-specific color-detecting mechanisms, but that there is also some color-boundary plasticity that can be modulated by experience and naming, especially in the young.

18.4 Unanswered questions about categorical perception

On the basis of the color and phoneme CP data as well as the various critiques and elaborations of the motor theory and the Whorf Hypothesis that have appeared to date, a number of prominent unanswered questions about the generality and particulars of the CP phenomenon suggest themselves. Many of these questions have already been raised in Chapter 2. By way of motivating the representational model to be proposed in this chapter, I will now summarize the most prominent of these unanswered questions, together with a number of provisional replies and hypotheses that point toward a coherent framework for unified CP research in the more general context of category formation:

P (i) What is the relation of the Whorf Hypothesis to CP?

The Whorf Hypothesis (2.5.1, 18.3.3) originally concerned a putative influence of syntactic categories on perception -- it was claimed that the Hopi, lacking a future tense in their language, had no concept of the future (Whorf 1964) -- but the idea has always been vague and difficult to test (cf. Bloom 1981; Liu 1985). A CP version of the Whorf Hypothesis could be formulated as follows: The perceptual and conceptual discriminations we make are governed by the categories we name and by our representations of the invariant features underlying the categorization. To the extent that discriminability could be shown experimentally to be influenced by learned names and descriptions, this version of the Whorf Hypothesis would be empirically supported by CP.

P (ii) Is CP uniquely related to or dependent on language, or are there instances of nonlinguistic CP?

If CP is defined as enhanced discriminability between categories and enhanced similarity within categories (relative to some objective, category-independent metric) then it is not unique to language or uniquely dependent on language (2.5). However, language appears to be unique in its potential to mediate CP because it provides the labels and the descriptions that subtend most of our categories.

P (iii) Is speech special, and if so, is its special status connected with CP?

Speech is special in the following respects: It is the chief human medium of communication: linguistic communication. Speech-sound categories (phonemes) are special in that they have motor analogs -- the auditory stimuli can be not only perceived but also produced by the perceiver. This perception/production congruence is not unique to speech, however (gesture, for example, shares the same motor analog property [Steklis & Harnad 1976], as do facial expression, singing, dancing, etc.). Speech-sound categories exhibit CP (phoneme boundaries), but because of the special perception/production congruence in speech, it is not yet clear whether this is a representative or an anomalous form of CP.

P (iv) Is CP uniquely related to or dependent on communication, or are there noncommunicational instances of CP?

Phoneme CP is communication-related. So are most of the existing examples of CP-like effects in animals, which tend to involve species-specific signaling systems (2.5.2). Color CP, however, is not obviously communicational, although it may be influenced by language. It is not clear yet whether just any arbitrary operant response can label categories and generate boundaries or whether language and other communicational systems and contexts are especially involved.

P (v) Is CP uniquely related to or dependent on motor activity, or are there purely sensory instances of CP?

There are three ways motor activity could be crucial to CP: (a) in analog form (as dictated by the motor theory of speech perception), (b) in furnishing the arbitrary names and symbolic descriptions provided by language, or merely (c) as an operant source of arbitrary or functional discriminating responses. Whether the motor system is crucially involved in CP in any of these ways is not yet known. It is logically possible that CP could be mediated purely by sensory matching (with one stimulus serving as the instance and the other as the label) but it would be difficult to demonstrate that covert language had not played a role in human experiments; possibly animal experiments on stimulus/stimulus CP could clarify this, although there too an overt operant response is likely to be necessary (cf. Premack 1976). CP effects arising purely as a result of sensory preconditioning seem unlikely, but they too remain a logical possibility.

P (vi) What is the current status of the motor theory?

The motor theory (2.3.1; 18.3.2) is currently moot. Its ontogenetic version seems to be contradicted by developmental data (2.3.4) and its phylogenetic version seems weakened by comparative data (2.5.4). However, it is still not eliminated as a possible special factor in the case of speech-sound CP.

P (vii) What is the role, if any, of analog representations, sensory or motor, in CP?

Moot. The role of the motor system has not been sorted out, nor has the role of analogs (2.2.1). A further complication comes from the fact that analog matching is one of the measures of relative discrimination itself. CP can probably occur without motor analogs, but motor analogs probably strengthen it, and hence may indeed have been capitalized upon in some instances by evolution as the later motor theories have claimed (2.5.4).

P (viii) Is CP unique to audition, or does it occur in other modalities as well?

CP clearly occurs in vision too (color CP; 2.5.7); audition may be a special case because of its production-analog character and its (consequent) preferred evolutionary status as a signaling system (2.5.4; Steklis & Harnad 1976).

P (ix) Is auditory CP unique to speech?

No, there are now numerous instances of nonspeech auditory CP (2.3.3).

P (x) Is CP unique to human beings, or do other species exhibit it as well?

Nonhuman species exhibit CP-like effects too (2.5.2 - 2.5.5). The role of CP in comparative cognition and comparative communication is an important topic for future research.

P (xi) Is CP a learned or an innate phenomenon or both?

This is one of the most critical unanswered questions about CP. Experience can demonstrably modulate CP boundaries (2.3.2, 2.3.3, 2.5.1 and 2.5.3) but it is not yet known whether it can create them de novo or alter them radically or permanently.

P (xii) Are there short-term and long-term CP effects?

There are certainly short-term, habituation-like effects and task-dependent stimulus-range effects (2.2.3, 2.3.2, 2.3.3, 2.5.5). Long-term effects have not yet been carefully tested. Learned CP boundaries would be the most convincing demonstration of a long-term effect, but long-term or permanent movement of a pre-existing boundary would also be an important finding.

P (xiii) Does CP imply total within-category indiscriminability or merely enhanced within-category similarity and enhanced between-category distinctiveness?

The overstatement of within-category reductions in discriminability (with claims that within-category differences are indiscriminable) has resulted in much misunderstanding of CP (2.4.4; 2.5.3). There never has been total within-category indiscriminability, nor would that make psychophysical sense: It would require a category one jnd wide! There has also not yet been a clear demonstration of the time-course of the acquisition of CP in which discriminability within and between categories is compared before and after CP training (2.8.2). For innate CP it would be hard to establish a baseline to assess what was enhanced and what was diminished.

P (xiv) Are CP boundaries fixed or plastic?

Short-term plasticities have been demonstrated (2.2.2). Long-term plasticity remains to be investigated, both early in development and at maturity.

P (xv) Is CP purely a continuity/discontinuity phenomenon?

Psychophysical studies of CP are of course based on perceptual discontinuities. So are models that posit thresholds (2.2.2) or all-or-none feature detectors (2.4.1). However, as long as a similarity metric can be inferred from the data (e.g., by psychophysical scaling techniques, Tversky 1977, or even with event-related potentials -- see section 2.6), the requisite boundary effects on discrimination can in principle be demonstrated without any real physical continuum or continuity being involved.

P (xvi) Is CP an atemporal effect or does it have a temporal counterpart?

First of all, auditory CP usually involves a temporal (horizontal) component as well as a synchronous (vertical) one (see, for example, Pastore 1987 and Ehret 1987). In addition, the kind of rechunking experiment Miller (1956) described (e.g., the recoding of 0/1 binary digit sequences into bigger chunks for better recall by using their overlearned decimal names) involves the temporal domain. Moreover, direct experiments on temporal discrimination remain to be tried. In principle, anything that can be coded into a Millerian chunk can be a category and can hence give rise to CP effects.

P (xvii) Is CP just a concrete perceptual effect or does it occur with abstract categories as well?

Abstract CP remains to be investigated, but it is certainly possible in principle (2.7.2). Moreover, in the model to be described in this chapter it will be hypothesized that the feature extraction required to form categories in the first place necessarily involves abstraction.

P (xviii) What is the relation of CP research to natural category research?

CP research is unified by a discriminability paradigm which permits within- and between-category similarities to be tested; CP research has also until now been concerned largely with concrete perceptual categories (see Pastore 1987). Natural category research (although it began with color categories; Berlin & Kay 1969; Rosch & Lloyd 1978) has largely become preoccupied with (a) how long it takes subjects to judge an instance to be a member of a category, (b) how typical a member they judge it to be and (c) what features or rules they report using in order to perform the categorization (2.7.2 - 2.7.2). Typically the categories are already well-learned and the membership judgments are reliable; the relation of typicality judgments and categorization latency to within- and between-category discriminability is not known, although it should certainly be investigated. Models arising from the two different approaches differ. It is time to unify these two areas of categorization research, along with the older concept formation research (Bruner, Goodnow & Austin 1956).

P (xix) What is the relation of CP research to work on feature detectors in neuroscience?

As yet, minimal. Some CP modelers are thinking in terms of feature detectors (2.4.2, 2.4.3), especially in the case of categories that look innate (2.3.4, 2.5.1, 2.5.2). However, the difficult cognitive problem of category acquisition does not yet have a neural basis to draw on, although some of this may eventually emerge from human ERP (event-related potential) studies (2.6.9) and from more basic work on sensory psychophysics (2.2.2, 2.2.3) and higher sensory and cognitive functions.

P (xx) What is the relation of CP research to Gibsonian work on direct perception?

The Gibsons were among the first to be interested in perceptual learning (E. Gibson 1969). And, of course, the ecological optics approach (J. Gibson 1979) does emphasize the detection of invariants (see also Neisser 1987). However, the notion of direct, unmediated pick-up has not so far proven useful in modeling invariance extraction in category formation, particularly in the important cases in which learning is involved. CP seems more amenable to a constructive (Rock 1983), computational (Ullman 1980), and connectionist approach (McClelland et al. 1986).

P (xxi) What is the relation of CP research to research in pattern recognition and artificial intelligence?

Until recently, statistical pattern recognition research has not been successful in developing models with realistically general categorization capabilities (Minsky & Papert 1969). The subject is far from closed, however, with the new work on formal learnability (Osherson, Stob & Weinstein 1986) and on connectionism (Ballard 1986; McClelland et al. 1986; Rumelhart et al. 1986) re-opening the topic of general induction: How do systems learn? Artificial intelligence (AI) has tended to focus on specialized problems requiring considerable built-in symbolic knowledge (Schank 1986) and, as in the primate language studies (Premack 1976), the symbol manipulation has been considerably overinterpreted (see Harnad 1989). The higher cognitive problems in categorization research call for an inductive approach which AI will only be able to provide if it attempts to construct more general, all-purpose category learning models, perhaps even hybrid symbolic/nonsymbolic (e.g., connectionistic) models. The CP model to be proposed in this chapter, for example, is potentially computer-testable, and its further development could be guided by both perceptual-learning and simulation data.

P (xxii) What is the relation of CP research to contemporary philosophy of cognitive science?

The CP view is fundamentally at odds with the pan-propositional (or symbol-crunching) view, according to which most of the cognitive work is done by mental sentences (much as in current AI; Pylyshyn 1980, 1984, Fodor 1975, 1980). This sentential view seems to be ungrounded: the meanings of the atomic symbols of its sentences cannot simply be derived from still more sentences without infinite regress. According to the model proposed here, the meanings of elementary symbols must be grounded in perceptual categories. That is, symbol tokens, which are manipulated only in virtue of their form (i.e., syntactically) rather than their meaning, must be reducible to nonsymbolic, shape-preserving representations. Semantics can only arise when the interpretations of elementary symbols are fixed by these nonsymbolic, iconic representations. The CP view is bottom-up, but psychophysical rather than neural; and it emphasizes the crucial grounding function of nonsymbolic (iconic and categorical) representations.\**

[footnote start] Without homuncularity, i.e., without the need for something/someone else to look at or interpret the icons. Homuncularity has been used as one of the arguments against copy theories of perception. [footnote end]
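
The rechunking mentioned under question (xvi) above, in which sequences of binary digits are recoded into larger chunks that already have overlearned number names, can be sketched as follows. The function name rechunk, the chunk size of three and the sample sequence are assumptions chosen for illustration only.

```python
def rechunk(bits, chunk_size=3):
    """Recode a string of 0/1 digits into one number per chunk of chunk_size bits."""
    if len(bits) % chunk_size != 0:
        raise ValueError("sequence length must be a multiple of chunk_size")
    return [int(bits[i:i + chunk_size], 2) for i in range(0, len(bits), chunk_size)]


# An arbitrary 18-digit binary sequence.
bits = "101000100111001110"
chunks = rechunk(bits)

print(len(bits), "binary digits ->", len(chunks), "chunks:", chunks)
```

The eighteen items to be remembered become six, each of which already has a name. In the terms used in this chapter, each chunk behaves as a named category of bit patterns.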

18.5 A three-level representational system: An iconic and categorical acquaintance system and a symbolic description system

18.5.1 Iconic and categorical representations. The first version of this model was proposed in order to account for the kinds of differences that appear to underlie hemispheric lateralization (Harnad et al., 1977; Harnad 1982b). However, the model is not specifically committed to left/right differences (which may turn out to have been exaggerated in the flurry of research stimulated by the dramatic split-brain findings). The basic hypothesis is that whenever a categorizer encounters a sensory input, not one, but two kinds of representation of the input begin to be established (if they do not exist already) or become activated (if they already exist):

The first kind of representation is iconic, being an analog of the sensory input (more specifically, of the proximal projection of the distal stimulus on the device's transducer surfaces). This iconic representation (IR) is unbounded, in that it is not governed by an all-or-none category boundary. (The sense of this will become clearer soon.)

It is perhaps misleading to describe the IR as a representation, since, by its nature, it will in fact be many (mostly very similar) analog representations; the differences among these will arise from the instance-to-instance variation of the input class in question. Here is a simplified example: Suppose the input to the categorizer was a species of mushroom. The instances would vary in all the ways such a mushroom could vary; not only in size, form and color, but also in position, surroundings and time of day. It is even misleading to speak of the instances that activate an IR as being any single class of inputs at all (except at the meta-level), for to the iconic system they would blend continuously into one another, with nothing setting them apart except perhaps whatever natural boundaries there may be amongst the variations from instance to instance: The mushroom may never grow upside-down or appear suspended in mid-air, or there may exist no intermediate forms blending continuously from it into a toadstool.

So analog representations are unbounded in the sense that nothing reliably links them to a shared category except whatever natural similarities and differences there may be among them.\**

[footnote start] The fortuitous gaps, niches and correlations in the variation -- otherwise potentially continuous and omnidirectional -- among the instances on which they are based provide an important constraint on iconic representations (as well as on the categorical representations to be introduced later). However, although the fact that the world is thus conveniently partitioned into many disjoint natural kinds is a significant simplifying factor for some categorization problems, it by no means represents a general solution to the problem of category acquisition. For example, the problem of perceiving object constancy under spatial transformations still requires the selective detection of invariants. Any domain in which instances vary continuously is a potential problem. So is any domain in which the variation, though discrete, is so complex, subtle or confusable -- i.e., underdetermined -- as to necessitate selective search and filtering. And finally there is the domain of abstract objects whose instances vary along conceptual rather than physical dimensions. (E.g., along what dimensions of variation are prime numbers or well-formed sentences or existentialist writing similar or different?) [footnote end]

These similarities and differences would unite somewhat the IRs that were activated by inputs that had some overall configural similarity (in vision this would perhaps be overall topographic form) that they did not happen to share with any other input.\**

[footnote start] This kind of holistic resemblance, helped out by some convenient natural gaps in variation, is the only sort of property that seems amenable to being picked up in a direct, passive, Gibsonian fashion (Gibson 1979) by a perceptual system that was neither pretuned innately toward specific trigger features nor actively processing information (cf. Neisser 1987). External invariants must certainly underlie all successful categorizations, but how much internal processing is required to detect, select and use them is an empirical matter depending on the degree of underdetermination of the particular category and context in question. Invariants may come at a higher processing price than a Gibsonian mechanism can afford. [footnote end]

But apart from such ecological boundaries, iconic representations blend continuously into one another, sharing the same analog representational substrate to the degree that they share overall physical similarities of configuration or shape. Note that, apart from possibly subserving some none-too-reliable natural categorization, these IRs would not be very useful for categorization and identification (i.e., for absolute discrimination).\**

[footnote start] For similar reasons, nondirected or ad lib similarity ("sort these any way you like") of the kind studied and modeled by Tversky (1977) seems unlikely to explain how we categorize. Categorization is an imposed rather than an ad lib task. Hence the relevant dimensions of similarity must be found and selected by active processing guided by feedback from the consequences of miscategorization ("sort these until you get it right"). In nontrivial (i.e., confusable, underdetermined) categorization problems the solution is not obvious in the precategorical (ad lib) similarity structure. [footnote end]

On the other hand, IRs would be ideal for relative discrimination, in that they faithfully preserve the iconic character of the input for such purposes as same-different judgments, stimulus-matching and copying. These are all fundamentally graded, noncategorical tasks, in which categorization would probably introduce biases that would distort the analog, holistic character of the raw, unfiltered inputs.

At the same time that IRs are being strengthened by repeated exposure to a class of inputs, it is hypothesized that another kind of representation is forming: The bounded categorical representations (CRs) have (a) all-or-none category boundaries; CRs are (b) highly context-sensitive (with context here being used in the specialized sense introduced earlier: the sampled set of instances of relevant and confusable alternative categories); and CRs are (c) feedback- or consequence-dependent.

In the case of the mushroom discrimination problem, the context of alternatives would be all the kinds of mushrooms with which the one in question could be confused. Now if toadstools were the only existing alternative, and they were all separated fortuitously by a reliable natural shape-gap of some kind, then perhaps their CR would be redundant (except for collecting this naturally disjoint class of instances under a collective label). But if there were any possibility of confusion -- as there would be in many natural, nontrivial (i.e., underdetermined) categorization problems -- then the CR would have to take a form that was radically different from the IR.

Note that IRs are the result of an analog transformation, preserving -- except perhaps for some acquired and innate smoothing and some of the unavoidable information loss mentioned earlier in connection with analog transformations -- the spatiotemporal structure (i.e., the physical shape) of the input or proximal stimulus. The CR cannot afford to do this, at least not indiscriminately; in fact, the CR must eliminate most of the raw configural structure, retaining only what is invariant in all of the uninformative and irrelevant instance-to-instance variation of the mushroom in question and invariantly absent from instances of other categories of mushroom within the same context of alternatives (e.g., toadstools). In other words, the CR must act in part as a kind of A/D (analog/digital) filter that reliably sorts the mushroom instances into their appropriate, bounded categories using distinctive, confusion-resolving features.\**

[footnote start] It seems to be a point of logic rather than one of theoretical preference that if a categorizer is able to perform error-free categorization then that performance must be based on detecting and using some set of features that is sufficient to serve as a basis for the successful categorization (though not necessarily necessary or exhaustive, for, especially with underdetermination, there might be other features that would suffice too). The putative alternatives to the classical necessary/sufficient-features approach to categorization -- originating with Rosch (Rosch & Lloyd 1978) and attributed to Wittgenstein (1953) -- seem to be based on confusions among the following additional (and independent) factors: (i) Some categorization is not all-or-none; there may be no X's, just things that are X to greater or lesser degrees (e.g., the category big). (ii) Some categorization performance may not be reliable; subjects may sometimes miscategorize, or there may be some instances whose membership is uncertain or graded or probabilistic (e.g., the category guilty). (iii) The subject may not be aware of the features he is using; the ones he verbalizes may indeed be neither necessary nor sufficient, but then they're not the ones he's using. (iv) There is an element of arbitrariness in what one does and does not choose to call a feature (as opposed to a metafeature); there is no logical or practical reason why features cannot be disjunctive, negative, conditional, relational, polyadic or probabilistic -- or even derivable only by complex computational, constructive, algorithmic, propositional or model-driven processes -- as long as they are grounded in reliable, detectable invariant properties of the instances being categorized and they are sufficient to subserve successful categorization.

Hence, at least insofar as our reliable, overlearned, all-or-none, bounded categories are concerned -- and these are the categories (e.g., bird and pet) that tend to be used in the experiments stimulated by Rosch's work -- both the existence and the use of (singly) sufficient (and disjunctively necessary) sets of features seem inescapable. The origin of the putative alternatives to this -- non-necessary/sufficient prototypes and family resemblances -- seems to be attributable to a focus on typicality judgments and reaction times rather than categorization per se, together with a reliance on the subject's (and perhaps the experimenter's) introspections as to the basis for the categorization. The real basis for categorization can only be found by inference, as tested by models that attempt to generate reliable categorization performance when confronted with the same instances that subjects can categorize successfully. [footnote end]

Using the terminology just introduced here, however, this filtering function would be more perspicuously described as I/C (iconic-to-categorical) rather than A/D, for even invariant features are (minimally, selectively and partially) shape-preserving, and hence not yet arbitrary formal symbols: not yet fully digitized.
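The I/C filtering idea can be illustrated with a minimal sketch. It assumes, purely for illustration, that each instance is encoded as a set of already-detected feature names and that the invariant is a simple conjunction of single features; the function names and the mushroom features below are hypothetical and are not part of the model, which also allows disjunctive, relational and computed invariants.

```python
from typing import Callable, List, Optional, Set

# Hypothetical encoding: each instance is the set of features detected in it.
Instance = Set[str]


def induce_invariants(positives: List[Instance],
                      negatives: List[Instance]) -> Set[str]:
    """Features present in every positive instance and in no negative one."""
    if not positives:
        return set()
    shared = set.intersection(*positives)
    seen_in_negatives = set().union(*negatives)
    return shared - seen_in_negatives


def make_filter(invariants: Set[str],
                label: str) -> Callable[[Instance], Optional[str]]:
    """Return a labeled, all-or-none filter built from the induced invariants."""
    def categorize(instance: Instance) -> Optional[str]:
        # Everything in the instance other than the invariants is ignored.
        if invariants and invariants <= instance:
            return label
        return None
    return categorize


# Invented feature names, for illustration only.
mushrooms = [{"cap", "ring", "white_gills", "brown"},
             {"cap", "ring", "white_gills", "tan"}]
toadstools = [{"cap", "ring", "red", "spots"},
              {"cap", "brown", "spots"}]

invariants = induce_invariants(mushrooms, toadstools)
is_mushroom = make_filter(invariants, "mushroom")
```

In this toy context only one feature survives the filtering; a different sample of confusable alternatives would in general yield a different invariant set, which is the sense in which the filter is context-relative.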

18.5.2 Active versus passive filtering. It is not clear whether the categorical representation should be equated with the I/C filter itself, or with a filtered instance or even a filtered IR. The model is currently not specific enough to be committed to any of these three interpretations. It is clear that some interaction among input instances, stored IRs and stored filters will be involved in categorization, but the question of whether or not to posit more content to the CRs than the filter itself seems somewhat premature at this point and in any case would not substantively alter this account.

More important is the question of the nature of the I/C filtering function of the CRs. It seems unlikely that these will be passive filters, simply selectively detecting some stimulus feature (such as straightness) and then sorting according to its presence or absence (e.g., in the context of rectilinear versus curvilinear planar forms). Passive feature-detection no doubt occurs, but active, constructive filtering may be even more important. Some computation would be called for in arriving at conjunctive and disjunctive invariants (e.g., small and rectilineal, small or rectilineal) by induction, although once found to be reliable, they could of course be detected in parallel. A task that was more like prime-number detection (which I of course do not suggest we do perceptually or on-line) would require active computational processes, however. Perhaps the identification of the closed planar rectilineal forms (particularly those with more than seven sides) represents a categorization problem of the latter kind. The analysis-by-synthesis variant of the motor theory (Stevens & Halle 1967) also suggests that there is active filtering in phoneme perception; temporal processes seem by their nature to require active, real-time filtering and integration. But even classical instances of unconscious inference underlying perceptual constancies seem to involve constructive rather than passive filtering (Rock 1983). Finally, abstract categorization problems (e.g., deciding whether or not a given function is differentiable or whether or not a given letter string is a word) seem especially to call for active processing in order to be solved correctly. Note also that the active process could itself be (1) analog or (2) A/D (as in mental rotation, Shepard & Cooper 1982, or in categorical perception) or it could even be (3) symbolic (as in mental counting or inference, Fodor 1975) or (4) hybrid (i.e., made up of all three).\**

[footnote start] Symbolic and hybrid processes -- and theory-driven categorization in general (if the theory is explicitly represented) -- require a third kind of representation to be introduced in section 18.5.3. [footnote end]

The issue of whether CRs involve passive or active filtering (or even whether the filtering is conscious or unconscious, Holender 1986) does not affect the general hypothesis that bounded representations are formed by the detection and encoding of the features that are sufficient to sort negative and positive instances correctly. Not even questions of optimality (necessity, sufficiency, exhaustiveness, parsimony, efficiency, speed, robustness, reliability, etc.) are critical to the basic constraint that CRs must be very different from IRs. Whereas IRs preserve analog structure relatively indiscriminately, CRs selectively reduce input structure to those invariant features that are sufficient to subserve successful categorization (in a given context). It remains to add that the categorization must be expressed or acted upon in some form, and that although any differential response would do, the vast repertoire of arbitrary\**

[footnote start] The question of analog versus arbitrary responses, though pertinent to question (iii) raised in section 3.4, will not be discussed here. See Harnad 1982b. [footnote end]

labels that (later) constitutes the lexicon of a language is the natural source for the differential responses associated with every categorization. Hence CRs are not only filtered (or filters) but they are labeled: Associated with the CR of the positive instances of each category (in a given context) is the category name.

At this point we can again turn to the CP phenomenon to note that, just as IRs can account for discrimination performance, CRs can account for identification. And, most important, the CP boundary effect, that is, the enhancement of between-category differences and within-category similarities, can be seen as a natural interaction between the two kinds of representation, with the filtered invariants of the CRs biasing the analog structure of the IRs (see Figure 1). Another important point is that, up to now, both kinds of representation and their effects arise from direct acquaintance with the instances, hence both are perceptual representations (one more concrete than the other). The labels of the CRs, however, have been hypothesized to correspond to the linguistic lexicon, and our vocabulary certainly does not arise purely by perceptual acquaintance. Moreover, there is so far something uncomfortably extensional and referential about this system of labels, that is, they seem to stand for or refer to a set of instances, and we know that there are problems with referential theories of linguistic meaning (Frege 1952; Putnam 1975).
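One way to picture the hypothesized interaction between the two kinds of representation is the following toy sketch. The model itself does not specify any functional form; the one-dimensional stimulus continuum, the single boundary and the two gain parameters are all assumptions introduced here only to show how a categorical bias applied to an analog difference would compress within-category differences and expand between-category ones.

```python
def category(x: float, boundary: float = 0.5) -> str:
    """Stand-in for the CR: an all-or-none label on a 1-D continuum."""
    return "A" if x < boundary else "B"


def perceived_difference(x: float, y: float,
                         boundary: float = 0.5,
                         within_gain: float = 0.5,
                         between_gain: float = 1.5) -> float:
    """Analog (IR-like) difference, biased by category membership.

    within_gain < 1 compresses differences inside a category;
    between_gain > 1 expands differences that cross the boundary.
    Both values are arbitrary illustrative assumptions.
    """
    analog = abs(x - y)
    same = category(x, boundary) == category(y, boundary)
    return analog * (within_gain if same else between_gain)


# Two pairs with the same physical separation (0.2).
within = perceived_difference(0.2, 0.4)   # both fall in category "A"
between = perceived_difference(0.4, 0.6)  # the pair straddles the boundary
```

With these settings the boundary-crossing pair comes out as more different than the within-category pair even though the two pairs are physically equidistant, which is the qualitative pattern the CP boundary effect describes.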

18.5.3 Symbolic representations. This is the juncture at which the description system must be introduced; but first a simplifying assumption will be made (one that may give rise to some objections, particularly from speech act theorists, e.g., Searle 1969): For the purposes of this theory of categorization, no generality is lost if the only kind of linguistic act taken into consideration is the declarative sentence. Even more specifically, it will be assumed that all declarative sentences can be reformulated and meaningfully treated as propositions about category membership. That is, what is predicated about the subject of a proposition is that it is a member of some category or other. For example, the foregoing sentence can be (tediously) reformulated as making the proposition that all (sentences that make) predications are (members of the category of) sentences that assign category membership.\**

[footnote start] Among the objections to this approach there will no doubt be the complaint that it falls prey to the well-known weaknesses and limitations of first-orderism -- the defunct idea that cognition is just first-order predicate logic. I hasten to point out to potential critics adopting this line of attack that I am not speaking of an uninterpreted formal symbol system but a hybrid dedicated one, with symbol meanings grounded in perceptuomotor categories. [footnote end]

With this simplifying assumption one can state the hypothesis that the description system assigns category membership by dictum (so to speak). Instead of constructing an invariance filter on the basis of direct experience with instances, it operates on existing labels, and constructs categories by manipulating these labels, in particular, assigning membership on the basis of stipulated rules rather than perceptual invariants derived from direct experience.

Consider the same categorization problem solved two different ways.\**

[footnote start] In both cases tabula rasa assumptions cannot and will not be made; in other words, some prior categories will already be assumed to exist, whether innately or by prior recursion based on the same learning principles being described here. [footnote end]

Let the problem be the very first one we considered, that of telling apart trees and animals (in a smoothed context without anomalies and with limbs already categorized). The solution by the acquaintance system (de re) requires sampling instances of trees and non-trees and (presumably) converging on the L<=1 invariant by induction. The result is a CR which consists of an L<=1-filtered, labeled representation plus an appropriately biased population of IRs. The solution by description (de dicto) is to use the existing label repertoire (which includes limbs, numbers, and whatever the tree/animal instances are collectively called in their superordinate context) and to state the rule: It's a tree if it has one limb or fewer, an animal otherwise.\**

[footnote start] A more natural example would be first learning by acquaintance to label horse and stripe and then learning by description rather than acquaintance that zebra = horse & stripes. [footnote end]
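The contrast between the two routes to the same category can be sketched as follows. This is only an illustration under strong simplifying assumptions: instances are reduced to a single already-categorized feature (limb count), the inductive procedure is a naive threshold search rather than anything the model proposes, and the function names and sample data are hypothetical.

```python
from typing import Callable, List, Tuple

Sorter = Callable[[int], str]


def learn_by_acquaintance(samples: List[Tuple[int, str]]) -> Sorter:
    """De re: induce a limb-count boundary from labeled instances."""
    max_tree = max(limbs for limbs, label in samples if label == "tree")
    min_animal = min(limbs for limbs, label in samples if label == "animal")
    if max_tree >= min_animal:
        raise ValueError("sample is not separable by limb count alone")
    # The induced invariant is provisional: it depends on the sample seen so far.
    return lambda limbs: "tree" if limbs <= max_tree else "animal"


def learn_by_description() -> Sorter:
    """De dicto: stipulate the rule using labels that already exist."""
    return lambda limbs: "tree" if limbs <= 1 else "animal"


# Invented sample of (limb count, label) pairs.
sample = [(1, "tree"), (0, "tree"), (4, "animal"), (2, "animal"), (6, "animal")]

by_acquaintance = learn_by_acquaintance(sample)
by_description = learn_by_description()
```

On this sample the two sorters agree on every limb count, but only the first required exposure to instances; the second presupposes that limbs and numbers are already labeled categories.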

The principle is simple: Descriptions spare us the need for laborious learning by direct acquaintance; however, they depend on the prior existence of a repertoire of labeled categories on which the descriptions can draw. Hence symbolic representations (SRs), which are encoded as mental sentences, define new categories, but they must be grounded in old ones; the descriptive system as a whole must accordingly be grounded in the acquaintance system.\**

[footnote start] The advantage of learning categories by description is that it allows instances to be sorted correctly without any prior acquaintance merely by combining the symbols for prior categories in a proposition; but, to avoid infinite regress, categories learned by description must be grounded either in (i) already grounded categories or in (ii) categories learned by acquaintance or known innately (i.e., learned by evolution).

Although there is no space to elaborate the point here, I conjecture that the problem of ungroundedness is responsible for (i) McCarthy & Hayes's (1969) frame problem (i.e., the enormous difficulty that pure symbol-manipulating programs have in determining what has altered and what has remained constant after any given change has taken place) as well as for (ii) the common criticism that AI's symbolic meanings are not intrinsic but derived (e.g., Searle 1980). There is also an interesting connection with (iii) Minsky's (1969) credit assignment problem and with (iv) underdetermination and approximationism in general. [footnote end]
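The grounding requirement on descriptions can be made concrete with a small sketch. It assumes, hypothetically, that each described category is stored as the list of symbols its description combines, and that a separate set records the categories already available by acquaintance (or innately); the data structure, the function name and the example vocabulary are inventions for illustration, not part of the model.

```python
from typing import Dict, FrozenSet, List, Set


def is_grounded(symbol: str,
                definitions: Dict[str, List[str]],
                acquainted: Set[str],
                _visiting: FrozenSet[str] = frozenset()) -> bool:
    """True if every path from the symbol's description ends in an acquainted category."""
    if symbol in acquainted:
        return True
    if symbol in _visiting or symbol not in definitions:
        # Circular definition, or a symbol with no description at all.
        return False
    parts = definitions[symbol]
    return bool(parts) and all(
        is_grounded(part, definitions, acquainted, _visiting | {symbol})
        for part in parts
    )


# Invented vocabulary: "horse" and "stripes" are taken as learned by acquaintance.
acquainted = {"horse", "stripes"}
definitions = {
    "zebra": ["horse", "stripes"],  # grounded description
    "unicorn": ["horse", "horn"],   # depends on an ungrounded symbol
    "horn": ["antler"],             # "horn" and "antler" define each other
    "antler": ["horn"],
}
```

Here "zebra" is grounded because its description bottoms out in acquainted categories, whereas "unicorn" is not: one of its constituents is caught in a circle of definitions with no exit to the acquaintance system.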

Moreover, apart from the obvious parasitism of description on acquaintance, there is nothing to prevent redundant representations from being formed. There could, for example, be both a CR and an SR for the tree/animal categorization above. And whereas, apart from its grounding function, the acquaintance system has a certain primacy with respect to concrete categories, the descriptive system will obviously loom increasingly large with abstract ones.

The idea of abstraction needs some further elaboration in the context of this model. Note that an element of abstraction was already involved in the I/C process. The CR really is an abstraction from the raw instances: It consists of the invariant features sufficient to subserve the categorization, with everything else discarded. Hence it is not true that the CR is merely an extensional representation (as the IR is, perhaps). It does not encode the totality of the instances but only their invariant properties. Hence it is an intensional rather than an extensional representation. It encodes invariants, properties, relations, rules.\**

[footnote start] CRs only encode rules implicitly, whereas SRs encode them explicitly (cf. 2.7.3 and 2.7.4; see also Stabler 1983). [footnote end]

Now recall the recurrent caveat I have been mentioning about invariants in a given context. The fact is that the same instance (or object, if you will) can appear in many different contexts (indeed, an infinity of them), depending on the relevant alternatives, confusabilities and contrasts involved. Consequently, an instance can have (and activate) many different CRs, each associated with a different context. Each context will in general involve superordinate categorizations, with their own attendant invariants. (For example, the context of the tree/animal discrimination could have been living things, concrete objects, things I have in my back yard, drawings, etc.) Whether or not this locally hierarchical representational system is strictly hierarchical throughout is an open question (Keil 1979), but I suspect not. In any case, the fact that different features of an instance will be encoded in different CRs associated with different contexts suggests that objects and properties are represented in similar ways in this system. Apples (which are objects, presumably) will have CRs that select for redness, roundness, and so on. Red things and even redness (they have somewhat different contexts) will likewise have their own CRs (with all three kinds counting apples as instances). The process of forming bounded representations is the process of abstraction, and CRs consist of abstracted features (or their detectors; cf. 2.4.1ff).

CRs are representations of the members of the category that they discriminate. They accomplish the absolute discrimination by selectively filtering features. Hence, cognitively speaking, objects are the instances that CRs sort and properties are the means by which CRs sort them. But nothing prevents an instance from being an instance of a property, in which case its CR will again sort it by (higher-order) properties. The only difference between whether something is an object or a property in this representational system is whether it is encoded as a category (in which case it is treated as an object) or it is merely used as an invariant in the encoding of a category (in which case it is just a property). Again, in practice, as with CRs and SRs for the same thing, properties that are encoded in the CRs from some subordinate category will also tend to be members of superordinate categories themselves, although again some bottom-level object and first-level feature distinction must ground the system. Once features get named, however, it is natural that the work of making further categorical distinctions should be taken over by the symbolic system, not only because words and other formal symbols are preferable to concrete representations in dealing with abstractions, but also because of the remarkable capacity of natural language to describe anything (to as close an approximation as one can express the need for; Steklis & Harnad 1976).\**

[footnote start] I do not have a theory for the expressive power of language, but I suspect that there is an analogy between the possibility of generating all further categories from a grounded set of prior categories by recombining their symbols and the possibility of deriving all (provable) theorems from a set of axioms and derivation rules. What and how much constitutes a grounded set is an open question. [footnote end]

18.5.4 Limits of the model. It remains to state candidly what this model has not done before summarizing what it has attempted to do and returning briefly to describe its bearing on the philosophical questions that were alluded to at the outset: This model has not provided an algorithmic solution to the problem of induction; it has not given a general (or even a particular) formula that will find the invariants underlying any given (underdetermined) categorization problem. The model is only a sketch of some of the general features it might be useful for such an inductive device to have, if such a device is possible at all (see Osherson et al. 1986; McClelland et al. 1986). Cognitive nativism -- the idea that categories are not learned but inborn -- is a vast null hypothesis. For some theorists it seems plausible to accept this null hypothesis a priori (Chomsky 1980, Fodor 1985), or at least on persuasive current evidence.\**

[footnote start] There is something highly nondemonstrative about poverty-of-the-stimulus arguments (generalized to concept formation from linguistics) to the effect that input samples are too impoverished (underdetermined) for induction ever to converge (see Chomsky 1980). In the case of concept formation this argument is often coupled with an equally nondemonstrative vanishing intersections argument to the effect that there are no invariant properties underlying all the positive instances of categories (Fodor 1985). My own approach is based on the assumption that both these arguments are wrong, at least in the case of perceptual categories, and the conceptual categories grounded in them. [footnote end]

If the nativistic null hypothesis is right, then the evidence to date is entirely consistent with CP's being nothing but a (special, or possibly general) innate mechanism. This of course passes the inductive burden for learning CP categories to evolution (Harnad 1976) or perhaps, with the new preformationism, the inductive burden can be discarded altogether (Eldredge & Cracraft 1980). The present model is frankly inductivist and still envisions that the burden can be shouldered without consigning the origins of our categories to the Big Bang.

Apart from being inductivist but failing to provide any inductive algorithms, the present model also goes considerably beyond the data, not only generalizing CP considerably beyond what has as yet been investigated, but not even accounting for the existing data in the most parsimonious way: Simple, task-specific feature-detectors would have been enough; the existence of three representational systems is hardly forced on us by the evidence.

On the other hand, the model does synthesize CP findings with identification performance data in general, and it does pursue a rather suggestive developmental and representational link between identification in psychophysics and identification in psycholinguistics. It also suggests that the categorical perception literature may have more in common with other lines of research on cognition (e.g., Neisser 1987) than merely the name. The model itself also attempts to resolve some of the disputes about whether representations are imagistic or propositional (Kosslyn, Pinker, Smith & Shwartz 1979; Pylyshyn 1981). (Answer: Some of each, and more besides.) Finally, some productive contact is made with certain long-standing philosophical problems. I will close by describing the model's implications for the problems raised at the beginning of the chapter.

18.6 Philosophical Implications

18.6.1 The problem of induction: If any nontrivial induction actually does occur in cognition, then a category-representing architecture of the kind I have described could accomplish it in principle. All-purpose learning algorithms have not been proposed, but the narrowing of the scope of the problem provided by the idea of context, invariants and the context-relativity of categorization as well as the general limits and advantages suggested by approximationism and convergence perhaps recast the empirical side of the induction problem in a more tractable form.

18.6.2 The word/world problem, meaning holism and concept revision: According to this model, a potential dissociation between words and the world (i.e., an indeterminacy about how words pick out their referents, which ones they pick out, and what the link between them is) is mediated by the dual acquaintance system. The (always provisional and approximate) match between words and world is grounded in our perceptual categories, which are based on the invariants that are sufficient to subserve reliable categorization, that is, object-sorting. Word meanings are context-relative and local, on the one hand, and, because contexts are inter-related (both vertically and horizontally) and always susceptible to widening by experiential contingencies, meanings are also perpetually subject to holistic revision and updating, with new categories and their representations always subsuming old ones as special cases (i.e., as having been rougher approximations to the same thing \**

[footnote start] This amounts to a denial of radical incommensurability in concept revision, even after cataclysmic paradigm shifts (Kuhn 1970). The argument is also related to the translatability thesis (Steklis & Harnad 1976). [footnote end]

). When the word/world relation is recognized to be approximate rather than exact -- mediated by our provisionally successful categories rather than some absolute or ontic standard of veridicality, with convergence and a steady tightening of the approximation guaranteed by the cumulativity of the category formation process itself -- the word/world link looks somewhat more secure.\**

[footnote start] Among other things, it follows from this model that all of our categories have to be approximate as between the earth/twin-earth examples of Putnam (1975; and Goodman's [1954] green/grue, the alternate interpretations of Quine's [1960] gavagai and other such examples of representational indeterminacy), for distal differences not reflected in the proximal stimulation would have the same IRs, CRs and SRs in this system. Potential problems arising from this would appear to be blocked by the following two principles: (I) The cognitive identity of indiscriminables (i.e., narrow approximationism: what you can't tell apart is the same to you) and (II) Methodological epiphenomenalism considered as a research strategy in cognitive psychology (i.e., only aspire to model categorization performance capacity rather than qualitative content). (See Harnad 1989.) [footnote end]

18.6.3 The acquaintance/description problem: The differences between sensory and verbal information and what one can expect from them are reflected in the model in a rather natural way. Again, the several levels of approximation involved in forming each of the representations and the nature of the dependence of the description system on the acquaintance system seem to mirror some of the phenomenology. The problem of qualia (the irreducible qualitative nature of subjective experience) is of course hardly solved here, but it is put in the context of other irreducibles and approximations.

18.6.4 Elementary percepts and atomic symbols: Here the proposal is rather bold and specific. CP (rather than jnds) furnishes our elementary percepts, whose names serve as the atomic symbols of our propositions. Experiential data and invariance-filters ground these symbols with their initial meanings, and symbol recombinations in propositions generate the rest. Once symbols are grounded, a good deal of redundancy and cross-talk become possible between perceptual and verbal representations. Rival approaches that rely exclusively on a single symbolic description system (e.g., Pylyshyn 1984) are seen, despite the acknowledged power and scope of symbol manipulation, to be fundamentally ungrounded, with symbol meanings indeterminate apart from the theorist's interpretation. Given the power of symbolic representation to approximate any other kind of representation, however, it remains an open question (and probably in part a contingent one having to do with time and capacity constraints, efficiency and robustness considerations and other questions of optimization) just how much iconic and categorical representation is needed in order to ground the symbolic system. The burden of the argument of this chapter was to show that this amount is of necessity not zero.

18.6.5 The problem of universals: What is primary in cognition is categorization. Categorization always involves the sorting of instances (in a context) according to invariant features. The level of encoding is different for an instance and for its (intracontextual) features. This would be the representational basis of the object/feature distinction in cognition. Then, of course, features can themselves be treated as objects, with their own higher-order features, and so on. For a cognizer, an object is merely a member of a set of instances that is categorized in a certain way on the basis of certain features. Needless to say, this model of how objects and features are represented says nothing whatever about what really exists behind instances, that is to say, behind appearances. It only addresses the concept of universals, suggesting one mechanism that appears as if it would be able to handle appearances adaptively -- if, that is, things are indeed as they appear.

18.6.6 The other-minds problem revisited: Appearance is of course that persisting philosophical problem that no amount of cognitive science will resolve. Everything that has been conjectured here about the nature of the representations that generate both CP effects and general categorization performance would be equally true of (a) a device that behaved exactly as if it saw, felt, believed and meant things (but did no such thing, being an insentient automaton, devoid of all qualitative experience) and of (b) a device (like ourselves) that really did see, feel, believe and mean things -- in other words, one that really had a mind, rather than merely appearing to have one. This categorical distinction, like the problem of whether the members of any other category we have are in reality the way they appear, seems to exceed the resolving capacity of the approximationist variety of representational device that we ourselves are (according to the present model), no matter how tight we make the approximation, and irrespective of whether we are wearing our cognitive scientists' hats, our philosophical apparel or our ordinary folk costumes. The other-minds problem is the focus of Harnad 1972 and 1989.

18.7 Summary

Categorization is a very basic cognitive activity. It is involved in any task that calls for differential responding, from operant discrimination to pattern recognition to naming and describing objects and states-of-affairs. Explanations of categorization range from nativist theories denying that any nontrivial categories are acquired by learning to inductivist theories claiming that most categories are learned.

Categorical perception (CP) is the name given to a suggestive perceptual phenomenon that may serve as a useful model for categorization in general: For certain perceptual categories, within-category differences look much smaller than between-category differences even when they are of the same size physically. For example, in color perception, differences between reds and differences between yellows look much smaller than equal-sized differences that cross the red/yellow boundary; the same is true of the phoneme categories /ba/ and /da/. Indeed, the effect of the category boundary is not merely quantitative, but qualitative.

There have been two theories to explain CP effects. The Whorf Hypothesis explains color boundary effects by proposing that language somehow determines our view of reality. The motor theory of speech perception explains phoneme boundary effects by attributing them to the patterns of articulation required for pronunciation. Both theories seem to raise more questions than they answer, for example: (i) How general and pervasive are CP effects? Do they occur in other modalities besides speech-sounds and color? (ii) Are CP effects inborn or can they be generated by learning (and if so, how)? (iii) How are categories internally represented? How does this representation generate successful categorization and the CP boundary effect?

Some of the answers to these questions will have to come from ongoing research, but the existing data do suggest a provisional model for category formation and category representation. According to this model, CP provides our basic or elementary categories. In acquiring a category we learn to label or identify positive and negative instances from a sample of confusable alternatives. Two kinds of internal representation are built up in this learning by acquaintance:

(1) an iconic representation that subserves our similarity judgments and (2) an analog/digital feature-filter that picks out the invariant information allowing us to categorize the instances correctly. This second, categorical representation is associated with the category name. Category names then serve as the atomic symbols for a third representational system, the (3) symbolic representations that underlie language and that make it possible for us to learn by description.

This model provides no particular or general solution to the problem of inductive learning, only a conceptual framework; but it does have some substantive implications, for example, (a) the cognitive identity of (current) indiscriminables: Categories and their representations can only be provisional and approximate, relative to the alternatives encountered to date, rather than exact. There is also (b) no such thing as an absolute feature, only those features that are invariant within a particular context of confusable alternatives. Contrary to prevailing prototype views, however, (c) such provisionally invariant features must underlie successful categorization, and must be sufficient (at least in the satisficing sense) to subserve reliable performance with all-or-none, bounded categories, as in CP. Finally, the model brings out some basic limitations of the symbol-manipulative approach to modeling cognition, showing how (d) symbol meanings must be functionally anchored in nonsymbolic, shape-preserving representations -- iconic and categorical ones. Otherwise, all symbol interpretations are ungrounded and indeterminate. This amounts to a principled call for a psychophysical (rather than a neural) bottom-up approach to cognition.

REFERENCES

Abelson, R. P. (1980) Searle's argument is just a set of Chinese symbols. Behavioral and Brain Sciences 3: 424-425.

Apostel, L., Berger, G., Briggs, A. & Michaud, G. (1972) Interdisciplinarity: Problems of teaching and research in universities. Paris: OECD.

Ballard, D. H. (1986) Cortical connections and parallel processing: Structure and function. Behavioral and Brain Sciences 9: 67 - 119.

Berlin, B. & Kay, P. (1969) Basic color terms: Their universality and evolution. Berkeley: University of California Press

Bialystok, E. & Olson, D. R. (1987) Spatial Categories: The Perception and Conceptualization of Spatial Relations. In S. Harnad (Ed.) Categorical perception: The groundwork of cognition. New York: Cambridge University Press

Block, N. (1980) What intuitions about homunculi don't show. Behavioral and Brain Sciences 3: 425-426.

Bloom, A. H. (1981) The linguistic shaping of thought: A study of the impact of language on thinking in China and the West. Hillsdale NJ: Erlbaum Associates

Bornstein, M. H. (1984) Infant into adult: Unity to diversity in visual categorization. In J. Mehler & R. Fox (Eds.) Neonate cognition: Beyond the blooming, buzzing confusion. Hillsdale NJ: Erlbaum

Bornstein, M. H. (1987) Perceptual Categories in Vision and Audition. In S. Harnad (Ed.) Categorical perception: The groundwork of Cognition. New York: Cambridge University Press

Boynton, R. M. (1979) Human color vision. New York: Holt, Rinehart, Winston

Broadbent, D. E. (1975) The magic number seven after fifteen years. In: Studies in long term memory. A. Kennedy & A. Wilkes (eds.), London: Wiley

Bruner, J. S., Goodnow, J. J. & Austin, G.A. (1956) A study of thinking. New York: Wiley

Buser, P. A. & Rougeul-Buser, A. (eds.) (1978) Cerebral correlates of conscious experience. Amsterdam: North Holland.

Callaway, E., Tueting, P. & Koslow H. (Eds.) (1978) Event-related potentials in man. New York: Academic Press

Chaitin, G. (1975) Randomness and mathematical proof. Scientific American 232: 47 - 52.

Chomsky, N. (1980) Rules and representations. Behavioral and Brain Sciences 3: 1-61.

Chomsky, N. & Halle, M. (1968) The sound pattern of English. New York: Harper & Row

Cooper, W. E. (1979) Speech perception and production. Norwood NJ: Ablex

Davis, M. (1958) Computability and unsolvability. Manchester: McGraw-Hill.

Davis, M. (1965) The undecidable. New York: Raven.

Dennett, D. C. (1978) Why not the whole iguana? Behavioral and Brain Sciences 1: 103-104.

Dennett, D.C. (1982) The myth of the computer: An exchange. N.Y. Review Books XXIX (11): 56.

De Valois, R. L. & De Valois, K. K. (1975) Neural coding of color. In E. C. Carterette & M. P. Friedman (Eds.) Handbook of Perception (Volume 5) New York: Academic Press

Diehl, R. L. (1981) Feature detectors for speech: A critical reappraisal. Psychological Bulletin 89: 1-18.

Diehl, R. L. & Kluender, K. R. (1987) On the Categorization of Speech Sounds. In S. Harnad (Ed.) Categorical perception: The groundwork of Cognition. New York: Cambridge University Press

Donchin, E. (forthcoming) Proceedings of the 2nd Carmel Conference on Philosophical Aspects of Event-Related Potentials.

Dretske, F. I. (1983) Precis of Knowledge and the Flow of Information. Behavioral and Brain Sciences 6: 55 - 90.

Durlach, N. I. & Braida, L. D. (1969) Intensity perception. I: Preliminary theory of intensity resolution. Journal of the Acoustical Society of America 46: 372-383.

Eccles, J. C. (1978) A critical appraisal of brain-mind theories. In: Buser & Rougeul-Buser (1978, 347 - 355).

Edelson, T. (1982) Simulating understanding: Making the example fit the question. Behavioral and Brain Sciences 5: 338-339.

Ehret, G. (1987) Categorical Perception of Speech Signals: Facts and Hypotheses from Animal Studies. In S. Harnad (Ed.) Categorical perception: The groundwork of Cognition. New York: Cambridge University Press

Ehret, G. & Merzenich, M. M. (1985) Auditory midbrain responses parallel spectral integration phenomena. Science 227: 1245-1247.

Eimas, P. D., Miller, J. L. & Jusczyk, P. W. (1987) On Infant Speech Perception and the Acquisition of Language. In S. Harnad (Ed.) Categorical perception: The groundwork of Cognition. New York: Cambridge University Press

Eimas, P. D., Siqueland, E. R., Jusczyk, P. & Vigorito, J. (1971) Speech perception in infants. Science 171: 303-306.

Eldredge, N. & Cracraft, J. (1980) Phylogenetic patterns and the evolutionary process. New York: Columbia University Press

Fodor, J. A. (1975) The language of thought. New York: Thomas Y. Crowell

Fodor, J. A. (1980) Methodological solipsism considered as a research strategy in cognitive psychology. Behavioral and Brain Sciences 3: 63 - 109.

Fodor, J. A. (1981) RePresentations. Cambridge MA: MIT/Bradford.

Fodor, J. A. (1985) Précis of The Modularity of Mind. Behavioral and Brain Sciences 8: 1 - 42.

Fraenkel, A. A., Bar-Hillel, Y. & Levy, A. (1973) Foundations of set theory. New York: Elsevier.

Frege, G. (1952) Translations from the philosophical writings of Gottlob Frege. (P. Geach & M. Black, eds.) Oxford: Oxford University Press.

Garner, W. R. (1974) The processing of information and structure. Hillsdale NJ: Erlbaum Associates

Gibson, E. J. (1969) Principles of perceptual learning and development. Englewood Cliffs NJ: Prentice Hall

Gibson, J. J. (1979) An ecological approach to visual perception. Boston: Houghton Mifflin

Goodman, N. (1954) Fact, fiction and forecast. University of London: Athlone Press

Goodman, N. (1968) Languages of art. New York: Bobbs-Merrill

Griffin, D. R. (1978) Prospects for cognitive ethology. Behavioral and Brain Sciences 1: 527 - 538.

Grünbaum, A. (1986) Précis of The foundations of psychoanalysis: A philosophical critique. Behavioral and Brain Sciences 9: 217-284.

Gyr, J., Willey, R., & Henry, A. (1979) Motor-sensory feedback and geometry of visual space: a replication. Behavioral and Brain Sciences 2:59-94.

Harnad, S. (1976) Induction, evolution and accountability. Annals of the N.Y. Academy of Sciences 280: 58-60.

Harnad, S. (1982a) Neoconstructivism: A unifying theme for the cognitive sciences. In T. Simon & R. Scholes (Eds.) Language, mind and brain. Hillsdale, N.J.: Lawrence Erlbaum Associates

Harnad, S. (1982b) Metaphor and mental duality. In T. Simon & R. Scholes (Eds.) Language, mind and brain. Hillsdale, N.J.: Lawrence Erlbaum Associates

Harnad, S. (1982c) Consciousness: An afterthought. Cognition and Brain Theory 5: 29 - 47.

Harnad, S. (1984) What are the scope and limits of radical behaviorist theory? Behavioral and Brain Sciences 7: 720 - 721.

Harnad, S. (1987a) Categorical perception: A critical overview. In S. Harnad (Ed.) Categorical perception: The groundwork of Cognition. New York: Cambridge University Press

Harnad, S. (1987b) Category induction and representation. In S. Harnad (Ed.) Categorical perception: The groundwork of Cognition. New York: Cambridge University Press

Harnad, S. (in preparation, a) Uncertainty and the growth of knowledge. [Review of Dretske (1983)]

Harnad, S. (in preparation, b) Professor MacKay's conation conundrum: A nonexistent theorem.

Harnad, S. (in preparation, c) Uncomplemented categories.

Harnad, S. (in preparation, d) Against hermeneutics.

Harnad, S. (in preparation, e) Minds, machines and Searle.

Harnad, S., Doty, R. W., Goldstein, L., Jaynes, J. & Krauthamer, G. (eds.) (1977) Lateralization in the nervous system. New York: Academic Press

Harnad, S. R., Steklis, H. D. & Lancaster, J. (eds.) (1976) Origins and Evolution of Language and Speech. Annals of the New York Academy of Sciences 280.

Harvey, R. J. (1985) On the nature of programs, simulations and organisms. Behavioral and Brain Sciences 8: 741-742.

Haugeland, J. (1978) The nature and plausibility of cognitivism. Behavioral and Brain Sciences 1: 215-260.

Haugeland, J. (1985) Artificial intelligence: The very idea. Cambridge MA: MIT/Bradford.

Helson, H. (1964) Adaptation-level theory: An experimental and systematic approach to behavior. New York: Harper and Row

Hexter, J. H. (1979) Reappraisals in History. Chicago: University of Chicago Press.

Heyting, A. (1971) Intuitionism: An introduction. New Jersey: Humanities.

Holender, D. (1986) Semantic activation without conscious identification. Behavioral and Brain Sciences 9: 1 - 66.

Howell, P. & Rosen, S. (1984) Natural auditory sensitivities as universal determiners of phonemic contrasts. Linguistics 21: 205-235.

Hoyle, G. (1984) The scope of neuroethology. Behavioral and Brain Sciences 7: 367-412.

Hubel, D. H. & Wiesel, T. N. (1965) Receptive fields and functional architecture in two nonstriate visual areas (18 and 19) of the cat. Journal of Neurophysiology 28: 229-289.

Johnson-Laird, P. M. (1983) Mental models. Cambridge MA: Harvard University Press.

Jusczyk, P. W. (1984) On characterizing the development of speech perception. In J. Mehler & R. Fox (Eds.) Neonate cognition: Beyond the blooming, buzzing confusion. Hillsdale NJ: Erlbaum.

Katz, J.J. (1976) Effability: A hypothesis about the uniqueness of natural language. Annals of the New York Academy of Sciences 280: 33-41.

Keil, F. C. (1979) Semantic and conceptual development: An ontological perspective. Cambridge MA: Harvard University Press

Keil, F. C. (1986a) On the structure dependent nature of stages of cognitive development. In I. Levin (Ed.) Stage and structure: Reopening the debate. Norwood NJ: Ablex

Keil, F. C. (1986b) On the acquisition of natural kind and artifact terms. In W. Demopoulos (Ed.) Conceptual change. Norwood NJ: Ablex

Keil, F. C. & Kelley, M. H. (1987) Developmental Changes in Category Structure. In S. Harnad (Ed.) Categorical perception: The groundwork of Cognition. New York: Cambridge University Press

Klatt, D. H. (1980) SCRIBER and LAFS: Two approaches to speech analysis. In W. A. Lea (Ed.) Trends in speech recognition. Englewood Cliffs NJ: Prentice Hall

Kleene, S. C. (1969) Formalized recursive functionals and formalized realizability. Providence, RI: American Mathematical Society.

Kornhuber, H. H. (1978) A reconsideration of the brain-mind problem. In: Buser & Rougeul-Buser (1978, 319 - 334).

Kornhuber, H. H. (1984) Attention, readiness for action, and the stages of voluntary decision: Some electrophysiological correlates in man. Experimental Brain Research supp. 9: 420-429.

Kosslyn, S. M., Pinker, S., Smith, G. & Shwartz, S. P. (1979) On the demystification of mental imagery. Behavioral and Brain Sciences 2: 535 - 548.

Kuhl, P. K. (1986) Reflections on infants' perception and representation of speech. In J. Perkell & D. Klatt (Eds.) Invariance and variability in speech processes. Norwood NJ: Ablex.

Kuhl, P. K. (1987) The Special-Mechanisms Debate in Speech Perception: Nonhuman Species and Nonspeech Signals. In S. Harnad (Ed.) Categorical perception: The groundwork of Cognition. New York: Cambridge University Press

Kuhn, T. (1970) The structure of scientific revolutions. Chicago: University of Chicago Press

Lane, H. (1965) The motor theory of speech perception: A critical review. Psychological Review 72: 275 - 309.

Lawrence, D. H. (1950) Acquired distinctiveness of cues: II. Selective association in a constant stimulus situation. Journal of Experimental Psychology 40: 175 - 188.

Lewis, D. (1971) Analog and digital. Nous 5: 321 - 327.

Liberman, A. M. (1976) Discussion Paper. In S. R. Harnad, H. D. Steklis & J. Lancaster (Eds.) Origins and evolution of language and speech. Annals of the New York Academy of Sciences 280.

Liberman, A. M. (1982) On the finding that speech is special. American Psychologist 37: 148-167.

Liberman, A. M., Harris, K. S., Hoffman, H. S. & Griffith, B. C. (1957) The discrimination of speech sounds within and across phoneme boundaries. Journal of Experimental Psychology 54: 358 - 368.

Libet, B. (1978) Neuronal vs. subjective timing for a conscious sensory experience. In: Buser & Rougeul-Buser (1978, 69 - 82).

Libet, B. (1985) Unconscious cerebral initiative and the role of conscious will in voluntary action. Behavioral and Brain Sciences 8: 529-566.

Lieblich, I. & Arbib, M.A. (1982) Multiple representations of space underlying behavior. Behavioral and Brain Sciences 5: 627-659.

Liu, L. G. (1985) Reasoning counterfactually in Chinese: Are there any obstacles? Cognition 21: 239 - 270.

Lorenz, K. Z. (1981) The future of ethology. New York: Springer.

Lucas, J. R. (1961) Minds, machines and Gödel. Philosophy 36: 112-117.

Lucas, M. M. & Hayes, P. J. (Eds.) (1982) Proceedings of the Cognitive Curricula Conference. University of Rochester: Rochester NY

MacKay, D. M. (1978) What determines my choice? In: Buser & Rougeul-Buser (1978, 335 - 346).

Macmillan, N. A. (1987) Beyond the Categorical/Continuous Distinction: A Psychophysical Approach to Processing Modes. In S. Harnad (Ed.) Categorical perception: The groundwork of Cognition. New York: Cambridge University Press

Macmillan, N. A., Braida, L. D. & Goldberg, R. F. (1987) Central and peripheral processes in the perception of speech and nonspeech sounds. In M. E. H. Schouten (Ed.) Psychophysics of speech perception. Martinus Nijhoff.

Massaro, D. W. (1987) Categorical Partitioning: A Fuzzy-Logical Model of Categorization Behavior. In S. Harnad (Ed.) Categorical perception: The groundwork of Cognition. New York: Cambridge University Press

Massaro, D. W. (forthcoming) Speech perception by ear and eye: A paradigm for psychological inquiry.

McCarthy, J. & Hayes, P. (1969) Some philosophical problems from the study of artificial intelligence. In: Machine intelligence, B. Meltzer & D. Michie (eds.), Volume 4. Edinburgh: Edinburgh University Press

McClelland, J.L., Rumelhart, D. E., and the PDP Research Group (1986) Parallel distributed processing: Explorations in the microstructure of cognition, Volume 1. Cambridge MA: MIT/Bradford.

McDermott, D. (1982) Minds, brains, programs and persons. Behavioral and Brain Sciences 5: 339-341.

Medin, D. L. & Barsalou, L. W. (1987) Categorization Processes in Category Structure. In S. Harnad (Ed.) Categorical perception: The groundwork of Cognition. New York: Cambridge University Press

Miller, G. A. (1956) The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review 63: 81 - 97.

Minsky, M. (1961) Steps towards artificial intelligence. Proceedings of the Institute of Radio Engineers 49: 8 - 30.

Minsky, M. & Papert, S. (1969) Perceptrons: An introduction to computational geometry. Cambridge MA: MIT Press

Molfese, D. L. (1987) Electrophysiological Indices of Categorical Perception. In S. Harnad (Ed.) Categorical perception: The groundwork of Cognition. New York: Cambridge University Press

Molfese, D. L. & Molfese, V. J. (1987) Right hemisphere responses from preschool children to temporal cues in speech and nonspeech materials: Electrophysiological correlates. Brain and Language (in press)

Nagel, T. (1974) What is it like to be a bat? Philosophical Review 83: 435 - 451.

Nagel, T. (1986) The view from nowhere. New York: Oxford University Press.

Neisser, U. (ed.) (1987) Concepts and conceptual development: Ecological and intellectual bases of categorization. New York: Cambridge University Press

Olson, D. R. (1970) Language and thought: Aspects of a cognitive theory of semantics. Psychological Review 77: 257 - 273.

Olson, D. R. & Bialystok, E. (1983) Spatial cognition: The structure and development of the mental representation of spatial relations. Hillsdale NJ: Erlbaum.

Osherson, D. N., Stob, M. & Weinstein, S. (1986) Systems that learn. Cambridge MA: MIT/Bradford.

Pastore, R. E. (1987) Categorical Perception: Some Psychophysical Models. In S. Harnad (Ed.) Categorical perception: The groundwork of Cognition. New York: Cambridge University Press

Pastore, R. E., Szczesiul, R., Wielgus, V., Nowikas, K. & Logan, R. (1984) Categorical perception, category boundary effects, and continuous perception. Perception & Psychophysics 35: 583-585.

Paivio, A. (1986) Mental representation: A dual coding approach. New York: Oxford

Popper, K. R., & Eccles, J. C. (1977) The self and its brain. Heidelberg: Springer.

Premack, D. (1976) Mechanisms of intelligence: Preconditions for language. Annals of the New York Academy of Sciences 280: 544 - 561.

Putnam, H. (1975) Mind, language and reality. New York: Cambridge University Press.

Pylyshyn, Z. W. (1973) What the mind's eye tells the mind's brain: A critique of mental imagery. Psychological Bulletin 80: 1-24.

Pylyshyn, Z. (1978) Computational models and empirical constraints. Behavioral and Brain Sciences 1:93-127.

Pylyshyn, Z. W. (1980) Computation and cognition: Issues in the foundations of cognitive science. Behavioral and Brain Sciences 3: 111-169.

Pylyshyn, Z. W. (1981) The imagery debate: Analogue media versus tacit knowledge. Psychological Review 88: 16 - 45.

Pylyshyn, Z. W. (1984) Computation and cognition. Cambridge MA: MIT/Bradford

Quine, W. V. O. (1953) From a logical point of view. Cambridge MA: Harvard University Press

Quine, W. V. O. (1960) Word and object. Cambridge MA: MIT Press

Rabin, M. O. (1977) Complexity of computations. Communications of the Association for Computing Machinery 20: 625-633.

Reddy, D. R. (1980) Machine models of perception. In R. A. Cole (Ed.) Perception and production of fluent speech. Hillsdale NJ: Freeman.

Regan, D. M. (1972) Evoked potentials in psychology, sensory physiology and clinical medicine. New York: Wiley.

Regan, D. M. (1975) Color coding of pattern responses in man investigated by evoked potential feedback and direct plot techniques. Vision Research 15: 175-183.

Regan, D. M. (1987) Evoked Potentials and Colour-Defined Categories. In S. Harnad (Ed.) Categorical perception: The groundwork of Cognition. New York: Cambridge University Press

Remez, R. E. (1987) Neural Models of Speech Perception: A Case History. In S. Harnad (Ed.) Categorical perception: The groundwork of Cognition. New York: Cambridge University Press

Remez, R. E. & Rubin, P. E. (1984) On the perception of intonation from sinusoidal sentences. Perception and Psychophysics 35: 429-440.

Repp, B. H. (1984) Categorical perception: Issues, methods and findings. In N. J. Lass (Ed.) Speech and language: Advances in basic research and practice (Vol. 10). New York: Academic Press.

Repp, B. H. & Liberman, A. M. (1987) Phonetic Boundaries are Flexible. In S. Harnad (Ed.) Categorical perception: The groundwork of Cognition. New York: Cambridge University Press

Rock, I. (1980) Difficulties with a direct theory of perception. Behavioral and Brain Sciences 3:398-399.

Rock, I. (1983) The logic of perception. Cambridge MA: MIT Press

Rosch, E. & Lloyd, B. B. (1978) Cognition and categorization. Hillsdale NJ: Erlbaum Associates

Rosen, S. & Howell, P. (1987) Auditory, Articulatory and Learning Explanations of Categorical Perception in Speech. In S. Harnad (Ed.) Categorical perception: The groundwork of Cognition. New York: Cambridge University Press

Rosenblatt, F. (1962) Principles of neurodynamics. Washington DC: Spartan.

Rougeul-Buser, A., Bouyer, J. J., & Buser, P. (1978) Transitional states of awareness and short-term fluctuations of selective attention: Neurophysiological correlates and hypotheses. In: Buser & Rougeul-Buser (1978, 215 - 232).

Rumelhart, D. E., McClelland, J.L., and the PDP Research Group (1986) Parallel distributed processing: Explorations in the microstructure of cognition, Volume 2. Cambridge MA: MIT/Bradford.

Sayre, K. M. (1986) Intentionality and information processing. Behavioral and Brain Sciences 9: 121 - 166.

Schank, R. C., Collins, G. C., & Hunter, L. E. (1986) Transcending inductive category formation in learning. Behavioral and Brain Sciences 9: XXX - XXX.

Searle, J. R. (1969) Speech acts. Cambridge: Cambridge University Press

Searle, J. R. (1980a) Minds, brains and programs. Behavioral and Brain Sciences 3: 417-424.

Searle, J. R. (1980b) Intrinsic intentionality. Behavioral and Brain Sciences 3: 450-457.

Searle, J. R. (1982a) The Chinese room revisited. Behavioral and Brain Sciences 5: 345-348.

Searle, J. R. (1982b) The myth of the computer. New York Review of Books XXIX(7): 3-7.

Searle, J. R. (1982c) The myth of the computer: An exchange. New York Review of Books XXIX(11): 56-57.

Searle, J. R. (1983) Intentionality: An essay in the philosophy of mind. Cambridge: Cambridge University Press.

Searle, J. R. (1985) Patterns, symbols and understanding. Behavioral and Brain Sciences 8: 742-743.

Selfridge, O. G. (1959) Pandemonium: A paradigm for learning. In Mechanization of thought processes. London: H. M. Stationery Office.

Selverston, A. I. (1980) Are central pattern generators understandable? Behavioral and Brain Sciences 3: 535-571.

Shankweiler, D. P., Strange, W. & Verbrugge, R. R. (1977) Speech and the problem of perceptual constancy. In R. E. Shaw & J. Bransford (Eds.) Perceiving, acting and knowing: Toward an ecological psychology. Hillsdale NJ: Erlbaum

Shannon, C. E., & Weaver, W. (1949) The mathematical theory of communication. Urbana: University of Illinois Press.

Shepard, R. N. & Cooper, L. A. (1982) Mental images and their transformations. Cambridge: MIT Press/Bradford.

Siegel, J. A. & Siegel, W. (1977) Absolute identification of notes and intervals by musicians. Perception & Psychophysics 21: 143-152.

Simon, H. A. (1957) Models of man: Social and rational. New York: Wiley

Skinner, B. F. (1984a) Methods and theories in the experimental analysis of behavior. Behavioral and Brain Sciences 7: 511-546.

Skinner, B. F. (1984b) Reply to Harnad. Behavioral and Brain Sciences 7: 721-724.

Slezak, P. (1982) Gödel's theorem and the mind. British Journal for the Philosophy of Science 33: 41-52.

Smith, E. E. & Medin, D. L. (1981) Categories and concepts. Cambridge MA: Harvard

Snowdon, C. T. (1987) A Naturalistic View of Categorical Perception. In S. Harnad (Ed.) Categorical perception: The groundwork of Cognition. New York: Cambridge University Press

Snowdon, C. T., Coe, C. L. & Hodun, A. (1985) Population recognition of isolation peeps in the squirrel monkey. Animal Behaviour 33: 1145-1151.

Stabler, E. P. (1985) How are grammars represented? Behavioral and Brain Sciences 6: 391-421.

Steklis, H. D. & Harnad, S. R. (1976) From hand to mouth: Some critical stages in the evolution of language. Annals of the New York Academy of Sciences 280: 445-455.

Stevens, K. N., & Halle, M. (1967) Remarks on analysis by synthesis and distinctive features. In: Models for the perception of form, W. Wathen-Dunn (Ed.). Cambridge MA: MIT Press

Studdert-Kennedy, M., Liberman, A. M., Harris, K. S. & Cooper, F. S. (1970) Motor theory of speech perception: A reply to Lane's critical review. Psychological Review 77: 234-249.

Stulman, J. (1969) The methodology of pattern. Fields Within Fields 1:7-9.

Turing, A. M. (1964) Computing machinery and intelligence. In: Minds and machines, A. R. Anderson (ed.). Englewood Cliffs NJ: Prentice Hall

Tversky, A. (1977) Features of similarity. Psychological Review 84: 327 - 352.

U.S. Committee on Science and Astronautics (1970) Interdisciplinary research: an exploration of public policy issues. Science Policy Research Division, Legislative Reference, Library of Congress, Serial T.

Ullman, S. (1980) Against direct perception. Behavioral and Brain Sciences 3: 373 - 415.

Vygotsky, L. S. (1962) Thought and language. Cambridge MA: MIT Press.

Whorf, B. L. (1964) Language, thought and reality. Cambridge MA: MIT Press

Wilson, M. (1987) Brain Mechanisms in Categorical Perception. In S. Harnad (Ed.) Categorical perception: The groundwork of Cognition. New York: Cambridge University Press

Wilson, M., De Bouche, B. A. & Streitfeld, B. D. (1983) Categorical perception of visual stimuli. Paper presented at the Symposium on Categorical Perception, 54th Annual Meeting of the Eastern Psychological Association, Baltimore MD.

Wittgenstein, L. (1953) Philosophical investigations. New York: Macmillan

Wittgenstein, L. (1967) Remarks on the Foundations of Mathematics. Cambridge MA: MIT Press.

von Neumann, J. (1954) The computer and the brain. New Haven: Yale University Press.

Zadeh, L. A. (1965) Fuzzy sets. Information & Control 8: 338-353.