We share two ways to learn categories with many other species: (1) unsupervised learning (learning from mere exposure, no feedback) and (2) supervised (or reinforcement) learning (learning through trial and error guided by corrective feedback indicating whether we’ve done the right or wrong thing).
We are the only species that also has a third way of learning to categorize: (3) language.
Language probably first evolved from pointing, imitation, miming and other kinds of purposive gestural communication. Once gesture became language, it migrated to the much more efficient auditory/vocal modality of speech. But before that, language itself first had to evolve.
Language is much more than just naming categories (“apple”). Word-shapes are arbitrary; they do not resemble their referents. Iconic gestures, by contrast, do resemble their “referents” (the objects and actions that they are imitations of). But once gestures become associated with their objects in shared gestural communication based on similarity, the gestures can become simpler and more arbitrary, gradually shedding their no-longer-necessary iconicity through shared convention among the communicators. The “copy” need not be so faithful, as long as we continue to use roughly the same gesture to mime an object or action.
Gestures that have shed their iconicity are still not language (or reference), though. They only become language when the names of categories can be combined into subject/predicate propositions that describe or define further categories. That’s what provides the third way of learning categories: the one that is unique to our species.
Propositions, unlike category-names, have truth values: True or False.
This is related to another essential feature of language, which is negation. A proposition can either affirm something or deny something: “It is true that the cat is on the mat” or “It is not true that the cat is on the mat.” P or not-P. (The relation to positive and negative feedback in supervised/reinforcement learning is obvious.)
The trouble with being able to learn categories only directly, through unsupervised and supervised learning, is that it is time-consuming, risky, and not guaranteed to succeed (in time). It is also impoverished: most of the words in our vocabularies and dictionaries are category names; but other than the concrete categories that can be learned from direct sensorimotor trial-and-error experience (“apple,” “red,” “pull”), most category names cannot be learned at all without language.
When categories are learned directly through unsupervised learning (from sensorimotor feature-correlations) or supervised learning (from correlations between sensorimotor features and doing the right or wrong thing) the learning consists of the detection and abstraction of the features that distinguish the members from the non-members of the category. To learn to do the right/wrong thing with the right/wrong kind of thing requires learning – implicitly or explicitly, i.e., unconsciously or consciously – to detect and abstract those distinguishing features.
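The supervised case can be made concrete with a minimal sketch. It assumes a toy world that is not in the text above: hypothetical mushrooms with three binary features, of which only one (spots) actually distinguishes members from non-members of the category “edible.” The learner is a simple perceptron, chosen here only as one illustrative mechanism among many; it guesses, receives corrective feedback, and adjusts its feature weights only when it has done the wrong thing.

```python
import itertools
import random

# Hypothetical toy world: each mushroom has three binary features
# (has_spots, has_ring, is_brown). By assumption, a mushroom is a member
# of the category "edible" if and only if it has no spots.
EXAMPLES = [(f, f[0] == 0) for f in itertools.product((0, 1), repeat=3)]


def categorize(weights, bias, features):
    """Do the 'right thing' (True = eat) if the weighted features exceed 0."""
    return sum(w * x for w, x in zip(weights, features)) + bias > 0


def learn_category(examples, epochs=50, seed=0):
    """Trial-and-error learning: adjust weights only after an error."""
    rng = random.Random(seed)
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    trials = list(examples)
    for _ in range(epochs):
        rng.shuffle(trials)
        for features, is_member in trials:
            if categorize(weights, bias, features) != is_member:
                # Corrective feedback: move toward members, away from non-members.
                step = 1.0 if is_member else -1.0
                weights = [w + step * x for w, x in zip(weights, features)]
                bias += step
    return weights, bias


weights, bias = learn_category(EXAMPLES)
print(weights, bias)
```

After training, the weight on the distinguishing feature (spots) is negative, while the other two features are not needed to sort members from non-members: the learner has, implicitly, detected and abstracted the feature that matters. Note how many trials (and, in a real world, how many risky mistakes) this takes.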
Like nonhuman species, we can and do learn a lot of categories that way; and there are computational models for the mechanism that can accomplish the unsupervised and supervised learning. But, in general, nonhuman animals do not name the things they can categorize. Or, if you like, the names of those categories are the things they learn to do with the members and not to do with the members of other categories. Only humans bother to name their categories. Why?
What is a name? It is a symbol (whether vocal, gestural, or written) whose shape is arbitrary (i.e., it does not resemble the thing it names) and whose use is based on a shared convention among speakers: English-speakers all agree to call cats “cats” and dogs “dogs.” Names of categories are “content words”: words that have referents: nouns, adjectives, verbs, adverbs. Almost all words are content words. There exist also a much smaller number of “function words,” which are only syntactic or logical, such as “the,” “if,” “and,” “when,” “who”: They don’t have referents; they just have “uses,” usually defined or definable by a syntactic rule.
(Unlike nouns and adjectives, verbs do double duty, both in naming a referent (as content words: “cry”) and in marking predication, which is the essential function of propositions, distinguishing them from just compound content words: “The baby is crying” separates the content word, which has a referent – “crying” – from the predicative function of the copula: “is”. “The crying baby” is not a proposition; it is just a noun phrase, which is like a single word, and has a referent, just as “the baby” does. But the proposition “The baby is crying” does not have a referent: it has a sense – that the baby is crying – and a truth value (T or F).)
It is with content words that the gestural origin of language is important: Because before a category name can become a shared, arbitrary convention of a community of users, it has to get connected to its referent. Iconic communication (mime) is based on the natural connection conveyed by similarity, like the connection between an object and its photo.
(In contrast, pointing – “ostension” – is based on shared directed attention. Pointing alone cannot become category naming, as it is dependent on a shared line of gaze, and on what is immediately present at the time (“context”); it survives in language only with “deictic” words like “here,” “this,” “now,” and “me,” which have no referent unless you are “there” too, to see what’s being pointed to!)
A proposition can be true or false. But pointing and miming cannot, because they are not proposing or predicating anything; just “look!”. Whatever is pointed to is what is pointed to, and whatever a gesture resembles, it resembles. Resemblance can be more or less exact, but it cannot be true or false; it cannot lie. (Flattering portraiture is still far away in these prelinguistic times, but it is an open question whether iconography began before or after language (and speech); so fantasy, too, may have preceded falsity. Copying and depicting is certainly compatible with miming; both static and dynamic media are iconic.)
It is not that pointing and miming, when used for intentional communication, cannot mislead or deceive. There are some examples in primate communication of actions done to intentionally deceive (such as de Waal’s famous case of a female chimpanzee who positions her body behind a barrier so the alpha male can only see her upper body while she invites a preferred lower-ranking male to mate with her below the alpha’s line of sight, knowing that the alpha male would attack them both if he saw that they were copulating).
But, in general, in the iconic world of gesture and pointing, seeing is believing and deliberate deception does not seem to have taken root within species. The broken wing dance of birds, to lure predators away from their young, is deliberate deception, but it is deception between species, and the disposition also has a genetic component.
Unlike snatch-and-run pilfering, which does occur within species, deliberate within-species deceptive communication (not to be confused with unconscious, involuntary deception, such as concealed ovulation or traits generated by “cheater genes”) is rare. Perhaps this is because there is little opportunity or call for deceptive communication within species that are social enough to have extensive social communication at all. (Cheaters are detectable and punishable in tribal settings — and human prelinguistic settings were surely tribal.) Perhaps social communication itself is more likely to develop in a cooperative rather than a competitive or agonistic context. Moreover, the least likely setting for deceptive communication is perhaps also the most likely setting for the emergence of language: within the family or extended family, where cooperative and collaborative interests prevail, including both food-sharing and knowledge-sharing.
(Infants and juveniles of all species learn by observing and imitating their parents and other adults; there seems to be little danger that adults are deliberately trying to fool them into imitating something that is wrong or maladaptive. What would be the advantage to adults in doing that?)
But in the case of linguistic communication – which means propositional communication – it is hard to imagine how it could have gotten off the ground at all unless propositions were initially true, and assumed and intended to be true, by default.
It is not that our species did not soon discover the rewards to be gained by lying! But that only became possible after the capacity and motivation for propositional communication had emerged, prevailed and been encoded in the human brain as the strong and unique predisposition it is in our species. Until then there was only pointing and miming, which, being non-propositional, cannot be true or false, even though they can in principle be used to deceive.
So I think the default hypothesis of truth was encoded in the brains of human speakers and hearers as an essential feature of the enormous adaptive advantage (indeed the nuclear power) of language in transmitting categories without the need for unsupervised or supervised learning, instantly, via “hearsay.” The only preconditions are that (1) the speaker must already know the features (and their names) that distinguish the members from the nonmembers of the new category, so they can be conveyed to the hearer in a proposition defining or describing the new category, and (2) the hearer must already know the features of the features (and their names) used to define the new category. (This is the origin of the symbol grounding problem and its solution.)
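The two preconditions can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class `Hearer`, its method names, and the “zebra = horse + striped” example are hypothetical and do not come from the text above. Directly learned categories are represented as feature detectors; a category learned by hearsay is just a new name attached to a combination of detectors the hearer already has.

```python
class Hearer:
    """Toy learner whose category names are grounded in feature detectors."""

    def __init__(self):
        # Maps a category name to a function that detects its members.
        self.detectors = {}

    def learn_directly(self, name, detector):
        """Stand-in for the slow route: a detector acquired from experience."""
        self.detectors[name] = detector

    def learn_by_hearsay(self, new_name, feature_names):
        """Learn 'a <new_name> is anything that has all of <feature_names>'."""
        missing = [n for n in feature_names if n not in self.detectors]
        if missing:
            # Precondition (2) fails: the hearer lacks these grounded names.
            raise ValueError(f"ungrounded feature names: {missing}")
        parts = [self.detectors[n] for n in feature_names]
        self.detectors[new_name] = lambda thing: all(d(thing) for d in parts)

    def is_member(self, name, thing):
        return self.detectors[name](thing)


hearer = Hearer()
hearer.learn_directly("horse", lambda thing: thing["shape"] == "horse-like")
hearer.learn_directly("striped", lambda thing: thing["coat"] == "stripes")

# One proposition from a speaker who already knows the category:
hearer.learn_by_hearsay("zebra", ["horse", "striped"])

striped_animal = {"shape": "horse-like", "coat": "stripes"}
plain_animal = {"shape": "horse-like", "coat": "plain"}
print(hearer.is_member("zebra", striped_animal))  # True
print(hearer.is_member("zebra", plain_animal))    # False
```

The new category is acquired in a single step, with no trial and error, but only because “horse” and “striped” were already grounded. A definition that uses a name the hearer has never grounded (say, “horned”) raises an error here, which is the toy counterpart of a proposition the hearer cannot understand.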
The reason it is much more likely that propositional language emerged first in the gestural modality rather than the vocal one is that gestures’ iconicity (i.e., their similarity to the objects they are imitating) first connected them to their objects (which would eventually become their referents) and thereafter the gestures were free to become less and less iconic as the gestural community – jointly and gradually – simplified them to make communication faster and easier.
How do the speakers or hearers already know the features (all of which are, of course, categories too)? Well, either directly, from having learned them the old, time-consuming, risky, impoverished way (through supervised and unsupervised learning from experience), or indirectly, from having learned them by hearsay, through propositions from one who already knows the category to one who does not. Needless to say, the human brain, with its genetically encoded propositional capacity, has a default predilection for learning categories by hearsay (and a laziness about testing them out through direct experience).
The consequence is a powerful default tendency to believe what we are told — to assume hearsay to be true. The trait can take the form of credulousness, gullibility, susceptibility to cult indoctrination, or even hypnotic susceptibility. Some of its roots are already there in unsupervised and supervised learning, in the form of Pavlovian conditioning as well as operant expectancies based on prior experience.
Specific expectations and associations can of course be extinguished by subsequent contrary experience: A diabetic’s hypoglycemic attack can be suppressed by merely tasting sugar, well before it could raise systemic glycemic level (or even by just tasting saccharin, which can never raise blood sugar at all). But repeatedly “fooling the system” that way, without following up with enough sugar to restore homeostatic levels, will extinguish this anticipatory reaction.
And, by the same token, we can and do learn to detect and disbelieve chronic liars. But the generic default assumption, expectation, and anticipatory physiological responses to verbal propositions remain strong with people and propositions in general – and in extreme cases they can even induce “hysterical” physiological responses, including hypnotic analgesia sufficient to allow surgical intervention without medication. And it can induce placebo effects as surely as it can induce conspiracy theories.