Re: Cangelosi/Harnad Symbols

From: Stevan Harnad (harnad@coglit.ecs.soton.ac.uk)
Date: Thu Dec 30 1999 - 22:02:21 GMT


On Mon, 27 Dec 1999, Jelasity Mark wrote:

> Though its relation with the development of language (and in particular
> with Harnad's theory) is still an open question to me, this paper poses
> a nontrivial and interesting research topic: what are the conditions
> that allow grounding transfer in this family of models?

Harnad's theory of language origin IS a theory of grounding transfer!
But do not put too much weight on this particular family of models.
What they do and do not manage to encode is a less fundamental matter
than the general principle of learning sensorimotor features that allow
categories to be acquired (toil) and then acquiring new categories by
combining category names into propositions about new categories (theft).
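
To make the toil/theft distinction concrete, here is a minimal sketch
(Python, with invented feature names; it is not the Cangelosi/Harnad
simulation, just an illustration of the principle): detectors for
already-grounded categories stand in for what was learned by honest
toil, and "theft" builds a new category purely by combining their names
in a proposition.

    # Sketch only: "striped" and "horse" stand for categories assumed to
    # have been grounded by sensorimotor toil; the features are hypothetical.
    def is_striped(percept):
        return percept.get("stripes", False)

    def is_horse(percept):
        return percept.get("horse_shape", False)

    grounded = {"striped": is_striped, "horse": is_horse}

    # Theft: acquire ZEBRA from the proposition "a zebra is a striped horse",
    # i.e. from grounded category names alone, with no new sensorimotor toil.
    def steal_category(names, lexicon):
        detectors = [lexicon[n] for n in names]
        return lambda percept: all(d(percept) for d in detectors)

    is_zebra = steal_category(["striped", "horse"], grounded)

    print(is_zebra({"stripes": True, "horse_shape": True}))    # True
    print(is_zebra({"stripes": True, "horse_shape": False}))   # False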

> names should trigger the same inner representation (here: hidden unit
> activation) as perceptual input. The conditions under which this holds are
> still non-trivial though, since the space of hidden unit
> representations can have a really difficult structure, especially if there
> are many hidden units (in this paper there are 55 input units, but only
> 5 hidden units).

These particular nets (feedforward, backprop), with these inputs and
these hidden units, are not the real focus of either paper. The real
focus is the principle of grounding transfer through symbol strings.
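
For readers who want to see what "the same hidden representation for
the name and for the percept" would even mean operationally, here is a
hedged sketch (a toy, untrained feedforward net in Python/NumPy; the
55/5 unit counts echo the figures quoted above, but nothing else here
is the paper's architecture or training procedure):

    import numpy as np

    rng = np.random.default_rng(0)
    N_INPUT, N_HIDDEN = 55, 5                  # unit counts cited above

    W = rng.normal(scale=0.5, size=(N_HIDDEN, N_INPUT))   # untrained weights
    b = np.zeros(N_HIDDEN)

    def hidden(x):
        """Sigmoid hidden-unit activation vector for one input pattern."""
        return 1.0 / (1.0 + np.exp(-(W @ x + b)))

    percept = rng.integers(0, 2, N_INPUT).astype(float)   # a sensory pattern
    name = rng.integers(0, 2, N_INPUT).astype(float)      # the category's name

    # Grounding transfer would show up as these two activation vectors
    # converging AFTER training; with untrained weights they need not match.
    print(np.linalg.norm(hidden(percept) - hidden(name)))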

> jm> I only said here, that once you have some
> jm> categories grounded, then you can use them as input to learning higher
> jm> level categories. This is also easier than pure honest toil, it is a
> jm> sort of self-theft, though the "name" of the target category is of
> jm> course missing. The type of learning is irrelevant.
> sh>
> sh> I'm not sure what you mean. You can't learn a new category by talking to
> sh> yourself (before you have language!). If you are thinking of reasoning,
> sh> it is a little premature in this model. Same for Vygotskyian inner
> sh> speech. We need to get to outer speech before we can get to inner
> sh> speech....
>
> The process I'm referring to does not even have to be conscious. The
> categories learnt via toil "are there". I mean (again a further theory
> of mine) that they have to have some kind of localist representation
> (not necessarily physically localist), just like the neurons in the
> retina firing in the presence of certain shapes, and these locally
> represented categories can serve as input to further learning (via
> toil). Higher level (but not linguistic, only "reverse engineered")
> categories do not receive the most basic input directly; they may be
> based on lower level categories.

Sounds like you have a rival theory in mind here; I can't reply till we
know what it is and what it does, nor what "localist but not physically
localist" means. (Localist = non-distributed; what would "non-distributed
but not PHYSICALLY non-distributed" mean?)

Retinal neurons fire, but they do not recognize objects, nor do they
learn. Higher-level feature-detectors recognize objects, and we are
interested in the kind that LEARN to recognize objects.

"Represents" is a weasel-word; something internal has to learn to
categorize subsets of patterns as being of the same KIND. It is
irrelevant whether the process is conscious or unconscious, but it is
definitely relevant that the process must be one of LEARNING. One must
learn which patterns are of the same kind. That learning has to be
based on feedback indicating when we are right and when we are wrong.
That feedback must come from the outside, not the inside -- for if it
came from the inside, then, as with Chomsky's UG, we would know it
already; we would not be learning it.

("Reverse-engineering" is what the theorist does, in trying to figure
out what the brain does, and how; it is not what the brain does --
except of course the brain of the theorist!)

What makes something linguistic in my theory is not that it is the name
of a category, nor that it is a higher-level rather than a lower-level
category (all categories, being abstractions, are "higher" -- the only
difference is in how high), but that it is a category "defined" by a
string of symbols (grounded category names) expressing a proposition.

(I don't know your model, but it continues to sound to me like some form
of unsupervised learning model, whereas what is needed is learning
guided by external feedback for tasks like the ones at issue here.)
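
As a hedged illustration of what "learning guided by external feedback"
means here (a toy perceptron in Python/NumPy, with a made-up hidden rule
standing in for the edible/inedible distinction; it is not a model of
anything in the papers): the weights change only when an outside signal
says the guess was wrong, and the learner is trained on both positive
and negative instances.

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.integers(0, 2, size=(40, 6)).astype(float)   # candidate "mushrooms"
    y = X[:, 0] * (1 - X[:, 3])          # the KIND, unknown to the learner

    w, b, lr = np.zeros(6), 0.0, 0.1
    for _ in range(50):                  # honest toil: many trials
        for xi, target in zip(X, y):
            guess = float(w @ xi + b > 0)
            error = target - guess       # external right/wrong feedback
            w += lr * error * xi         # update only when corrected
            b += lr * error

    print(np.mean((X @ w + b > 0) == y)) # fraction now sorted correctly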

> jm> I'd go to the zoo, and I'd take a good look at the animal which has
> jm> "ban-ma" written on its cage. Language works like God, who provides
> jm> names and helps us learn to ground them from others who already know
> jm> their meaning.
> sh>
> sh> Ah, this is similar to your point about the possibility of learning a
> sh> category from positive instances alone (and suffers from the same sort
> sh> of problem, namely, that in nontrivial cases it is impossible).
> sh>
> sh> Yes, if all members of a category, besides their critical features
> sh> (which are hard to find, and normally need to be learnt by honest toil)
> sh> also wore their names on their sleeves, then categorization would be a
> sh> lot easier (indeed, you would not have to worry about figuring out
> sh> features at all). Indeed this is precisely what cheating is.
> sh>
> sh> No, in a nontrivial category learning task, my telling you that THIS is
> sh> a ban-ma would do you next to no good in deciding whether or not the
> sh> next candidate was a ban-ma too (just as eating one mushroom, and
> sh> [say] not getting sick, does not thereby make me capable of
> sh> distinguishing the edible from the inedible mushrooms). It's not just a
> sh> matter of being given a positive instance and contemplating it till its
> sh> critical features leap out at you. The critical features are detected
> sh> by trial and error, from sampling many positive and negative
> sh> instances (with the help of an internal implicit learning device --
> sh> possibly a neural net -- that is good at doing just that).
>
> The role of the examples depends on the learning algorithm. There is a
> mathematical finding that is relevant here: every learning algorithm
> has a BIAS (including backprop on NNs). Otherwise any generalization to
> unseen examples is impossible. The bias means that the learner prefers
> some categories A PRIORI. If this bias is correct, then the learning
> algorithm is successful, otherwise it isn't.

There may be a way of viewing every successful learning device for any
particular family of patterns as having a "bias" for acquiring those
patterns, relative to unsuccessful learning devices that are incapable of
learning that particular family of patterns. But now let us be more
general, and ambitious. Let us think of a learning device that is
capable of learning all the categories in any dictionary (and
encyclopedia), present and future. Yes, the one(s) that can do it can be
said to have a "bias" toward learning those categories, but it is a
pretty general bias! And it must scale to a pretty general set of
categories!

So, again, unless you are Jerry Fodor, who believes the categories all
have to be built in in advance (in which case the "bias" is inherent in
the Big Bang, and the device is not really a learning device at all,
but just a Platonic "remembering" device) -- unless you believe that,
the "bias" merely means a device that has (at least) our own category
learning capacities. What that "bias" is will still take a lot more
reverse-engineering to determine (I doubt that any learning models can
do what we can do just yet), but it will have to include the capacity to
learn categories from positive and negative instances, where positive
instances alone would simply be indeterminate.
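
In the simplest possible terms, and with everything invented for
illustration (none of this is from the papers), here is what "the
learner prefers some categories a priori" amounts to: two learners see
the same labelled instances, carry opposite built-in preferences, and
so disagree about an unseen case; whichever bias happens to match the
true category succeeds.

    positives = {"p1", "p2"}        # instances labelled members, from outside
    negatives = {"n1"}              # instances labelled non-members
    unseen = "x"

    def specific_learner(item):
        # Bias toward the smallest category consistent with the data:
        # nothing is a member unless it was seen as a positive instance.
        return item in positives

    def general_learner(item):
        # Bias toward the largest category consistent with the data:
        # everything is a member unless it was seen as a negative instance.
        return item not in negatives

    print(specific_learner(unseen), general_learner(unseen))   # False True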

[Note, any set of N patterns can be sorted (= categorized) in N! ways
(the case is even harder for an infinite number of patterns). Mother
Nature (mushrooms) -- or our dictionary/encyclopedia/society -- might
decree that only one of those ways happens to be correct. The general
learning device must always be able to find the right way, if we can.
Positive instances alone cannot be enough to find it -- that is true BY
DEFINITION in this general, nontrivial case. And, a fortiori, ONE
positive instance is not. (Beware of multiple positive instances,
because they begin to become negative instances of one another, as soon
as more than one category is involved; a category's complement is
already another category.)]
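
The arithmetic behind that bracketed point can be made explicit with a
hedged simplification (counting only binary sortings, i.e. a category
versus its complement, rather than all possible sortings): any category
that merely CONTAINS the observed positives is consistent with positive
evidence alone, so positive instances by themselves barely narrow the
field.

    N = 20                         # a toy world of 20 distinguishable patterns
    for k in (1, 5, 10):           # positive instances seen so far
        consistent = 2 ** (N - k)  # categories (subsets) containing all k positives
        print(k, "positive instance(s) leave", consistent, "categories consistent")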

> If the bias is such that specific categories (i.e. representing a small
> subset) are preferred, then usually only positive examples are
> interesting.
> If the bias is towards general concepts, then only negative examples
> are interesting.

I don't know what "interesting" means, but could you please scale this
up to the general dictionary/encyclopedic learning scale I just
mentioned? And don't forget to account for all the potential
permutations and combinations that just happen to be WRONG...

> I don't understand exactly what you mean by "trivial" tasks, but the
> role of positive examples completely depends on the nature of the
> categories to be learned, i.e. the correct bias for the (maybe abstract)
> domain. I don't see why domains where only positive examples are needed
> should necessarily be more trivial than other domains, whatever that
> means.

I hope I have given you more general food for thought on that score
now. (What we are discussing here is the nontriviality of
UNDERDETERMINED categories -- which, assuming the Big-Bang theory is
false, is precisely what our encyclopedic category repertoire, present
and future, implicates.)

--------------------------------------------------------------------
Stevan Harnad harnad@cogsci.soton.ac.uk
Professor of Cognitive Science harnad@princeton.edu
Department of Electronics and phone: +44 23-80 592-582
Computer Science fax: +44 23-80 592-865
University of Southampton http://www.cogsci.soton.ac.uk/~harnad/
Highfield, Southampton http://www.princeton.edu/~harnad/
SO17 1BJ UNITED KINGDOM


