Re: Cangelosi/Harnad Symbols

From: Jelasity Mark (jelasity@amadea.inf.u-szeged.hu)
Date: Thu Dec 09 1999 - 22:36:06 GMT


Here is my reply to your comments.

I won't reply to those comments that are due to your linear reading (as
you mentioned in the first course). I will also ignore your comments
that simply say "correct". Before turning to the details, I would like
to make some things clear.

I think that I understand your intuitions about toil and theft, just
as I did before writing my first contribution. Your Thursday night
lecture only confirmed this for me. What is more, I even like your
ideas about the role of language in providing an effective way to
ground (or to boost the grounding of) new categories. I also know the
paper on grounding that you cited.

My problem was (and still is) that the simulations presented in the
paper have nothing to do with your ideas. I mean the model used in this
paper. Maybe in other papers you could improve it to the point where it
becomes relevant, but that is not the question here, as we are talking
about this paper.

It may be useful if I state my basic standpoint explicitly (if you
disagree with it, no further discussion is meaningful): I believe that
the devil is in the details. Imagination is not the way one should prove
that a model is relevant. First the model should be understood in strict
mathematical terms: what is its structure, what are the basic
processes, what changes and why, what are the equivalent descriptions
of the same model. Interpretation should be in harmony with that
analysis.

This IS vague ("in harmony"); it is "megragadhatatlan", hard to pin
down. But on my scale this paper is a clear (negative) case. Well, yes,
I have seen many papers and books that are on the positive side, i.e.
I'm not paranoid, as you may think (e.g. R. Shiffrin's memory models, a
number of differential equations for modelling e.g. population dynamics,
several models from game theory (J. Maynard Smith is a good example),
etc.).

On terminology: I used the term "concept" as it is used in machine
learning. As I understand it, it is basically the same as the
psychologists' term "category", i.e. a subset of some domain (male
chickens among chickens, red among colors, etc.).

Now on to the details.

  jm> (a good piece of advice: eat, mark, and return don't mean anything
  jm> like eat, mark, return. Read them as a1, a2, a3. Only motion is real.)
>
sh> I agree completely. This is a toy model with a small number of
sh> parameters. It is always best to "de-interpret" such a model in weighing
sh> it. On the other hand, if the toy model DOES capture the right
sh> properties of what it is trying to model, that is, if it is capable of
sh> scaling up to life-size, then the interpretation is justified.

I agree (i.e. IF).

sh> At the end of their
sh> life-cycles, the 20 foragers with the highest fitness in each
sh> generation are selected and allowed to reproduce by engendering
sh> 5 offspring each. The new population of 100 (20x5) newborns is
sh> subject to random mutation of their initial connection weights
sh> for the motor behavior, as well as for the actions and calls
sh> (thick arrows in Figure 2); in other words, there is neither
sh> any Lamarckian inheritance of learned weights nor any Baldwinian
sh> evolution of initial weights to set them closer to the final stage
sh> of the learning of 00, A0, 0B and AB categories. This selection
sh> cycle is repeated until the final generation.
>
  jm> Of course, there IS Baldwinian evolution, as Cangelosi admitted.
sh>
sh> Cangelosi not only admitted but affirmed that there is Baldwinian
sh> evolution in the model, but not in the initial weights!
sh>
sh> For others: Baldwinian evolution in learning is an effect in which it is
sh> not the learning itself that is inherited but the propensity to learn
sh> it. There was an inheritance of the propensity to learn in this model,
sh> but it was in the propensity to learn BY SYMBOLIC THEFT, not (as
sh> correctly noted above) in the propensity to learn BY SENSORIMOTOR TOIL.
sh>
sh> In other words, each successive generation was more genetically inclined
sh> to learn by theft instead of by toil, but they were not becoming more
sh> genetically inclined to know which mushrooms were edible and which were
sh> not! THAT they had to re-learn in each generation by toil.

This is a misunderstanding; you are talking about a different
experiment. I didn't even mention the competition experiment you refer
to here. You may have used the same parameters for the genetic
algorithm in the competition experiment, but the quoted passage does not
say that it refers only to the competition experiment.

I included the whole context to avoid such misunderstandings.
Here the paper talks about the first kind of experiment (or at
least it is VERY difficult to understand it otherwise), where either
only toil learning is performed, or toil and theft learning are done
separately. Here the pre-learning weights do undergo the Baldwinian
effect; that is a plain fact. Here is Cangelosi's reply:

ac> oops, you are right. It is true, as we wrote, that the Baldwin effect
ac> isn't selecting the perfect post-learning weights. Learning is STILL
ac> NEEDED to reach a good level of error (i.e. mushroom naming) but
ac> some Baldwinian effect is always present for selecting individuals
ac> with a "better" starting point than their great-great-grandparents.
ac> At the end the pre-learning error tends to be lower than in the
ac> initial generation, but without learning no good language is used.
ac> We should clarify it in the text. It still remains valid that
ac> learning is essential for learning a correct naming.

So, again, there IS a Baldwinian effect. I didn't want to say more;
it is only a small correction anyway.
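
To make this concrete, here is a tiny Python sketch of the mechanism
(the numbers, the feature coding and the learning rule are mine, invented
for illustration; this is not Cangelosi's code or parameters). Initial
weights are inherited with mutation, learning is plain gradient descent,
and selection looks only at post-learning performance; yet the average
pre-learning loss still tends to drift downward across the generations,
which is exactly the Baldwinian effect on the starting weights.

import numpy as np

rng = np.random.default_rng(0)

X = rng.integers(0, 2, (200, 5)).astype(float)   # features A..E
X = np.hstack([X, np.ones((200, 1))])            # plus a bias input
y = X[:, 0]                                      # toy rule: edible iff feature A

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w):
    p = sigmoid(X @ w)
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

def learn(w, steps=10, lr=0.5):
    w = w.copy()
    for _ in range(steps):
        p = sigmoid(X @ w)
        w -= lr * X.T @ (p - y) / len(y)
    return w

# Population of INITIAL weight vectors; learned weights are never inherited.
population = [rng.normal(0, 0.1, 6) for _ in range(50)]
for gen in range(30):
    pre = np.mean([loss(w) for w in population])
    post = np.array([loss(learn(w)) for w in population])
    best = np.argsort(post)[:10]   # selection sees ONLY post-learning loss ...
    # ... yet, because the selected parents' INITIAL weights are what gets
    # inherited (with mutation), the pre-learning loss drifts down too.
    population = [population[i] + rng.normal(0, 0.05, 6)
                  for i in best for _ in range(5)]
    print(f"gen {gen:2d}  pre-learning loss {pre:.3f}"
          f"  post-learning loss {post.mean():.3f}")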

  jm> Now, my extended version of the model and experiments
  jm>
  jm> We have a population of 100 organisms.
  jm>
  jm> The learning procedure for every organism is the following:
  jm>
  jm> Every organism is put in a new world 20 times, in each they make 100
  jm> steps. In each step, the following is done:
  jm>
  jm> the closest mushroom is God and God teaches the
  jm> organism what to do (eat, mark), but not how to move around, making
  jm> sure that the organism continues to do a random walk through the space,
  jm> and does not teach "return" either, because it will be used to test the
  jm> theft effect.
sh>
sh> Correct, but note that in the intended interpretation, it is not God who
sh> teaches these things, but the CONSEQUENCES of doing the right/wrong
sh> thing. If you eat a poison mushroom, you get sick; if you eat an edible
sh> mushroom, you get nourished.

Sure, but not important. Mentioning "God" here was not a criticism,
only a matter of style (though not a very fortunate one).
(The same goes for your next comment, which I deleted.)

  jm> Now, let's do some simplifications that leave the predictive power of
  jm> the model intact.
  jm>
  jm> First observe that the motion of an organism is
  jm> in fact a random walk, since motion is never taught, at least by
  jm> backprop. The genetic algorithm may have some effects, and it may well
  jm> be that fitness increases because organisms learn to approach mushrooms
  jm> more efficiently. However, this effect is irrelevant from the point of
  jm> view of learning concepts about mushrooms through toil. The
  jm> results of the paper would not change if motion, and the concept of
  jm> "the world of mushrooms" were discarded altogether.
  jm> Fitness could be calculated via any measure of learning accuracy over
  jm> an example set of different mushrooms.
sh>
sh> Correct, but irrelevant. We are talking about learning to sort
sh> mushrooms, not learning to walk or to approach.
sh>
  jm> Second, observe that the call and action outputs are in fact identical.
  jm> The call output of the organisms is never used in any experiment.
  jm> The mysterious "imitation learning" phase seems to be useless, since
  jm> it teaches a function that is never used. The only possible effect of it
  jm> is that it somehow "clears the ground" for theft learning making use
  jm> of the fact that the action and call output is identical, so in the
  jm> theft learning the organism has to IMITATE the call input in its action
  jm> output. If this is right then it is cheating. If this is not right, then
  jm> the call output is useless. This means that the call output and the
  jm> imitation learning phase can be discarded.
sh>
sh> It is right, and it is not cheating. The experiment is not on imitation
sh> learning, which is not a problem in principle. It is about the relative
sh> value of the Toil vs. Theft strategy. Without the imitation learning
sh> phase there would be no detectable signal sent or received, so no
sh> hypothesis would be tested.

As I know from Cangelosi, in this paper the call signals in the theft
learning phase were generated by God, not by other organisms (though I
know that in other papers you did it that way too). Either way, however,
the story about learning from the consequences of action does not work
here for imitation learning. But it is still all right with me.

The important stuff comes here (I will concentrate only on the most
important points):

sh> Look: It was made clear that the model is a toy model, with too few
sh> parameters to bear the weight of a realistic ecological interpretation.
sh> Nevertheless it did test the relative success of learning to categorize
sh> by two radically different means -- one slow, one fast; one direct, one
sh> indirect; one sensorimotor, one symbolic. If the toy model captures
sh> realistic variables, and if the two strategies do indeed capture the
sh> relative effectiveness of the prelinguistic and the linguistic way of
sh> acquiring new categories, then the rapid dominance of the one strategy
sh> over the other is a possible explanation of the adaptive advantage of
sh> language.
sh>
  jm> Third, evolution and learning both evolve the very same weights of the
  jm> organism. The combination of evolution and backprop is virtually a
  jm> single learning algorithm that has to find good weights for the given
  jm> task (the genetic algorithm is typically used to find structure, not
  jm> weights, in which case this is not true). So we can think of the model
  jm> as containing only one organism, being taught by some algorithm
  jm> based on a set of learning examples.
sh>
sh> Toy models can always be interpreted many different ways; it is not
sh> particularly informative to show that other interpretations are possible.

This is not another interpretation; it is an equivalent abstract
structure. Anyway, I disagree with your claim that looking at
equivalent structures is uninteresting, especially if they are
simpler.

  jm> Here, "theft" organisms learn return based on the call, and "toil"
  jm> organisms learn based on the mushroom. This means that "theft"
  jm> organisms receive the very same input as toil organisms, except
  jm> they don't receive garbage (C,D,E features).
sh>
sh> Not interesting. The "garbage" was there to make the learning less
sh> trivial on OUR interpretation.

On the contrary: it is the most interesting observation. A key element
of the theory is grounding transfer:

ac> 3) GROUNDING TRANSFER. It is important to say that theft organisms
ac> actually recognise a Return mushroom (1) from the call that describes
ac> it (as expected after the explicit backprop learning) AND ALSO (2)
ac> when they see its features (for a study of the "grounding transfer"
ac> phenomenon, please see also the paper Cangelosi-Greco-Harnad). The
ac> learning of the call and behaviour Return will also ground them in
ac> the perceptual categories.

Without this effect, theft learning of RETURN would be grounded only in
the calls. This means that even though the organism has EAT and MARK
grounded, and RETURN depends only on these two, the organism would have
no way of representing this relationship. It would have three
independent concepts, two grounded in perceptual input and one in calls.
To capture the relationship, RETURN would need an explicit logical
structure referring to the names of EAT and MARK, plus rules on how to
apply that logical structure in decision tasks. Neither is present in
the model. RETURN has no logical structure; it is learnt via toil in the
domain of calls. It is faster because there are fewer features (3 versus
5); furthermore, all three are relevant in the case of calls, while with
perceptual input C, D and E are irrelevant, i.e. "garbage".

If there is grounding transfer, it may be due only to the similar
structure of the calls and the perceptual features. Using arbitrary
"words" that are not correlated with the perceptual input (as in
natural languages: the word "zebra" has no stripes), I suspect the
grounding transfer would disappear, making the whole approach
irrelevant.
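
Here is a crude, runnable version of exactly this test, in a toy setting
of my own (the call coding, the learner and every parameter are my
invention, not the paper's). A learner trained on RETURN from calls alone
transfers back to the percepts when the call code simply reuses the A and
B bits, and typically falls to around the base rate when the codewords
are arbitrary. (The "fewer, all-relevant features" point about speed can
be probed in the same script by lowering the number of training steps.)

import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias(X):
    return np.hstack([X, np.ones((len(X), 1))])

def train(X, y, steps=1000, lr=1.0):
    w = rng.normal(0, 0.1, X.shape[1])
    for _ in range(steps):
        p = sigmoid(X @ w)
        w -= lr * X.T @ (p - y) / len(y)
    return w

def accuracy(w, X, y):
    return np.mean((sigmoid(X @ w) > 0.5) == y)

# Percepts: features A..E; RETURN applies exactly to the AB mushrooms.
percepts = rng.integers(0, 2, (400, 5)).astype(float)
ret = percepts[:, 0] * percepts[:, 1]
category = (2 * percepts[:, 0] + percepts[:, 1]).astype(int)  # 00, 0B, A0, AB

# Structured calls: the code simply reuses the A and B bits (my assumption).
structured = np.zeros_like(percepts)
structured[:, :2] = percepts[:, :2]

# Arbitrary "words": a distinct random codeword per category, uncorrelated
# with the features -- like "zebra" having no stripes.
all_codes = np.array([[int(b) for b in format(i, "05b")] for i in range(32)],
                     dtype=float)
codebook = all_codes[rng.choice(32, size=4, replace=False)]
arbitrary = codebook[category]

Xp = add_bias(percepts)
Xs = add_bias(structured)
Xa = add_bias(arbitrary)

w_toil = train(Xp, ret)    # toil: garbage features C, D, E present
w_s = train(Xs, ret)       # theft with structured calls
w_a = train(Xa, ret)       # theft with arbitrary words

print("learned from calls   :", accuracy(w_s, Xs, ret), accuracy(w_a, Xa, ret))
print("transfer to percepts :", accuracy(w_s, Xp, ret), accuracy(w_a, Xp, ret))
print("toil on percepts     :", accuracy(w_toil, Xp, ret))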

And I haven't even mentioned the catastrophic forgetting effect, which
means that after too much theft learning the weights from the hidden
layer to the output layer can change so much that EAT and MARK can be
forgotten altogether.
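
This, too, can be measured rather than imagined. Below is a deliberately
tiny network in which EAT and MARK are first trained by toil and RETURN
is then trained by theft alone. Everything in it is my guess, not the
paper's protocol: the layer sizes, the learning rates, the call coding,
and the assumption that percept and call are both present during theft
learning. The point is only that the EAT/MARK accuracy after the theft
phase should be checked, because the shared weights are free to drift;
how badly they drift depends on the parameters.

import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 5 perceptual features A..E plus 3 call units as input.
P = rng.integers(0, 2, (300, 5)).astype(float)
eat, mark = P[:, 0], P[:, 1]
ret = eat * mark
calls = np.stack([eat, mark, ret], axis=1)       # call code: my assumption
X_toil = np.hstack([P, np.zeros((300, 3))])      # toil: percepts, calls silent
X_theft = np.hstack([P, calls])                  # theft: percepts + calls
Y = np.stack([eat, mark, ret], axis=1)           # targets: EAT, MARK, RETURN

# One shared hidden layer, three sigmoid output units.
W1, b1 = rng.normal(0, 0.3, (8, 4)), np.zeros(4)
W2, b2 = rng.normal(0, 0.3, (4, 3)), np.zeros(3)

def forward(X):
    h = sigmoid(X @ W1 + b1)
    return h, sigmoid(h @ W2 + b2)

def train(X, Y, mask, steps, lr=0.5):
    """Backprop; `mask` selects which output units receive an error signal."""
    global W1, b1, W2, b2
    for _ in range(steps):
        h, out = forward(X)
        d_out = (out - Y) * mask / len(X)
        d_hid = (d_out @ W2.T) * h * (1 - h)
        W2 -= lr * h.T @ d_out
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * X.T @ d_hid
        b1 -= lr * d_hid.sum(axis=0)

def acc(X, unit):
    return np.mean((forward(X)[1][:, unit] > 0.5) == Y[:, unit])

train(X_toil, Y, mask=np.array([1.0, 1.0, 0.0]), steps=3000)   # toil: EAT, MARK
print("after toil :  EAT", acc(X_toil, 0), " MARK", acc(X_toil, 1))
train(X_theft, Y, mask=np.array([0.0, 0.0, 1.0]), steps=3000)  # theft: RETURN
print("after theft:  EAT", acc(X_toil, 0), " MARK", acc(X_toil, 1),
      " RETURN", acc(X_theft, 2))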

I also have to mention that the only relevant response to this
criticism is to show that, with arbitrary words that have no structural
correlation with either the perceptual input or the action output, there
is still significant grounding transfer. That would seem to be a
mathematical miracle, however. I am very much interested in the paper
Cangelosi refers to.

One more thing: I can't remember reading about grounding transfer in the
paper, though it may be my fault.

  jm> This means that the model only proves, that learning is more effective
  jm> without garbage in the input examples.
sh>
sh> Nothing of the sort.
sh>
sh> I like criticism, but criticism is usually more useful if it is based on
sh> the "charity assumption," which is that if there is a way to interpret
sh> what someone is saying in such a way that, if that is what they meant,
sh> then they must be rather stupid, then maybe I should try another
sh> interpretation. Only if no more charitable interpretation is possible
sh> should I assume that my uncharitable one is the right one...

Again, it is not an interpretation; it is a fact that follows from the
formal structure of the model. Any other interpretation is an
over-interpretation which is not justified.

sh> 8. Conclusions
sh>
sh> We have shown that a strategy of acquiring new categories by
sh> Symbolic Theft completely outperforms a strategy of acquiring them
sh> by Sensorimotor Toil as long as it is grounded in categories
sh> acquired by Toil.
  jm>
  jm> If theft means learning without garbage, yes.
sh>
sh> It does not mean garbage learning, so try again.

What do you mean by "garbage learning"?

  jm> Instead of relying on the call
  jm> input, a third strategy could be to use the organism's own
  jm> output as input, i.e. to base the learning of new categories
  jm> on old ones. It would provide the same advantage, and indeed it
  jm> does. The frog's eye recognises concepts connected to size and motion,
  jm> and its concept "eat" depends on these primitive ones, forming
  jm> a hierarchy.
sh>
sh> We are not talking about "concepts" here (whatever that means), but
sh> about learning behavioral categories: What kinds of things can I eat?
sh> I can only find this out by trial and error, and feedback from the
sh> consequences of what I tried, if I made an error. Without the external
sh> feedback, there is no way to know right from wrong.
sh>
sh> This applies equally to the ground-level learning of eat/mark by toil,
sh> and to the higher-level learning of return by theft. It is not my own
sh> output from my input that will tell me whether I am right/wrong in either
sh> case; it is the feedback from the external consequences of my output.
sh>
sh> For others: This recommendation was motivated by jm's preference for
sh> "unsupervised" learning over "supervised" learning. But here it is the
sh> TASK itself that is essentially a supervised one. Just given mushrooms as
sh> input I am incapable of determining which are and are not edible; only
sh> the feedback from the consequences of eating them can guide me in that.
sh> The mushrooms have 5 features, A, B, C, D and E, but just giving me all
sh> the mushrooms over and over will never reveal that it is only the A
sh> mushrooms that are edible.

For others (too): I don't prefer unsupervised learning. Every kind of
learning has its place. I only said that once you have some categories
grounded, you can use them as input when learning higher-level
categories. This is also easier than pure honest toil; it is a sort of
self-theft, though the "name" of the target category is of course
missing. The type of learning is irrelevant.
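
A minimal sketch of what I mean by self-theft (again a toy setup of my
own, not the paper's model): after EAT (= feature A) and MARK (= feature
B) have been grounded by toil, RETURN is learned from the organism's own
eat/mark outputs instead of from the raw features. The input is then
small and every part of it is relevant, just like the calls, but no
external speaker is needed. Whether it actually comes out faster here
depends on the parameters; the sketch only makes the comparison runnable.

import numpy as np

rng = np.random.default_rng(3)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias(X):
    return np.hstack([X, np.ones((len(X), 1))])

def train(X, y, steps, lr=1.0):
    w = rng.normal(0, 0.1, X.shape[1])
    for _ in range(steps):
        p = sigmoid(X @ w)
        w -= lr * X.T @ (p - y) / len(y)
    return w

P = add_bias(rng.integers(0, 2, (400, 5)).astype(float))  # features A..E + bias
eat_t, mark_t = P[:, 0], P[:, 1]
ret_t = eat_t * mark_t

# Ground EAT and MARK by toil on the raw features.
w_eat = train(P, eat_t, steps=500)
w_mark = train(P, mark_t, steps=500)

# Self-theft: the inputs for RETURN are the organism's OWN graded outputs.
own = add_bias(np.stack([sigmoid(P @ w_eat), sigmoid(P @ w_mark)], axis=1))

for steps in (25, 100):
    w_toil = train(P, ret_t, steps)     # RETURN by honest toil on raw features
    w_self = train(own, ret_t, steps)   # RETURN by self-theft on own outputs
    err_toil = np.mean((sigmoid(P @ w_toil) > 0.5) != ret_t)
    err_self = np.mean((sigmoid(own @ w_self) > 0.5) != ret_t)
    print(f"{steps:4d} steps   toil error {err_toil:.2f}"
          f"   self-theft error {err_self:.2f}")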

  jm> Though the frog doesn't have names for its concepts, neither do the
  jm> foragers. Or in the other direction, if the foragers do, then frogs
  jm> do as well.
sh>
sh> When the only available strategy is toil (as for the frog, and for the
sh> prelinguistic human), there is no point naming or vocalizing. The
sh> utility of naming and vocalization begins with the possibility and
sh> utility of theft.
sh>
sh> By the way, the frog's categories are inborn, not learned, though
sh> experience activates and perhaps fine-tunes them. For this, mere
sh> exposure, without feedback, may be enough. That is why an unsupervised
sh> model could do it. But the task of our foragers cannot be solved that
sh> way.

I only said that there is nothing in the model that could be called a
"name" in any sense which resembles a linguistic definition. It is not a
key point, however; you're right, we can't expect such things from a
simple model like this.

  jm> The second objection is the other side of the first one: if the
  jm> concept return depended on C, D or E (why not?), then the theft strategy
  jm> would be doomed to failure.
sh>
sh> Correct, but that is because if the RETURN category depended on anything
sh> but a boolean combination of already grounded categories (and hence their
sh> underlying features) then it could not be learned by theft.

Well, in your model it could be learned very well, since theft
learning itself does not depend on the toil-learned categories in any
way, except for the questionable grounding transfer.

sh> Just as you learn nothing if I tell you that a "snark" is a "boojum"
sh> that is "wrent" -- if you do not know what "boojum" and "wrent" mean.

Yes, this is why your model doesn't work. (Recall that theft learning is
grounded in the calls.) By the way, I wouldn't say I learned nothing.

[I cut some things, they are covered above]

  jm> In other words, we can see the world AND hear the names of things.
  jm> In my view, theft is done the following way. We hear a new name first,
  jm> and AFTER THAT we figure out how to ground it in PERCEPTUAL INPUT and
  jm> OUR OWN old concepts.
sh>
sh> I'm not sure what you have in mind. But here's a new name: "ban-ma" and
sh> if you go to a zoo, you will find some. Now go figure out how to ground
sh> it.

I won't remind you of the charity assumption, because it would not be
polite; instead, I'll explain what I would do.

I'd go to the zoo, and I'd take a good look at the animal with
"ban-ma" written on its cage. Language works like God here: it provides
names, and we learn to ground them with the help of others who already
know their meaning.

  jm> The basic intuition of the paper is interesting but the model is
  jm> not relevant to language evolution.

I still think so. (To avoid misunderstanding: I mean the mushroom model
in the form presented in the paper.)

sh> I'm afraid that you have not quite grasped either the model or the basic
sh> intuition -- unless I have misinterpreted your comments...

You can decide...


