Re: George Miller [Magical Number 7 +/-2] Part 1

From: Stevan Harnad (harnad@cogsci.soton.ac.uk)
Date: Mon Mar 02 1998 - 17:25:27 GMT

Next message: Stevan Harnad: "Re: George Miller [Magical Number 7 +/-2] Part 5"
Previous message: catriona barrett: "George Miller [Magical Number 7 +/-2] Part 5"
Maybe in reply to: jlh597@soton.ac.uk: "George Miller [Magical Number 7 +/-2] Part 1"

> From: jlh597@soton.ac.uk
>
> Part One of the Miller article on information limits:
>
> INFORMATION MEASUREMENT
>
> but I bet somewhere that there's a
> bloke who cannot recall particular events or specific types of
> information but still leads an ok life!)

Actually, there are several such people. The famous amnesic patient
"HM", probably the most-studied patient in the world, has been leading
an odd sort of life since his operation in 1953, when, among other
things, his hippocampus was removed on both sides. (The hippocampus is a
pair of structures, one on each side of the brain, lying under the newer
parts of the cortex, especially the temporal lobes; each is shaped like
a C but positioned as an upside-down U along the length of the brain.)
From that day onward HM could not form any new lasting memories, so
every day was the first day after the operation for him. In 1998 he is
still saying, when people ask him how he is getting on: "It's all
starting to come back to me," whereas of course it's all being forgotten
almost immediately after it happens.

As I said in class, his father died several years after his operation,
and when he was told that his father had died, HM of course broke down
and cried. But the next day he had forgotten, so that when he was
reminded, he again broke down and cried as if it were the first time he
had heard it. So of course they stopped telling him.

But was he not suspicious when he did not see or hear of his father for
so long? No, because as every day was the day after his operation, he
did not remember that he had not seen him for a long time; he felt that
he had seen him only a few days earlier.

So HM is content, in a way, being the most studied memory patient (even
though he does not remember it); but would any of us want to be in his
place? Is he leading "an ok life"?

> Anyway... the way he tested this number was through the use of
> "absolute judgement experiments" where they test how well participants
> can "assign numbers to the magnitudes of various aspects of a
> stimulus". In real English, they are testing the extent of people's
> abilities to transmit information. It is also evident that the use of
> the information theory has been widely applied to his own proposals.

Jo, you're beginning to talk like them! Since when is "transmit
information" real English! We spent two hours last week trying to find
out what information was, and what it meant to "transmit information."

We provisionally settled on the following: Information is whatever
serves to reduce your "uncertainty," when you have to make a choice
between a lot of alternatives and are unsure which is the right choice.
It MATTERS to you whether you make the right choice or not, because
there are consequences of making the right or wrong choice. For
example, if you make the right choice, you eat; if you make the wrong
choice, you starve.

So I suggested that you think of a sandwich machine with 6 buttons.
Every day, you get your lunch by pressing one of the buttons. One of
them produces a sandwich, and five do not, and it changes every day. So
you are uncertain about which button is the right one, and it matters,
because your lunch depends on it. We can even say HOW uncertain you are:
Your uncertainty is high: it's 5/6 (which is about 83%). This
means that, without any further INFORMATION, your chances of eating on
any day are 1/6 or 17%. Your chances of not eating are 5/6 or 83%.
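For those who like to check the arithmetic, here is a minimal Python sketch of the sandwich-machine numbers. It assumes, as in the example, that the sandwich is equally likely to be behind any of the 6 buttons, and it uses the informal measure of uncertainty used here (the chance of choosing wrongly), not Miller's measure in bits. The function name chance_and_uncertainty is just an illustrative choice.

```python
from fractions import Fraction


def chance_and_uncertainty(n_alternatives: int) -> tuple[Fraction, Fraction]:
    """Return (chance of choosing right, chance of choosing wrong) when exactly
    one of n equally likely alternatives is the right one."""
    chance = Fraction(1, n_alternatives)
    return chance, 1 - chance


# The 6-button sandwich machine: 1 button gives lunch, 5 do not.
chance, uncertainty = chance_and_uncertainty(6)
print(f"chance of lunch: {chance} (about {float(chance):.0%})")
print(f"uncertainty:     {uncertainty} (about {float(uncertainty):.0%})")
```

Using exact fractions rather than floats keeps the 1/6 and 5/6 visible as fractions, which is how they are stated in the text.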

Information would be something that reduced your uncertainty. For
example, if you were told, on a given day, that the number was odd
rather than even that day, you still would not be certain to get lunch,
but your uncertainty would be reduced from 5/6 to 2/3 (or from 83% to
67%).
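A minimal Python sketch of the odd/even clue, assuming the clue is simply true (only the odd buttons remain in play). The last part goes beyond the text, for comparison only: Miller himself measures information in bits, the base-2 logarithm of the number of equally likely alternatives, and on that measure cutting 6 alternatives down to 3 is worth exactly one bit.

```python
import math
from fractions import Fraction

buttons = [1, 2, 3, 4, 5, 6]
# The clue "today's number is odd" rules out the even buttons.
remaining = [b for b in buttons if b % 2 == 1]

before = 1 - Fraction(1, len(buttons))   # uncertainty with no information: 5/6
after = 1 - Fraction(1, len(remaining))  # uncertainty after the clue: 2/3
print(f"uncertainty: {before} -> {after}")

# For comparison only: information in bits, as Miller measures it, is log2 of
# the number of equally likely alternatives. Going from 6 to 3 gains 1 bit.
bits_gained = math.log2(len(buttons)) - math.log2(len(remaining))
print(f"information gained: {bits_gained:.3f} bits")
```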

Whenever you get information, you get something that reduces your
uncertainty between alternatives that you are not sure about and that
matter to you. For example, my kid-sib definition of information reduced
your uncertainty about what to say on a future exam question that asks
you what "information" is. And the outcome of that exam matters to
you...

(I hope you have better reasons for wanting to understand information
than just to get a better mark. Let's say it's also to reduce your
uncertainty about how the mind works, and about what it is that we are
doing when we INFORM one another with our words (or our actions).)

> The issue of information measurement is commonly understood in the
> light of "VARIANCE" which simply refers to the "amount of information".
>
> > "If the variance is very small, we know in advance how our observation
> > must come out, so we get little information from making the
> > observation".

You should see this in the sandwich machine example: There the variance
is large: The sandwich, on any given day, can be connected to button 1,
2, 3, 4, 5, or 6, and every day it varies. The variance is 83%. But if
you learn that the number is odd on Tuesdays, then that means the
variance on Tuesdays is 67%, which is less than 83%.

Variance here is the same as the amount of uncertainty. But it's exactly
the same as the "variance" you learn about in statistics. I will now
show you exactly how they are the same thing:

Consider your daily weight. Suppose your average weight is 9 stone.
If you measure it across days and weeks and calculate the average, it
comes out to 9 stone. But on any day it will not be EXACTLY 9 stone.
It will sometimes be a bit higher and sometimes a bit lower. Let's say
it varies by +/- 3 pounds.

If you take your actual weight on any given day, and subtract your
average weight from it, you will get that day's "deviation" from the
average. Those deviations (or "residuals") can be either positive or
negative -- a little higher or a little lower. We want a way to say how
big the "average" deviation is. It's not exactly an average, but
similar to one.

First, since a deviation of +3 and a deviation of -3 are the same size
(although in opposite directions), you need to square the deviations to
make them all positive: both +3 and -3 equal +9 when they are squared.
(If you simply averaged the raw deviations instead, the positive and
negative ones would cancel out and the total would always come to zero.)

Let's say we have calculated your average weight at 9 stone by
measuring it every day for 30 days, calculating the total for all 30
days, then averaging it by dividing the total by 30. That's how we got 9
stone.

Now we go back and calculate each day's deviation from that average,
square each one to keep it positive, and add up the squared deviations
across the same 30 days. To find a kind of "average" deviation, we
divide the total of the squared deviations not by 30, as we did when we
were calculating your ordinary average, but by 29. The reason we divide
by 29 is that only 29 of the 30 deviations are "free to vary" (their
"degrees of freedom").
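Here is the computation just described, as a short Python sketch. The 30 weights are made up for illustration (expressed in pounds, with 9 stone = 126 lb) and vary by at most 3 lb either side of the average.

```python
# Hypothetical deviations (in lb) for 30 days, chosen so they average out to zero.
daily_deviations = [0, 1, -1, 2, -2, 3, -3, 0, 1, -1] * 3
weights = [126 + d for d in daily_deviations]  # 9 stone = 126 lb

n = len(weights)                                   # 30 days
mean = sum(weights) / n                            # ordinary average: divide by 30
squared_devs = [(w - mean) ** 2 for w in weights]  # squaring makes every deviation positive
variance = sum(squared_devs) / (n - 1)             # divide by 29, the degrees of freedom
print(f"mean = {mean} lb, variance = {variance:.3f} lb^2")
```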

The reason the daily deviations from your average weight across 30 days
have 29 rather than 30 degrees of freedom is that one degree of freedom
is "fixed" by calculating your average weight across those 30 days and
then using it to calculate the deviations from that average. The 30
daily weights themselves have 30 degrees of freedom: each day's weight
can take any value. But once you have calculated the average and go on
to calculate each day's deviation from it, the deviations are free to
take any value for the first 29 days, while the value for the 30th day
is already fixed, because it has to be the value that makes all 30
deviations add up to zero, i.e., that makes the average weight come out
to 9 stone.
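A quick way to see the lost degree of freedom, sketched in Python with randomly generated (hypothetical) weights: the deviations from the average always add up to zero, so the last one can be computed from the other 29.

```python
import random

rng = random.Random(0)  # seeded so every run gives the same made-up weights
weights = [126 + rng.randint(-3, 3) for _ in range(30)]  # hypothetical, in lb

mean = sum(weights) / len(weights)
deviations = [w - mean for w in weights]

# The deviations from the average always add up to zero (apart from rounding error)...
print(f"sum of all 30 deviations: {sum(deviations):.10f}")

# ...so once the first 29 are known, the 30th is forced: it is minus their sum.
implied_last = -sum(deviations[:29])
print(f"actual 30th deviation: {deviations[29]:.6f}, implied: {implied_last:.6f}")
```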

Now that sum of 30 squared deviations divided by the 29 degrees of
freedom is precisely what you have been taught is the VARIANCE of your
weight. (Its square root is the STANDARD deviation, which is the measure
that comes closest to an "average" deviation.)
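Python's standard statistics module can serve as a check. In that module, statistics.variance is the sample variance (dividing by n - 1, i.e. 29 here), statistics.pvariance divides by n (30), and statistics.stdev is the square root of the sample variance. A sketch with made-up weights:

```python
import statistics

daily_deviations = [0, 1, -1, 2, -2, 3, -3, 0, 1, -1] * 3  # hypothetical, in lb
weights = [126 + d for d in daily_deviations]            # 9 stone = 126 lb

sample_var = statistics.variance(weights)     # sum of squared deviations / 29
divide_by_30 = statistics.pvariance(weights)  # same sum / 30 (population variance)
sd = statistics.stdev(weights)                # square root of the sample variance

print(f"variance (/29) = {sample_var:.3f}, dividing by 30 instead = {divide_by_30:.3f}")
print(f"standard deviation = {sd:.3f} lb")
```

Dividing by 29 gives a slightly larger figure than dividing by 30; the difference shrinks as the number of days grows.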

Compare that to the sandwich machine that varies daily in which of the
six numbers gives you your lunch: Suppose someone else's lunch (not
yours) depends on their being able to guess your weight EXACTLY each
day (rounded off to the nearest pound). This means that any day you can
weigh 9 stone exactly, or 9 stone +1 pound, +2 pounds, +3 pounds or -1
pound, -2 pounds, or -3 pounds. If, as with the sandwich machine, all 7
of these possible deviations (0, +1, +2, +3, -1, -2, -3) were equally
likely, then that person's chance of lunch would be 1/7 and his
uncertainty would be 6/7, just as with the sandwich machine.

The only difference in the case of the variance of your weight is that
the 7 deviations are not equally likely. Their real likelihoods follow
the shape of the normal distribution: small deviations are more likely
than big ones. In fact, the person's best strategy is to guess exactly 9
stone each time, the exact midpoint of the distribution (i.e., 0
deviation), because each of the other possibilities happens less often,
especially the biggest deviations (+/- 3).
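A sketch of why 0 is the best guess, assuming the daily deviation is normally distributed with a standard deviation of 1.5 lb (an arbitrary choice for illustration), rounded to the nearest pound, and ignoring anything beyond +/- 3 lb. With equally likely deviations every guess would be right 1 time in 7; here the middle value is right noticeably more often.

```python
import math


def normal_cdf(x: float, sd: float) -> float:
    """P(X <= x) for a normal distribution with mean 0 and standard deviation sd."""
    return 0.5 * (1 + math.erf(x / (sd * math.sqrt(2))))


SD_LB = 1.5                # assumed day-to-day spread of weight, in lb (illustrative)
deviations = range(-3, 4)  # possible deviations from 9 stone, rounded to whole lb

# Chance that the rounded deviation is d: the normal probability between d - 0.5
# and d + 0.5, rescaled so the 7 outcomes add up to 1 (tails past +/- 3.5 dropped).
raw = {d: normal_cdf(d + 0.5, SD_LB) - normal_cdf(d - 0.5, SD_LB) for d in deviations}
total = sum(raw.values())
probs = {d: p / total for d, p in raw.items()}

best_guess = max(probs, key=probs.get)
print(f"best guess: {best_guess:+d} lb, right {probs[best_guess]:.0%} of the time "
      f"(versus {1 / 7:.0%} if all 7 deviations were equally likely)")
```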

All this was just a refresher course on variance, to show you how
variance is related to information. Of course anything that reduces the
variance reduces the uncertainty.

> This concept is then discussed with reference to a communication
> system, which obviously involves both an input and an output. Everyone
> knows this already, but the hard bit comes when you think about the
> variance between the two stages, how it determines the whole
> communication process, and also what actually happens between these two
> stages. The in between bit he calls the "covariance of the input and
> the output". Or in other words he calls the covariance the "amount of
> transmitted information".

He is thinking of a signal that you send, and then the signal that
someone else, on the other end of the communication wire, receives.
Suppose when you are weighed, the scale is connected to a loudspeaker
that makes a sound whose loudness increases with your weight: the more
you weigh, the louder the sound. So a weight of exactly 9 stone is
translated into a loudness of, let's say, 9 ells ("ell" for "L" for
loudness). 9 stone and a pound would be a loudness of 9.1 ells, 9 stone
minus 3 pounds would be a loudness of 8.7 ells, and so on.
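The weight-to-loudness code can be written down directly. This sketch uses exactly the numbers in the example (9 stone gives 9 ells, and each pound above or below 9 stone adds or subtracts 0.1 ell); the function names weight_to_ells and ells_to_pounds_deviation are hypothetical.

```python
STONE_LB = 14  # 1 stone = 14 lb


def weight_to_ells(stone: int, pounds: int) -> float:
    """Sender: 9 stone -> 9 ells, plus or minus 0.1 ell per pound away from 9 stone."""
    deviation_lb = (stone - 9) * STONE_LB + pounds
    return round(9 + 0.1 * deviation_lb, 1)


def ells_to_pounds_deviation(ells: float) -> int:
    """Receiver: recover the deviation from 9 stone, in whole pounds."""
    return round((ells - 9) / 0.1)


print(weight_to_ells(9, 1))   # 9 stone and a pound -> 9.1
print(weight_to_ells(9, -3))  # 9 stone minus 3 pounds -> 8.7
```

Strictly speaking, this scale is not proportional to total weight (a truly proportional one would give 9 stone 1 lb about 9.07 ells); all the example needs is that more weight always means more loudness.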

Suppose you are transmitting the weights each day to a partner across
a wire in the form of loudnesses. The partner has to translate the
loudnesses back into weight and has to get it right in order to get
lunch. Suppose the wire distorts the loudnesses a little, so sometimes
the signal arrives a little louder or a little softer than the loudness
that was sent.

What Miller means by COvariance is just the correlation between the
variation on the sending end and the variation on the receiving end, so
that if it is 9.2 on the sending end, it should be 9.2 rather than 9.1
on the receiving end, and so on.

Correlation is easy to understand: it means high should go with high and
low with low. If the correlation is perfect (100%) then all the
information has been transmitted. If it is less than perfect, say, 90%,
then some information has been lost and replaced by noise (error).

If the correlation between the sent and received signals is 100%, then
the receiver eats lunch every time. The further the correlation falls
below 100%, the more often the receiver misses lunch. Whatever can be
done to increase the correlation increases the amount of information
transmitted, and hence the probability of lunch.
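The whole sending, distorting and receiving story can be simulated. This is a sketch under stated assumptions: daily deviations of -3 to +3 lb are equally likely, the loudness code is the one in the example (9 ells plus 0.1 ell per lb), the wire adds Gaussian noise of a chosen size, and the receiver rounds back to the nearest pound. The helper names pearson and simulate are hypothetical. It reports the correlation between sent and received loudness and the proportion of days on which the receiver gets lunch.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (the 1/n factors cancel,
    so plain sums are used for the covariance and the spreads)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def simulate(noise_sd, n_days=2000, seed=1):
    """Send hypothetical daily loudnesses down a wire that adds Gaussian noise.
    Return (sent/received correlation, fraction of days decoded to the right pound)."""
    rng = random.Random(seed)
    sent = [9 + 0.1 * rng.randint(-3, 3) for _ in range(n_days)]
    received = [s + rng.gauss(0, noise_sd) for s in sent]
    hits = sum(round((r - 9) / 0.1) == round((s - 9) / 0.1)
               for s, r in zip(sent, received))
    return pearson(sent, received), hits / n_days


for noise in (0.0, 0.02, 0.05, 0.1):
    r, lunch = simulate(noise)
    print(f"noise sd {noise:.2f} ells: correlation {r:.3f}, lunch on {lunch:.0%} of days")
```

Both numbers fall as the noise grows, but not one-for-one: at the largest noise setting the correlation stays fairly high while the lunch rate drops much further, because the receiver needs the EXACT pound, not just high-with-high and low-with-low.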

> Information measurement becomes more apparent when they perform these
> absolute judgement experiment, where the aim is to gradually enhance
> the amount of input information and then measure the amount of
> transmitted information. The transmitted info is the part that they are
> interested in as it shows the capacity of the subject's recall
> abilities. The theory is that if the participant's absolute judgements
> are fairly accurate, then the majority of the information will be
> transmitted, thus retrievable from the person's responses. The way
> they actually measure the level of absolute judgement is by observing
> the number of mistakes they make during the tasks.

It's exactly the same as the weight information being transmitted
through loudness along a wire.

I can even use loudness as the example. Remember I described how one
thing you could teach a subject to do would be to say "loud" when a
sound is loud and "soft" when it is soft. Subjects can divide a whole
range of loudnesses into these two simple categories making very few
mistakes. (The mistakes would just be for the sounds at the boundary
between loud and soft.) Subjects could do just as well dividing the
sounds into three categories (loud, medium, soft -- again with errors
only at the boundaries) and even four, five, six, seven categories.

But if you tried to push it further than that, your error rate would go
up, and not just at the boundaries. This is because only 7 +/- 2
categories of loudnesses can be identified. (Identification = naming =
absolute judgment.) That is, roughly, our one-dimensional "channel
capacity," whatever the dimension is (loudness, brightness, size, etc.).
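A toy model of why finer subdivisions bring more errors, sketched in Python. This is not Miller's analysis; it assumes the loudness range is scaled to 0..1 and cut into k equal categories, that each test sound sits at the centre of its category, and that perception adds Gaussian noise with a fixed standard deviation of 0.03 (an arbitrary illustrative value). A response is wrong whenever the noise carries the sound across a category boundary.

```python
import math


def phi(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def identification_accuracy(k: int, noise_sd: float = 0.03) -> float:
    """Expected proportion of correct category names for k equal categories."""
    if k == 1:
        return 1.0                       # only one name: cannot be wrong
    half_width = 1 / (2 * k)             # distance from a category centre to its boundary
    interior = 2 * phi(half_width / noise_sd) - 1  # middle categories: two boundaries
    edge = phi(half_width / noise_sd)              # the two end categories: one boundary
    return (2 * edge + (k - 2) * interior) / k


for k in (2, 3, 5, 7, 9, 12, 15):
    print(f"{k:2d} categories: {identification_accuracy(k):.1%} correct")
```

The only point of the sketch is the direction of the effect: with the perceptual noise held fixed, narrower categories mean more boundary crossings. Where the drop begins depends entirely on the assumed noise, so the sketch does not by itself explain why the limit falls near 7.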

The Miller paper is about how we manage to identify so many more things
despite our limited channel capacities. It all turns out to be based
on RECODING and adding more dimensions. I'll get to that later.

> The results generally show that at first, their recall abilities are
> quite good and will gradually increase, but after a while, the recall
> capacity will "level off" so to speak, and it is this leveled bit that
> they believe to be "the greatest amount of information that he can give
> us about the stimulus on the basis of an absolute judgement". So, is
> this where he got this magical number from then?

Suppose you divide the loudnesses into 7 categories. For a while, you'll
get better at saying which is which, as you sharpen the boundaries through
practice. But if you try to add more categories, you'll just get more
errors. 7 +/- 2 is tops.

> A final point to add is
> that the fun and games of these experiments involve giving the
> participant an increasing amount of alternative stimuli to observe,
> instead of just increasing the amount of stimuli they produce.

What he meant was dividing the dimension into more than 7 regions.

> This point is known as the "channel capacity" and is the point at which
> confusion occurs......

Seven or so is the channel capacity. Finer subdivisions only lead
to confusion.

> End of intellectual comment!
>
> The last thing I wanna say is that at first, I found this bit of the
> article confusing, but now that I have re-gurgitated it in my own
> words, I actually think that I know what a cognitive psychologist is
> going on about for a change!!!!

Have you ever noticed that when you are reading about something, or
someone is lecturing you about something, and none of it makes sense,
then when you finally get it, you feel like saying: "Why didn't they say
that in the first place!" In other words, once you understand after a
struggle, you know ways that would have made the struggle unnecessary.
THAT's the point where you should weld in the understanding by
explaining it to a few other people, the direct way...

> See you lot on Monday!
>
> Jo
>
> P.S. That was not a sarcastic comment by the way, before I get chucked
> out of Uni!!!!!

Not at all. You're right. And it's not always obvious what cognitive
psychologists are on about.

Chrs, S


This archive was generated by hypermail 2b30 : Tue Feb 13 2001 - 16:24:19 GMT