**Next message:** Stevan Harnad: "Re: George Miller [Magical Number 7 +/-2] Part 5"
**Previous message:** catriona barrett: "George Miller [Magical Number 7 +/-2] Part 5"
**Maybe in reply to:** jlh597@soton.ac.uk: "George Miller [Magical Number 7 +/-2] Part 1"
**Messages sorted by:** [ date ] [ thread ] [ subject ] [ author ]

> From: jlh597@soton.ac.uk
>
> Part One of the Miller article on information limits:
>
> INFORMATION MEASUREMENT
>
> but I bet somewhere that there's a
> bloke who cannot recall particular events or specific types of
> information but still leads an ok life!)

Actually, there are several such people. The famous amnesic patient

"HM", probably the most-studied patient in the world, has been leading

an odd sort of life since his operation in 1954, when, among other

things, his hippocampus -- a pair of structures, each shaped like a C

but positioned as an upside down U along the length of the brain on

both sides, under the newer parts of the cortex, especially the

temporal lobes -- was removed on both sides. From that day onward HM

could not register any new memories. So every day was the first day

after the operation for him. In 1998 he is still saying, when people

ask him how he is getting on: "It's all starting to come back to me,"

whereas of course it's all being forgotten almost immediately after it

happens.

As I said in class, his father died several years after his operation,

and when he was told that his father had died, HM of course broke down

and cried. But the next day he had forgotten, so that when he was

reminded, he again broke down and cried as if it was the first he had

heard it. So of course they stopped telling him.

But was he not suspicious when he did not see or hear of his father for

so long? No, because as every day was the day after his operation, he

did not remember that he had not seen him for a long time; he felt that

he had seen him only a few days earlier.

So HM is content, in a way, being the most studied memory patient (even

though he does not remember it); but would any of us want to be in his

place? Is he leading "an ok life"?

> Anyway... the way he tested this number was through the use of
> "absolute judgement experiments" where they test how well participants
> can "assign numbers to the magnitudes of various aspects of a
> stimulus". In real English, they are testing the extent of people's
> abilities to transmit information. It is also evident that the use of
> the information theory has been widely applied to his own proposals.

Jo, you're beginning to talk like them! Since when is "transmit

information" real English? We spent two hours last week trying to find

out what information was, and what it meant to "transmit information."

We provisionally settled on the following: Information is whatever

serves to reduce your "uncertainty," when you have to make a choice

between a lot of alternatives and are unsure which is the right choice.

It MATTERS to you whether you make the right choice or not, because

there are consequences of making the right or wrong choice. For

example, if you make the right choice, you eat; if you make the wrong

choice, you starve.

So I suggested that you think of a sandwich machine with 6 buttons.

Every day, you get your lunch by pressing one of the buttons. One of

them produces a sandwich, and five do not, and it changes every day. So

you are uncertain about which button is the right one, and it matters,

because your lunch depends on it. We can even say HOW uncertain you are:

Your uncertainty is high: it's 5/6 (which is about 83%). This

means that, without any further INFORMATION, your chances of eating on

any day are 1/6 or 17%. Your chances of not eating are 5/6 or 83%.

Information would be something that reduced your uncertainty. For

example, if you were told, on a given day, that the number was odd

rather than even that day, you still would not be certain to get lunch,

but your uncertainty would be reduced from 5/6 to 2/3 (or from 83% to

67%).

Whenever you get information, you get something that reduces your

uncertainty between alternatives that you are not sure about and that

matter to you. For example, my kid-sib definition of information reduced

your uncertainty about what to say on a future exam question that asks

you what "information" is. And the outcome of that exam matters to

you...

(I hope you have better reasons for wanting to understand information

than just to get a better mark. Let's say it's also to reduce your

uncertainty about how the mind works, and about what it is that we are

doing when we INFORM one another with our words (or our actions).)

> The issue of information measurement is commonly understood in the
> light of "VARIANCE" which simply refers to the "amount of information".
>
> > "If the variance is very small, we know in advance how our observation
> > must come out, so we get little information from making the
> > observation".

You should see this in the sandwich machine example: There the variance

is large: The sandwich, on any given day, can be connected to button 1,

2, 3, 4, 5, or 6, and every day it varies. The variance is 83%. But if

you learn that the number is odd on Tuesdays, then that means the

variance on Tuesdays is 67%, which is less than 83%.

Variance here is the same as the amount of uncertainty. But it's exactly

the same as the "variance" you learn about in statistics. I will now

show you exactly how they are the same thing:

Consider your daily weight. Suppose your average weight is 9 stone.

If you measure it across days and weeks and calculate the average, it

comes out to 9 stone. But on any day it will not be EXACTLY 9 stone.

It will sometimes be a bit higher and sometimes a bit lower. Let's say

it varies by +/- 3 pounds.

If you take your actual weight on any given day, and subtract your

average weight from it, you will get that day's "deviation" from the

average. Those deviations (or "residuals") can be either positive or

negative -- a little higher or a little lower. We want a way to say how

big the "average" deviation is. It's not exactly an average, but

similar to one.

First, since a deviation of +3 and a deviation of -3 is the same size

(although in the opposite direction), the first thing you need to do is

square the deviations to make them all positive: both +3 and -3 equal

+9 when they are squared.

Let's say we have calculated your average weight at 9 stone by

measuring it every day for 30 days, calculating the total for all 30

days, then averaging it by dividing the total by 30. That's how we got 9

stone.

Now we go back and calculate the daily deviations from that average,

and to keep them positive, we add up every day's squared deviation, and

find the total across the same 30 days. To find a kind of "average"

deviation, we need to divide the total squared deviations not by 30, as

we did when we were calculating your ordinary average, but by 29. The

reason we divide by 29 is that 29, not 30, is the amount by which the

squared deviation is "free to vary" (its "degrees of freedom").

The reason the daily deviations from your average weight across 30 days

have 29 rather than 30 degrees of freedom is that one degree of freedom

is "fixed" by calculating your average weight across those 30 days and

then using it to calculate your "average" deviations from that average

weight. The average weight has 30 degrees of freedom: on any given day

it can have any value, for each of the 30 days. But once you have

calculated it, and then go on to calculate each day's deviation from

that average, the deviation is free to take any value for the first 29

days, but the value for the 30th day is already fixed, because it has

to be the value that makes the average weight come out to 9 stone.

Now that sum of 30 squared deviations divided by the 29 degrees of

freedom is precisely what you have been taught is the VARIANCE of your

weight. (Its square root is the STANDARD deviation, which is something

rather like an "average" deviation.)
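The whole 30-day calculation fits in a few lines. The weights below are made-up illustration data (9 stone = 126 pounds, varying by a few pounds), not anyone's real measurements:

```python
# Thirty hypothetical daily weights in pounds (9 stone = 126 lb).
weights = [126, 124, 127, 129, 125, 126, 123, 128, 126, 127,
           125, 126, 129, 124, 126, 127, 123, 126, 128, 125,
           126, 127, 124, 129, 126, 125, 128, 126, 123, 126]

n = len(weights)                 # 30 days
mean = sum(weights) / n          # the ordinary average: divide by 30

deviations = [w - mean for w in weights]
sum_sq = sum(d ** 2 for d in deviations)

variance = sum_sq / (n - 1)      # divide by 29 degrees of freedom
std_dev = variance ** 0.5        # the "average-ish" deviation

# The raw deviations always sum to zero; that constraint is exactly the
# one degree of freedom lost by using the mean computed from the same data.
print(round(sum(deviations), 10), round(variance, 2))  # 0.0 2.97
```

The last deviation is never free to vary: once the first 29 are known, it is forced to whatever value makes the deviations sum to zero.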

Compare that to the sandwich machine that varies daily in which of the

six numbers gives you your lunch: Suppose someone else's lunch (not

yours) depends on their being able to guess your weight EXACTLY each

day (rounded off to the nearest pound). This means that any day you can

weigh 9 stone exactly, or 9 stone +1 pound, +2 pounds, +3 pounds or -1

pound, -2 pounds, or -3 pounds. If, as with the sandwich machine, all 7

of these possible deviations (0, +1, +2, +3, -1, -2, -3) were equally

likely, then that person's chance of lunch would be 1/7 and his

uncertainty would be 6/7, just as with the sandwich machine.

The only difference in the case of the variance of your weight is that

the 7 deviations are not equally likely. Their real likelihood is the

shape of the normal distribution: Small deviations are more likely

than big ones. In fact, the person is best off guessing exactly 9 stone

each time, the exact midpoint of the distribution (i.e., 0 deviation),

because the rest of the possibilities will happen much less often,

especially the biggest deviations (+/- 3).

All this was just a refresher course on variance, to show you how

variance is related to information. Of course anything that reduces the

variance reduces the uncertainty.

> This concept is then discussed with reference to a communication
> system, which obviously involves both an input and an output. Everyone
> knows this already, but the hard bit comes when you think about the
> variance between the two stages, how it determines the whole
> communication process, and also what actually happens between these two
> stages. The in between bit he calls the "covariance of the input and
> the output". Or in other words he calls the covariance the "amount of
> transmitted information".

He is thinking of a signal that you sent, and then the signal someone

else, on the other end of the communication wire, receives. Suppose

when you are weighed, the scale is connected to a loudspeaker that makes

a sound that is proportional to your weight: The more you weigh, the

louder the sound. So a weight of exactly 9 stone is translated into a

loudness of, let's say 9 ells. ("Ell" for "L" for loudness.)

9 stone plus a pound would be a loudness of 9.1 ells, 9 stone minus 3

pounds would be a loudness of 8.7 ells, and so on.

Suppose you are transmitting the weights each day to a partner across

a wire in the form of loudnesses. The partner has to translate the

loudnesses back into weight and has to get it right in order to get

lunch. Suppose the wire distorts the loudnesses a little, so sometimes

the signal arrives a little louder or a little softer than the actual

weight.

What Miller means by COvariance is just the correlation between the

variance on one end and the other, so that if it is 9.2 on the sending

end, it should be 9.2 rather than 9.1 on the receiving end, and so on.

Correlation is easy to understand: it means high should go with high and

low with low. If the correlation is perfect (100%) then all the

information has been transmitted. If it is less than perfect, say, 90%,

then some information has been lost and replaced by noise (error).

If the correlation between the sent and received signals is 100%, then

the receiver eats lunch every time. By whatever percentage the

correlation is less than 100%, the receiver misses lunch. Whatever can

be done to increase the correlation, increases the amount of information

transmitted, and hence the probability of lunch.
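Miller's "covariance of input and output" can be sketched the same way. Here is a toy version (my own construction, in Python) of daily weights going down a noisy wire as loudnesses, and the correlation between what was sent and what arrived:

```python
import random

random.seed(1)

# Thirty hypothetical daily weights sent as loudnesses in "ells"
# (9.0 ells = 9 stone, one pound = 0.1 ell), with a little wire noise
# added on the receiving end.
sent = [9.0 + random.choice(range(-3, 4)) / 10 for _ in range(30)]
received = [s + random.gauss(0, 0.05) for s in sent]

def pearson(xs, ys):
    """Plain Pearson correlation: high-with-high, low-with-low."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

r = pearson(sent, received)  # close to 1.0: most information got through
```

With no wire noise the correlation is exactly 1 and the receiver eats every day; the bigger the noise, the further r drops below 1, and the more lunches are lost.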

> Information measurement becomes more apparent when they perform these
> absolute judgement experiments, where the aim is to gradually enhance
> the amount of input information and then measure the amount of
> transmitted information. The transmitted info is the part that they are
> interested in as it shows the capacity of the subject's recall
> abilities. The theory is that if the participant's absolute judgements
> are fairly accurate, then the majority of the information will be
> transmitted, thus retrievable from the person's responses. The way
> they actually measure the level of absolute judgement is by observing
> the amount of mistakes they make during the tasks.

It's exactly the same as the weight information being transmitted

through loudness along a wire.

I can even use loudness as the example. Remember I described how one

thing you could teach a subject to do would be to say "loud" when a

sound is loud and "soft" when it is soft. Subjects can divide a whole

range of loudnesses into these two simple categories making very few

mistakes. (The mistakes would just be for the sounds at the boundary

between loud and soft.) Subjects could do just as well dividing the

sounds into three categories (loud, medium, soft -- again with errors

only at the boundaries) and even four, five, six, seven categories.

But if you tried to push it further than that, your error rate would go

up, and not just at the boundaries. This is because only 7 +/- 2

categories of loudnesses can be identified. (Identification = naming =

absolute judgment.) That is our one-dimensional "channel capacity" no

matter what the dimension is (loudness, brightness, size, etc.).

The Miller paper is about how we manage to identify so many more things

despite our limited channel capacities. It all turns out to be based

on RECODING and adding more dimensions. I'll get to that later.

> The results generally show that at first, their recall abilities are
> quite good and will gradually increase, but after a while, the recall
> capacity will "level off" so to speak, and it is this leveled bit that
> they believe to be "the greatest amount of information that he can give
> us about the stimulus on the basis of an absolute judgement". So, is
> this where he got this magical number from then?

Suppose you divide the loudnesses into 7 categories. For a while, you'll

get better at saying which is which, as you sharpen the boundaries through

practise. But if you try to add more categories, you'll just get more

errors. 7 +/- 2 is tops.
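A toy simulation (my own construction, not Miller's actual procedure) shows the effect: try to identify a stimulus on a fixed scale with fixed perceptual noise, and watch the errors climb as you add categories:

```python
import random

def error_rate(k, noise_sd=5.0, trials=2000):
    """Fraction of misidentifications when a 0-100 stimulus scale is
    divided into k equal categories and perception adds Gaussian noise."""
    rng = random.Random(2)      # fixed seed so the sketch is repeatable
    width = 100 / k
    errors = 0
    for _ in range(trials):
        stimulus = rng.uniform(0, 100)
        perceived = stimulus + rng.gauss(0, noise_sd)
        true_bin = min(int(stimulus / width), k - 1)
        heard_bin = min(max(int(perceived / width), 0), k - 1)
        errors += (true_bin != heard_bin)
    return errors / trials

# Few categories: errors happen only near the boundaries, so there are
# few of them. Many categories: the bins shrink toward the size of the
# noise, and confusions shoot up everywhere.
for k in (2, 4, 7, 14):
    print(k, round(error_rate(k), 2))
```

The noise level here is an arbitrary assumption; the qualitative point is only that for any fixed noise, finer subdivisions eventually cost more than they gain.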

> A final point to add is
> that the fun and games of these experiments involve giving the
> participant an increasing amount of alternative stimuli to observe,
> instead of just increasing the amount of stimuli they produce.

What he meant was dividing the dimension into more than 7 regions.

> This point is known as the "channel capacity" and is the point at which
> confusion occurs...

Seven or so is the channel capacity. Finer subdivisions only lead

to confusion.

> End of intellectual comment!
>
> The last thing I wanna say is that at first, I found this bit of the
> article confusing, but now that I have re-gurgitated it in my own
> words, I actually think that I know what a cognitive psychologist is
> going on about for a change!!!!

Have you ever noticed when you are reading about something or someone

is lecturing you about something, and none of it makes sense, when you

finally get it, you feel like saying: "Why didn't they say that in the

first place!" In other words, once you understand after a struggle, you

know ways that would have made the struggle unnecessary. THAT's the

point where you should weld in the understanding by explaining it to a

few other people, the direct way...

> See you lot on Monday!
>
> Jo
>
> P.S. That was not a sarcastic comment by the way, before I get chucked
> out of Uni!!!!!

Not at all. You're right. And it's not always obvious what cognitive

psychologists are on about.

Chrs, S


This archive was generated by hypermail 2b30: Tue Feb 13 2001 - 16:24:19 GMT