Consider the case of the teacher who asks a
"checking" question to see whether some student has grasped a
principle. If the answer is
satisfactory, the teacher may either think "Good, I thought X understood
that", or "Funny, I didn't think X had caught onto that one: I'd
better ask another question."
On the other hand, if the answer is unsatisfactory, the
teacher may think "Yes, I thought
X didn't know that", or "Funny, I could've sworn X understood that:
I'd better ask again in another way".
In either case, the teacher has taken a prior probability
into account, and used the new information to modify that probability to come
up with a posterior probability. The
probability that we are dealing with is a continuous variable.
Yet if we were to rely on classical probability theory,
the response to a single question could only yield one of two completely
discontinuous values. Either we are 100%
certain that the student has understood the business under consideration, or we
are equally 100% certain that the student has no clue at all.
The possibility that the student guessed the answer, or
made a silly mistake while really understanding the principle, or heard the
answer whispered by somebody, all of these are rejected in favour of a narrow,
rigid black-and-white view of probability.
Well, you don't have to be a teacher to recognise that
this is daft, but let me make it even easier for you. Suppose we have just used a multiple choice
question to assess where the student is at in terms of understanding the
principle involved.
Now people who have little real understanding of
probability reject these questions on the grounds that you might just get lucky
and guess all the answers. Let me assure
you here and now that this isn't possible for any large number of well-written
questions, but that isn't what I want to debate right now.
With a single question, offering four choices, however,
there is a reasonable chance of guessing with no understanding at all, a 25%
chance, in fact. Anybody who thinks
about it can recognise that: what is less obvious is that some students who do
know and understand the principle involved will make a clumsy mistake, and get
their answer wrong.
Anybody who knows about testing knows for a fact that
this happens. Under the pressure of a test or exam, students enter their answers in
the wrong place or do something else silly.
Sometimes, the fault lies in the question, which is badly worded.
So it's crazy to go around assuming that we can assert
100% probabilities about anything. We
teachers can, however, assert that we are pretty certain that somebody
understands whatever the principle is, to the extent that we are willing to
move on to something new, and teach that.
It's easy when we are talking about something simple like
teaching sums, or dates of famous battles, any sort of rote learning: even
Blind Freddy can see that we have to be flexible in how we calculate the
probability of something.
Now let's turn to something far more important: the
batting performance of our nation's cricketers.
Once again, I will start off with a simple and easy exercise: the
average performance of Sir Donald Bradman in test matches.
If we look at The Don's last score, he was out for a
duck. Should that be his lasting
record? Of course not! And if we look at a modern-day batter with a
sequence of low scores in the past few months, should we write him or her
off? Tabloid journalists call for the
executioner; sager minds look at the longer term.
Every measurement involves elements of chance, and even a
consummate wielder of the willow (that's the bat, for heathens) will sometimes "blow
it", sometimes several times in a row, and wise selectors usually look at
prior performance, or as mathematicians say, prior probability.
The technical stuff
You don't need to read this: the maths-free description will
do for most readers. In what follows, if
you do read it, the numbers in a1, a2 etc. ought to be subscripts, but this
blog does not support that, so far as I can see. This needs to be kept in mind when you see
things like an and p(an). Sorry!
Suppose we have a set of discrete alternatives a1, a2,
a3, a4 . . . an, for a given set of trials, and that we can write the
probability of a1 as p(a1). To make this easier, suppose we are looking at a
student's score on a test of twenty questions, and we want to know whether that
student has mastered the skill being assessed. The alternatives a1, a2 and so on are
the possible states of the student (the different levels of mastery, say), and b is
the particular score the student obtained: what we need to assess is the probability
of each alternative, given that score.
Beginning with a prior estimate p(ai) for each alternative, and the likelihood
p(b|ai) that a student in state ai would produce that score, we can then use a simple
formula to estimate the probability that a particular student has mastered the skill,
given that student's score:
p(ai|b) = [p(b|ai) x p(ai)] / [p(b|a1) x p(a1) + p(b|a2) x p(a2) + . . . + p(b|an) x p(an)]
Or we may take this form of the equation, where there are
two events, A and B:
p(A|B) = [p(B|A) x p(A)] / [p(B|A) x p(A) + p(B|~A) x p(~A)]
Here we may define event A as 'mastery' and event B as a
particular score, or we may look at them in terms of the likelihood of guilt in
a particular situation, or almost anything else. If we only have a limited
amount of information available, or a limited number of data points, this approach will
tend to give us a better estimate, on average, of the true situation than the raw figures alone would.
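Here is a minimal sketch of that two-event form in code, applied to the multiple-choice question discussed earlier. All of the numbers (the prior, the chance that a master slips, the chance that a non-master guesses correctly) are assumptions invented for illustration, not measurements from any real test.

```python
# A minimal sketch of the two-event form of Bayes' rule, applied to the
# multiple-choice example. All figures are invented for illustration.

def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Return p(A|B) from p(A), p(B|A) and p(B|~A)."""
    numerator = p_b_given_a * prior_a
    denominator = numerator + p_b_given_not_a * (1.0 - prior_a)
    return numerator / denominator

# Event A: the student has mastered the skill.
# Event B: the student answers one four-choice question correctly.
prior_mastery = 0.5         # neutral starting point
p_correct_if_master = 0.9   # even masters sometimes slip
p_correct_if_not = 0.25     # non-masters can still guess one answer in four

print(posterior(prior_mastery, p_correct_if_master, p_correct_if_not))
# about 0.78: one correct answer raises our belief, but nowhere near to 100%
```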
To take a simple example, suppose four experimenters each toss a coin twice to
find out how often heads and tails come up. There are four possible results: two
heads, two tails, a head followed by a tail, or a tail followed by a head. Judged by
raw frequencies alone, the experimenter who saw two heads would have to conclude that
the coin always comes up heads.
Now suppose we take a Bayesian approach instead, beginning with
the reasonable assumption that there should be a 'half and half' chance of
heads and tails. On that basis, the head-head and tail-tail observations lead only to
the conclusion that the coin leans towards a particular result, rather than to the
claim that the same result will always be achieved, while the head-tail and tail-head
cases still lead to the conclusion that heads and tails are equally likely. Taken
together, the four estimates are closer to the truth.
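One way to put numbers on this is to assume a uniform prior over the coin's bias, which is one common way of encoding the 'half and half' starting point; the sketch below compares the raw frequency estimate with the Bayesian estimate for each of the four possible outcomes.

```python
# A sketch of the two-toss coin example, assuming a uniform prior over the
# coin's bias (one way of encoding "start from half and half"). With that
# prior, the estimate after h heads in n tosses is the classic rule of
# succession: (h + 1) / (n + 2).

def naive_frequency(heads, tosses):
    return heads / tosses

def bayesian_estimate(heads, tosses):
    return (heads + 1) / (tosses + 2)

for outcome, heads in [("HH", 2), ("TT", 0), ("HT", 1), ("TH", 1)]:
    print(outcome,
          "frequency:", naive_frequency(heads, 2),
          "Bayesian:", bayesian_estimate(heads, 2))

# The HH experimenter's raw frequency says "always heads" (1.0); the Bayesian
# estimate drifts only to 0.75, and the HT and TH cases stay at 0.5.
```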
The examples here tend to relate to educational settings,
simply because the writer devoted two years of his life to researching and
developing such applications, but the same reasoning can just as easily be
applied to estimating baseball batting averages, or almost any other measure
which is mathematically equivalent to a probability.
Uses in the courts (still technical!)
You can skim this one as well, or leave it for now and come
back to it, as it is a side-issue. The main thing to note is that Bayesian
statistics have many uses.
Bayesian probability has now become important in the law
courts of the world, where it provides the most appropriate way of dealing with
DNA evidence or blood grouping. This need arises because of a peculiar
situation that crops up when a lawyer says something like "there is a one in
a hundred thousand chance of somebody else having this DNA profile".
Suppose a random citizen (we will call him Fred) has been
accused on the basis of a blood spot left behind at a murder, which matches
Fred's profile to a level that the prosecution are calling "one in a
hundred thousand". Further, they say, somebody of Fred's racial group was
seen leaving the area. That makes the odds even better, because Fred's racial group
makes up just 10% of the population. "That makes it one in a million", says
the prosecutor.
In fact, it brings the probability down, not up, says the
defence lawyer, who has read up on this topic. The defence may well be correct,
if they can show that most of the people with that DNA profile are in Fred's
racial group. What we need to do is use a Bayesian calculation, but this
example gets a bit confusing.
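Before moving on, here is a rough sketch of why the raw match figure on its own can mislead; the population figure is invented purely for illustration, and the real courtroom argument would fold in the racial-group evidence as well.

```python
# A rough sketch of why "one in a hundred thousand" is not the probability of
# innocence. The population figure is invented for illustration.

population = 1_000_000           # plausible alternative suspects (assumed)
match_probability = 1 / 100_000  # chance that a random person matches

expected_innocent_matches = population * match_probability  # about 10 people

# If Fred started out no more suspicious than anyone else, then on the DNA
# evidence alone he is just one of roughly eleven matching people.
p_fred_is_source = 1 / (expected_innocent_matches + 1)
print(round(p_fred_is_source, 3))  # roughly 0.09, a long way from certainty
```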
So let us look at a case where paternity has been
alleged, and DNA evidence seems to support the claim. Once again, the frequency
of that DNA type in a small community can be quite different to what you get in
the whole nation, so we do a calculation of the probability of the accused
being the father of the child at the centre of the case.
We have, from all of the tests, a Combined Paternity
Index, (CPI). This is calculated as the product of the paternity indices for
each individual system tested. The CPI tells us how likely it is that the
alleged father (or a man genetically identical to the alleged father)
contributed the paternal genes to the child, divided by the likelihood of another
unrelated man of the same race contributing the paternal genes.
As well, we have a Prior Probability (Pr). This is a
numerical value in the range 0-1 (that is, ranging from impossibility to total
certainty) which indicates the likelihood of a certain event occurring. This
value is estimated, before genetic testing, on the basis of known, non-genetic
circumstances surrounding the event.
That means taking into account non-statistical evidence,
such as casual acquaintance versus an intimate relationship. Since the
laboratory does not know of the existence or the substance of these
circumstances, a prior probability of 0.5 is customarily assigned for the
purpose of neutrality, but this can be varied.
Now we can calculate the probability of paternity:
P = [(CPI) x (Pr)] / [(CPI) x (Pr) + (1 - Pr)], where P = Posterior
Probability of Paternity, CPI = Combined Paternity Index and Pr = Prior
Probability.
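The arithmetic is simple enough to check with a few lines of code; the CPI values below are invented purely to show how the index and the prior interact.

```python
# A direct transcription of the paternity formula above; the CPI values are
# invented purely to show how the index and the prior interact.

def posterior_paternity(cpi, prior=0.5):
    """P = (CPI x Pr) / [(CPI x Pr) + (1 - Pr)]."""
    return (cpi * prior) / (cpi * prior + (1.0 - prior))

print(posterior_paternity(cpi=100, prior=0.5))   # about 0.990
print(posterior_paternity(cpi=100, prior=0.01))  # about 0.503: the prior matters
```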
Uses against spam
The common methods of filtering spam, back in 2003, such as
rejecting mail from known spammers (black lists), and only accepting mail from
friends and colleagues (white lists), were not enough. Merely filtering known
spam messages was always one step behind clever spammers. More aggressive
filtering posed an unacceptable risk of killing legitimate messages.
Take a simple trap that rejected e-mails mentioning the
word 'Viagra' in the subject line: the word 'V1AGRA' would pass straight
through, but in nine out of ten cases it would still be read by a human as
'VIAGRA'. New filtering methods were brought in to analyse e-mail messages in
their entirety, instead of just looking at a handful of key words.
These filters (and we still use them) build sophisticated
models, based on probability and statistics theory going back to the ideas of
the 18th-century mathematician and cleric, Thomas Bayes, to determine whether
new messages are spam or not.
Such a system allows a message about sextants which
mentions that the pen is mightier than the sword to be examined and passed,
rather than examined and hurled into the outer darkness.
The basic notion of the Bayesian approach is to begin
with a certain assumed probability that a message should be rejected, and then
to use a variety of observations to adjust that probability, only acting if the
probability rises above (or falls below) a certain level.
Some findings may increase the probability, others may
reduce it. In more sophisticated forms, testing may be exited
early by the use of white lists and black lists, while indeterminate messages
can be given a more thorough scrutiny, even to the point of looking for any of a few
thousand tell-tale terms and phrases, including all the usual weasel claims about mail
not being sent unless people have opted in.
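To make the idea concrete, here is a toy sketch of how word-by-word evidence might be combined with Bayes' rule under a naive independence assumption. It is not any particular filter's real code, and the word probabilities are invented for illustration.

```python
# A toy sketch of Bayesian spam scoring (not any real filter's code). Each
# word carries an invented probability of appearing in spam and in legitimate
# mail, and the evidence is combined under a naive independence assumption.

from math import prod

# invented figures: p(word | spam), p(word | legitimate)
WORD_ODDS = {
    "viagra":  (0.90, 0.01),
    "opt-in":  (0.60, 0.05),
    "sextant": (0.01, 0.20),
    "journal": (0.05, 0.30),
}

def spam_probability(words, prior_spam=0.5):
    known = [WORD_ODDS[w] for w in words if w in WORD_ODDS]
    if not known:
        return prior_spam
    p_spam = prior_spam * prod(ps for ps, _ in known)
    p_ham = (1 - prior_spam) * prod(ph for _, ph in known)
    return p_spam / (p_spam + p_ham)

print(spam_probability(["viagra", "opt-in"]))    # well above 0.5: rejected
print(spam_probability(["sextant", "journal"]))  # well below 0.5: passed
```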
By the same token, the name of a known sender might be
used to validate e-mail, so that a message from the New England Journal of Medicine
or Nature, for example, would be allowed to pass, even if it mentioned a number of
otherwise 'black mark' terms. It would also get around the difficulty faced by some
Thai people, whose names end in '-porn'; in the past, words like Middlesex and Essex
have been known to trigger poorly designed guardian software.
Which is why we now rarely see this once-popular tagline
on e-mails:
When they come for the anarchists, I shall speak up even though I am not an anarchist. When they come for the Jews, I shall speak up even though I am not a Jew. When they come for the Muslims, I shall speak up even though I am not a Muslim. When they come for the Christians, I shall speak up even though I am not a Christian. When they come for the spammers, I'll say "You missed one over there!"