Bayes’ Theorem

No one talked about Bayes’ theorem back in the 60s when I was in med school, but I was exposed to it in a great lecture explaining why there could be no perfect test for a rare disease.  At this point we all knew that no test was perfect, but that some were still pretty good.  So the lecturer said suppose you had a test that was 99% accurate in spotting a rare disease, say one affecting 1 person in 1,000.  Giving the test to 1,000 people would then flag about 11 of them as having the disease (roughly 10 false positives plus the 1 true case), when probably only 1 did.
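The lecturer's arithmetic can be checked with expected counts.  Here is a minimal Python sketch, assuming "99% accurate" means the test catches 99% of true cases and wrongly flags 1% of healthy people (the lecture did not spell this out, and the function name `expected_counts` is made up for illustration).

```python
def expected_counts(population, prevalence, sensitivity, false_positive_rate):
    """Expected numbers of true and false positives in a tested group."""
    sick = population * prevalence
    healthy = population - sick
    true_positives = sick * sensitivity              # sick people the test catches
    false_positives = healthy * false_positive_rate  # healthy people wrongly flagged
    return true_positives, false_positives


# Assumed reading of "99% accurate": sensitivity 0.99, false positive rate 0.01
tp, fp = expected_counts(1000, 0.001, 0.99, 0.01)
print(f"true positives:  {tp:.2f}")
print(f"false positives: {fp:.2f}")
print(f"total positives: {tp + fp:.2f}")
```

Nearly all of the positives come from the 999 healthy people, simply because there are so many more of them.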

Every probability comes with a complement.  In the population above there are two: a .999 probability that any one person drawn from it does not have the disease (call it H- for the negative hypothesis) and a .001 probability that the person does have it (call it H+).

These probabilities have nothing to do with the evidence provided by the test: call it E+ if the test is positive, and E- if the test is negative.

So what you as a doc want to know is how reliable the test is.  Given E+ (a positive test), what is the probability that your patient has the disease (H+)?

This is exactly where Bayes’ theorem comes in.

It allows you to figure out P (H | E), where P is probability, H is the hypothesis, and E is the evidence (the test).  It actually lets you figure out 4 probabilities: P (H+ | E+), P (H+ | E-), P (H- | E+) and P (H- | E-).

Here is Bayes’ Theorem in all its glory

P (H | E) = [ P (H) x P (E | H ) ] / P ( E )

What happened to + and – ?   They just have to be the same on both sides of the equation

If you have P (H+ | E-) on the left side, then the right side must also have H+ and E-

Well P (H+) is easy: it is the probability that the individual has the disease, given the population, which is .001

So what is  P ( E+   | H + ) ?  It’s the probability of the test being positive if the person has the disease, and we know that is .99

Lastly what is P (E+) ?  It’s the probability of getting a positive test.  There are actually two ways of getting one: a true positive when someone has the disease (that’s .99 times .001), and a false positive when someone doesn’t have the disease (that’s .999 times .01, taking 99% accurate to mean that 1% of healthy people test positive).

so (.99 * .001 ) + (.999 * .01) = .01098

So P (H+ | E+) = [ P (H+) x P (E+ | H+) ] / P (E+) = (.001 * .99) / .01098 = .090164

So even though your test is wonderful, and 99% accurate, the chance of a positive test being right (given the rarity of the disease in the population) is only 9%.
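The same calculation, extended to all four combinations of H and E, can be sketched in a few lines of Python.  The variable names and the helper `posterior` are mine, and the .01 false positive rate is an assumption read off "99% accurate".

```python
def posterior(prior, likelihood, evidence):
    """Bayes' theorem: P(H | E) = P(H) * P(E | H) / P(E)."""
    return prior * likelihood / evidence


p_h_pos = 0.001                # P(H+): person has the disease
p_h_neg = 1 - p_h_pos          # P(H-): person does not
p_e_pos_given_h_pos = 0.99     # P(E+ | H+): test positive if diseased
p_e_pos_given_h_neg = 0.01     # P(E+ | H-): assumed false positive rate

# P(E+) adds up the two ways of getting a positive test
p_e_pos = p_h_pos * p_e_pos_given_h_pos + p_h_neg * p_e_pos_given_h_neg
p_e_neg = 1 - p_e_pos

four = {
    "P(H+ | E+)": posterior(p_h_pos, p_e_pos_given_h_pos, p_e_pos),
    "P(H- | E+)": posterior(p_h_neg, p_e_pos_given_h_neg, p_e_pos),
    "P(H+ | E-)": posterior(p_h_pos, 1 - p_e_pos_given_h_pos, p_e_neg),
    "P(H- | E-)": posterior(p_h_neg, 1 - p_e_pos_given_h_neg, p_e_neg),
}

for name, value in four.items():
    print(f"{name} = {value:.6f}")
```

Note that the + and - signs match on both sides in every call, and that the two probabilities for a given test result add up to 1.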

Bayes’ theorem wasn’t accepted or used much until the past 20 years.  Why?  Because the real world is different; we don’t really know just what the frequency of the disease in our population is so we must guess.  This is called a prior probability, and Bayes’ theorem gives us a way of taking evidence (imperfect though it is) into account.
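To see how much the guess matters, here is a small Python sketch that runs the same test against several different priors.  The priors are arbitrary illustrative guesses, not data, and the function name and the .01 false positive rate are my assumptions.

```python
def prob_disease_given_positive(prior, sensitivity=0.99, false_positive_rate=0.01):
    """P(H+ | E+) for a guessed prior P(H+), via Bayes' theorem."""
    p_positive = prior * sensitivity + (1 - prior) * false_positive_rate
    return prior * sensitivity / p_positive


# Illustrative guesses at how common the disease is
for guess in (0.0001, 0.001, 0.01, 0.1):
    result = prob_disease_given_positive(guess)
    print(f"prior {guess:<6} -> P(H+ | E+) = {result:.3f}")
```

With a prior of 1 in 100 instead of 1 in 1,000, the same positive test is right half the time, so the answer leans heavily on the guess.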

This subjectivity drove classic statisticians nuts.  They are ‘frequentists’.  The probability of a coin being heads in an ideal world is .5, but the only way to tell if this is true is to measure the frequency of heads in a large number of flips.  Subjectivity has nothing to do with it.

However, Bayes’ theorem and Bayesian statistics are used all the time in machine learning.
