Bayes' Theorem has finally been cropping up at work, and, like every other blog I follow, this one wouldn't be complete without an explanation of it.

P(A|B) = P(B|A)P(A) / P(B)

(The probability of A given B is equal to the probability of B given A times the probability of A over the probability of B.)
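If you prefer code to notation, the whole theorem fits in one line. Here's a minimal Python sketch (the function and argument names are mine, just for illustration):

```python
# A minimal sketch of Bayes' theorem as a function.
# (The names are my own, just for illustration.)
def bayes(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """P(A|B) = P(B|A) * P(A) / P(B)"""
    return p_b_given_a * p_a / p_b
```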

Let's get into it.


We have two events, A and B (and their complements, ¬A (not A) and ¬B (not B)). In a population (the black-bordered box), individual samples can have neither A nor B (¬(A∪B), in pure white), just A (red), just B (blue), or both A and B (A∩B, in purple).

[Venn diagram: regions ¬(A∪B), A, B, and A∩B]

The probability of an event (P(¬(A∪B)), P(A), P(B), or P(A∩B)) is the number of occurrences of the event (the area of its region) divided by the size of the population (the area of the black-bordered box). So the chance that A happens is the same as the chance of throwing a dart and hitting the red box, assuming you hit the board at all.
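You can convince yourself of the area interpretation by literally throwing (simulated) darts. A quick sketch, with a made-up rectangle standing in for A:

```python
# A dart-throwing sketch of "probability as area" (a toy example of my own;
# the board is the unit square and A is an arbitrary rectangle covering 30% of it).
import random

def in_a(x: float, y: float) -> bool:
    return x < 0.6 and y < 0.5   # a 0.6 x 0.5 rectangle, area 0.3

darts = 100_000
hits = sum(in_a(random.random(), random.random()) for _ in range(darts))
print(hits / darts)              # ≈ 0.3, our estimate of P(A)
```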

The probability of A given B (P(A|B)) is the probability that A also took place, assuming you know the event B has taken place. Since you know that your dart hit somewhere in the blue region, your board now looks like this:

[Diagram: the board shrunk down to just the B region, with A∩B inside it]

Thus, P(A|B) = P(A∩B) / P(B).

But what is P(A∩B)? By swapping A with B, we get

P(B|A) = P(A∩B) / P(A)
P(B|A)P(A) = P(A∩B)

and thus, subbing into the first equation:

P(A|B) = P(A∩B) / P(B)
P(A|B) = P(B|A)P(A) / P(B)

And here we get Bayes' Theorem.
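If you'd rather check the algebra numerically than take my word for it, here's a small sketch with an arbitrary, made-up joint distribution; computing P(A|B) directly and via Bayes' theorem gives the same answer:

```python
# A quick numeric check of the derivation, using an arbitrary made-up
# joint distribution over A and B (the numbers mean nothing in particular).
p_a_and_b = 0.12   # P(A∩B)
p_a = 0.30         # P(A)
p_b = 0.40         # P(B)

direct = p_a_and_b / p_b               # P(A|B) = P(A∩B) / P(B)
p_b_given_a = p_a_and_b / p_a          # P(B|A) = P(A∩B) / P(A)
via_bayes = p_b_given_a * p_a / p_b    # P(A|B) = P(B|A)P(A) / P(B)

print(round(direct, 6), round(via_bayes, 6))   # 0.3 0.3 — the same, as promised
```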

Congratulations, here's your degree, have a nice day.


But what does it actually mean? What is it actually used for?

Well, Bayes' theorem is especially important for updating prior probabilities in light of new evidence. In real life, not all evidence is starkly black and white; if you have a video of the murder taking place with a clear picture of the perpetrator's face, you might be able to convict without Bayes, but if all you have is their fingerprints on the murder weapon, well, you might have to apply the theorem.

Sammy the Chef is accused of murdering the waiter. With no evidence, the chance that she's the murderer is the same as anyone else's (same as, say, John the Grocer's). But with the presentation of new evidence, the presence of Sammy's fingerprints on the murder weapon, a butcher knife, Sammy's innocence is called into question.

So what's the chance that Sammy is actually guilty? What's the probability that she's guilty, given that her fingerprints were all over the knife?

Let A be Sammy being guilty, and B be Sammy's fingerprints being on the knife.

Let's work through the variables one by one.

P(A|B) is the probability that Sammy is guilty, given the evidence, which is what we're trying to find.

P(B|A) is the probability that Sammy's fingerprints are on the murder weapon, given that she's guilty. If Sammy's guilty, of course her fingerprints would be all over the murder weapon! She used it to murder someone! We're almost certain that, if she was guilty, her fingerprints would be on the murder weapon. (100%)

P(A) is the chance that Sammy is guilty, before we factor in the evidence. This is usually the most difficult probability to acquire, because it requires some prior information. If the waiter was murdered in the middle of Times Square, everyone in New York (and farther besides) could be a murder suspect, and P(A) would be very low; but if this was a locked-room mystery, then the number of suspects would drop drastically.

P(B) is the chance that Sammy's fingerprints are on the butcher knife, guilty or not. Of course Sammy's fingerprints would be on the butcher knife! She's the cook! She uses it every day! Hence, the probability of her fingerprints being on the knife is very high indeed. (90%)

Plugging them into Bayes':

P(A|B) = P(B|A)P(A) / P(B)
P(A|B) = (1)P(A) / (0.9)
P(A|B) ≈ 1.11 P(A)

The presence of Sammy's fingerprints on the knife only makes her about 11% more likely to be guilty than any other suspect, which isn't much of an update. Treating the 90% as the chance her prints would be on the knife even if she were innocent, and recomputing P(B) for each prior accordingly: with only two suspects, her probability of guilt rises from 50% to roughly 53%; with three, from 33% to about 36%.

Which makes sense. If you could predict that Sammy's fingerprints would be on the knife whether or not she was the murderer, then finding her fingerprints on the knife doesn't tell you much.
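Here's the whole calculation as a sketch (the function name and framing are mine), for however many suspects you like:

```python
# The Sammy example with n equally likely suspects (my own framing of the post's numbers).
def p_guilty_given_prints(n_suspects: int,
                          p_prints_if_guilty: float = 1.0,    # P(B|A): she used the knife
                          p_prints_if_innocent: float = 0.9   # P(B|¬A): she's the chef anyway
                          ) -> float:
    p_a = 1.0 / n_suspects                                    # prior: one of n suspects
    p_b = (p_prints_if_guilty * p_a                           # P(B), by total probability
           + p_prints_if_innocent * (1.0 - p_a))
    return p_prints_if_guilty * p_a / p_b                     # Bayes' theorem

print(p_guilty_given_prints(2))  # ≈ 0.526: from 50% to about 53%
print(p_guilty_given_prints(3))  # ≈ 0.357: from 33% to about 36%
```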


Okay, you say. That makes sense, you say, but I already knew that since Sammy was the Chef, it's likely that her fingerprints would be on the knife even if she didn't commit the murder, you say. Bayes' theorem only helped give me a number to put on it. (You say)

Let us pull up the classic example of Bayes' theorem:

1% of women at age forty who participate in routine screening have breast cancer. 80% of women with breast cancer will get positive mammographies. 9.6% of women without breast cancer will also get positive mammographies. A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer?

Do you have a guess? 70%? 80%? Let's work it out with Bayes'.

A is the event that the woman actually has breast cancer, and B is the event that the mammogram gives a positive result. Let's draw a diagram. Interactive, this time.

[Interactive Venn diagram: sliders for P(A), P(B), and P(A∩B)]

But wait, you say. If A is having breast cancer, and B is having a positive test result, I don't have P(B)!

Ah, but you do. Let us enumerate the information we have.

We have P(A), the probability of breast cancer. (1%)

We have P(B|A), the probability of a positive result given breast cancer. This is called the true positive rate, because it's a positive test result for someone who actually has the condition. (80%)

We also have P(B|¬A), the probability of a positive result given no breast cancer. This is the false positive rate, as it's a positive test result for someone without the condition. (9.6%)

We can also work out P(A∩B), which is P(B|A)P(A), both of which we have. It works out to be 0.8%.

So how do we find P(B)? Well, the probability of a positive result is P(B|A)P(A) + P(B|¬A)P(¬A): the probability of a positive result given cancer, times the chance that you have cancer, plus the probability of a positive result given no cancer, times the chance that you don't.

Since P(¬A) = 1 − P(A), P(¬A) = 0.99. Subbing everything in, P(B) ≈ 0.103.

Now that you have all the numbers for the interactive diagram, try subbing everything in and estimate P(A|B).

Worked it out? The answer comes to only about 7.8%.
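Here's the arithmetic spelled out, in case you want to check it (the variable names are mine):

```python
# The mammography numbers from the example, worked through step by step.
p_cancer = 0.01              # P(A): 1% of women screened have breast cancer
p_pos_given_cancer = 0.80    # P(B|A): true positive rate
p_pos_given_healthy = 0.096  # P(B|¬A): false positive rate

# P(B), by the law of total probability
p_pos = (p_pos_given_cancer * p_cancer
         + p_pos_given_healthy * (1 - p_cancer))

p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos    # Bayes' theorem

print(round(p_pos, 3))               # 0.103
print(round(p_cancer_given_pos, 3))  # 0.078 — only about 7.8%
```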


So why is this important?

Well, as the previous example showed, human intuition is not always the best arbiter of what is true.

But vitally, in the sciences, it is important to know whether your hypothesis is actually true. In null-hypothesis significance testing, scientists use p-values: the probability of obtaining a result at least this extreme purely by chance, if the hypothesis is false. Usually, the p-value must be under 0.05 (a threshold established by Sir Ronald Fisher in 1925), meaning there's less than a 5% chance of getting such a result by chance alone, for the result to be considered significant.

In my opinion, 5% is still too much; that's a 1-in-20 chance. But compounding the problem are the sensitivity and specificity of your instruments.

Sensitivity is how reliably your instrument detects what you are looking for. In the breast cancer example, the sensitivity (the power of the test) is 80%, as it detects 80% of all breast cancers.

Specificity is a measure of how well a test avoids false positives. Related to it is the false discovery rate, the proportion of positive results that are false positives, which, in the previous example, is P(¬A|B) = 92.2%.

But nobody integrates sensitivity and specificity into their p-values. There's too much trust in the accuracy of the tests: the power of the tests reported in papers can range from 0.2 to 0.8, while the proportion of tested hypotheses that are actually false is necessarily high; there are simply far more plausible hypotheses than true ones. A high rate of false hypotheses, combined with mediocre test power, means that a p-value on its own is not a good indicator of significance.

You wouldn't believe you had breast cancer just because you got a positive mammogram result, but often, in science, there's no way to tell the difference between a true and a false positive. So a paper with statistical significance might not actually describe any real effect.

And therein lies the problem. One investigation showed that a p-value threshold of 0.05 can give a false discovery rate of around 30%. Holding science to a standard where roughly one in three "significant" findings is false is only slightly better than flipping a coin.
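Here's a back-of-the-envelope version of that kind of calculation (my own illustration with assumed numbers, not the cited investigation): suppose only 10% of the hypotheses being tested are true, the tests have 80% power, and significance is declared at p < 0.05.

```python
# False discovery rate under assumed (illustrative) conditions.
alpha = 0.05    # chance of a "significant" result when the hypothesis is false
power = 0.80    # chance of a "significant" result when the hypothesis is true
p_true = 0.10   # assumed fraction of tested hypotheses that are actually true

true_positives = power * p_true
false_positives = alpha * (1 - p_true)
fdr = false_positives / (true_positives + false_positives)

print(round(fdr, 2))  # 0.36 — roughly a third of "significant" findings are false
```

With those assumptions, roughly a third of statistically significant findings would be false positives, which is in the same ballpark as the figure above.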


Bayesian analysis is useful for calculating probabilities in everyday life, but it is absolutely vital in the sciences. For studies to actually be trusted, the full analysis must be done in order to prevent results from being misrepresented.

Tagged with logic, programming
Posted on 2015-08-09 00:22:02
