If You Learn Nothing Else about Bayes’ Theorem, Let It Be This

26 April 2014

There are two things one learns from Bayes’ Theorem that are the windows to everything else Bayesian reasoning can ever teach you. And there is a lot it can teach you besides these two things. But here I’m cutting to the chase with the two that are most essential: theories cannot be argued in isolation, and prior assumptions matter.

Theories Cannot Be Argued in Isolation

Any claim about facts is a theory. It is a theory as to how a particular body of evidence came about. Some such theories can be known to near certainty (the sky is mostly and usually blue; the earth revolves around the sun), others to varying degrees of certainty. Those we know to near certainty we usually just call facts (for more on this point see Proving History, index, “fact-theory distinction”). But still all empirical knowledge is ultimately theoretical. And this means if you are claiming something is the case (or is not the case), you cannot argue that claim is true by looking around for evidence that supports it.

That counter-intuitive statement is the first fundamental lesson of Bayes’ Theorem. The only way to argue for a claim is to look at the evidence and compare that claim to alternative claims. If some item of evidence E for claim C is just as likely on alternative claim D, then E has zero value as evidence for C (if D is the only alternative worth considering). In other words, E is then not evidence for C. No matter how much C predicts E, no matter how much E is expected on C, no matter how much E perfectly fits C.

Instead, the only factor that matters is the difference of two probabilities: how likely E is on C, and how likely E is on any other competing claim. If those two probabilities are the same (for all competing claims worth considering), then E has no value as evidence. If E is slightly more likely on C than on any other possible explanation, then E is weak evidence for C, i.e., it only slightly supports C. Only if E is a lot more likely on C than on every other possible explanation is E strong evidence for C. Then, and only then, does E strongly support C. And that is, in fact, what it means to say E is good evidence for C. And in this respect, we can say all the same things if C refers instead to a subset of competing theories (C1, C2, C3). As then E can be evidence for all the theories in C (even good evidence) if all of them make E more likely than every other competing theory worth considering. But still, only then.
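
A minimal Python sketch of that comparison, assuming a small set of mutually exclusive, exhaustive hypotheses. The function name `posterior`, the labels `C`, `D`, `C1`, `C2`, and every number are hypothetical. The point it shows is that the alternatives enter through the denominator: E only moves C if E is more likely on C than on its rivals.

```python
def posterior(priors, likelihoods):
    """Return P(H | E) for each hypothesis H, given P(H) and P(E | H).

    Assumes the hypotheses are mutually exclusive and exhaustive.
    """
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    p_e = sum(joint.values())  # total probability of E across all hypotheses
    return {h: p / p_e for h, p in joint.items()}


# E fits C well, but is just as likely on D: C stays at its prior of 0.5.
same = posterior({"C": 0.5, "D": 0.5}, {"C": 0.9, "D": 0.9})

# E is three times as likely on C as on D: C rises from 0.5 to 0.75.
favored = posterior({"C": 0.5, "D": 0.5}, {"C": 0.9, "D": 0.3})

# A group of theories: E is equally likely on C1 and C2 but much less
# likely on D, so both C1 and C2 rise (from 0.25 to 4/9 each).
group = posterior({"C1": 0.25, "C2": 0.25, "D": 0.5},
                  {"C1": 0.8, "C2": 0.8, "D": 0.1})

print(round(same["C"], 3), round(favored["C"], 3), round(group["C1"], 3))
```

Note that how well E “fits” C (0.9 in both of the first two cases) is not what decides anything; only the comparison with D does.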

You cannot avoid this. You cannot logically claim E is evidence for C and simultaneously claim you are not or cannot be estimating those two probabilities. Because if you are not (or cannot), then you are lying. Because then you would be saying that you simultaneously do and do not know whether E is evidence for C. And that’s a self-contradiction. In reality, whenever you say E is evidence for C, you are literally saying E is more probable on C than on any other possible explanation of E. And since that is what you are really saying, you can only know it’s true if you checked–in other words, if you actually do know that E is more probable on C than on any other explanation of E you know. Which requires you to have confirmed the difference in those two probabilities. Which requires you to have considered alternative explanations. And that means explanations other than your own.

What it takes to do that then leads you to a cascade of further Bayesian arguments that ultimately rest in your basic experience (your collective observations, including observations of what others have communicated to you). Proving History lays that out (especially in chapter six), and I go deeper in Epistemological End Game (and everything I reference there).

But the fundamental point is that the only way to argue for any conclusion about the facts is to seriously consider alternative explanations of the same facts (which means actually imagining they are true), and compare their ability to explain the evidence with the ability of your conclusion to do so. For more on that point see Advice on Probabilistic Reasoning.

And that means you have to do the work of finding out or thinking up all the best alternative explanations available (the worst ones you can ignore, but only because you examine them enough to confirm this, which often need be no more than collectively and fleetingly–you just need to confirm that they make E far too improbable, or that their prior probability is too small to be worth considering on the evidence we have: see Proving History, index, “vanishingly small probabilities” and “Ockham’s Razor”), then taking them seriously and treating them informedly, and then comparing your explanation’s ability to predict the evidence with their ability to predict it. (See also index, “underdetermination,” for those who know what that is and are concerned about it.)
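
A rough Python illustration of what it looks like when a fringe alternative passes that check. All numbers are hypothetical: an alternative X with a tiny prior, which also makes E improbable, contributes almost nothing to the total probability of E. Including it or setting it aside therefore barely moves the posterior of C.

```python
# Hypothetical numbers: C and D are serious rivals; X is a fringe alternative
# with a tiny prior that also makes E very improbable.
priors = {"C": 0.6, "D": 0.4 - 1e-6, "X": 1e-6}
likelihoods = {"C": 0.7, "D": 0.2, "X": 0.01}

joint = {h: priors[h] * likelihoods[h] for h in priors}

# Posterior of C with X included in the comparison...
with_x = joint["C"] / sum(joint.values())
# ...and with X set aside (i.e. conditioning on X being false).
without_x = joint["C"] / (joint["C"] + joint["D"])
# X's own posterior share of the explanation of E.
x_share = joint["X"] / sum(joint.values())

print(f"P(C|E) with X: {with_x:.6f}, without X: {without_x:.6f}, P(X|E): {x_share:.1e}")
```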

Prior Assumptions Matter

Every argument you have ever made, every conclusion you have ever reached, about any fact whatever, has relied on hidden assumptions of prior probability. You might not realize where those assumptions are hidden in your thinking. But they are always there. Literally, always. So if you don’t know that, then you don’t know whether any claim you believe is true. Because knowing that any claim C is true requires knowing what prior probabilities you are assuming (and whether, of course, they are justified). This is important in at least two key respects.

First, it is important because it means you need to know what your assumptions about prior probability are, and on what they are based. Which you can only do when you understand what a prior probability is, and how to justify one. Which requires learning Bayesian reasoning. Second, it is important because any time you add assumptions to make a claim fit the evidence, you often logically reduce that claim’s prior probability, with the result that unless you change your assumed priors, your prior assumptions will then become false, and any conclusion you then reach will become unsound. Yet avoiding that requires knowing how much any given added assumption lowers the prior probability of your claim or belief. And if you don’t know that, then once again, you can’t ever know whether your claim or belief is true.

As to the first point, that we always require assumptions about prior probability, for every claim we make and every belief we hold, I demonstrate with deductive certainty in Proving History (pp. 106-14; and see “prior probability” in the index to get up to speed on what that is and how we calculate it, even when we don’t know we are). In short, Bayes’ Theorem entails that all conclusions about the probability that any fact-claim is true are affected by prior probability. So for any claim C, the probability that C is true is affected by the prior probability that C is true (meaning, prior to considering any evidence E). And yet it can be shown that you are always assuming some prior probability for every claim you make (and thus every claim you believe to be true or are arguing is true). Therefore, the probability that C is true can only ever be known if you know what prior probability you are assuming for it. There is no way around this.

Because even if you think you aren’t assuming anything about the prior probability of any claim C, that in itself actually entails you are assuming something about the prior probability of C. For example, if you say you are claiming nothing about the prior probability of C, you are de facto claiming the prior probability of C is 0.5 (or 50/50 or 50%). Because the only way you can be assuming the prior is not 0.5, is if you are assuming it is higher or lower than 0.5, which entails you are assuming the prior is high or low. Yet if you are not assuming the prior is high or low, you are assuming it is neither, and therefore you are assuming it’s 0.5. The same analysis follows for any value above or below 0.5: e.g., the mere fact of your claiming not to know if the prior is higher than 0.7 entails you are assuming it is not higher than 0.7. (See Proving History, pp. 83-85, for a full demonstration of what I mean here.)

You can’t escape this by saying you are assuming the prior probability can be anything between 0 and 1 (0% and 100%), because that entails “I am assuming nothing about the prior probability of C,” and that entails “I am assuming the prior probability of C can be anything.” And Bayes’ Theorem then entails the posterior probability of C can be anything–in other words, just simply, the probability of C can be anything. Which means you are saying you do not know, and cannot know, if C is true. At all. Ever. Until you settle on at least some range of possible priors for C that is anything other than “anything whatever.” And once you start doing that, you are making assumptions about the prior. And once you are doing that, you have to know what those assumptions are and why you are embracing those assumptions and not others. Otherwise, you cannot claim to know what you believe or claim is true. Because otherwise, you won’t, and couldn’t.
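
A small Python sketch of how much the prior controls the conclusion, for a single claim C against its negation. The function name and the likelihoods 0.8 and 0.2 are hypothetical. Identical evidence is applied to a spread of priors:

```python
def posterior_from_prior(prior, p_e_given_c, p_e_given_not_c):
    """Bayes' theorem for claim C versus not-C, given the prior P(C)."""
    num = prior * p_e_given_c
    return num / (num + (1 - prior) * p_e_given_not_c)


# Hypothetical evidence: 4 times as likely if C is true as if it is false.
P_E_GIVEN_C, P_E_GIVEN_NOT_C = 0.8, 0.2

for prior in (0.001, 0.1, 0.5, 0.9, 0.999):
    post = posterior_from_prior(prior, P_E_GIVEN_C, P_E_GIVEN_NOT_C)
    print(f"prior {prior:<6} -> posterior {post:.3f}")
```

With these numbers the posterior runs from under 1% (prior 0.001) to over 99% (prior 0.999), and the “assume nothing” prior of 0.5 gives 0.8. Leaving the prior open across nearly all of 0 to 1 leaves the posterior open across nearly all of 0 to 1 as well.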

So the second thing Bayes’ Theorem teaches you is that you can’t hide from your assumptions anymore. You have to quantify them, and justify them. In one sense, the prior probability that any claim C is true equals the frequency with which C usually turns out to be true in similar circumstances. And if you build that frequency using “past cases” that rest solely on circular assumptions about what happened, and not on evidence that that in fact happened and not something else, then your conclusion (and thus all your claims and beliefs) will be fallaciously circular as well, resting on un-evidenced assumptions, and thus your conclusions are also nothing more than un-evidenced assumptions. And un-evidenced assumptions aren’t knowledge. This is using faith as a substitute for evidence. And that is a practice doomed to guarantee most of your beliefs are false. (See Sense and Goodness without God, index, “faith.”)

So in building a prior, you must only count what has typically happened in well-proven cases. All cases that haven’t been properly, reliably investigated cannot be counted, because you don’t yet know what those investigations would have turned up, except by reference to what typically happened among properly, reliably investigated cases. And cases that “can’t be” investigated automatically count as cases that haven’t been (so citing the fact that they can’t be investigated does not gain you anything).

This is why miracles have extremely low prior probabilities: not because miracles never happen (we don’t have to assume any such thing), but because in the set of all properly, reliably investigated miracle claims, exactly zero have turned up genuine–all have turned out to be something else entirely (as beautifully enumerated by Tim Minchin in his scathing song Thank You God for Fixing the Cataracts of Sam’s Mom). That does not mean the prior probability is zero, since that set does not include all miracle claims. But it does mean the prior probability that any other miracle claim is genuine conforms to Laplace’s Rule (on which see Proving History, index).
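
This assumes that “Laplace’s Rule” here means Laplace’s rule of succession in its standard form, (s + 1)/(n + 2) for s successes observed in n cases. On that reading, the resulting prior can be sketched in Python. The count of 1,000 investigated claims is purely hypothetical.

```python
from fractions import Fraction


def rule_of_succession(successes, trials):
    """Laplace's rule of succession: estimated probability that the next
    case is a success, after `successes` out of `trials` observed cases."""
    return Fraction(successes + 1, trials + 2)


# Hypothetical count: 1,000 properly investigated miracle claims, none genuine.
estimate = rule_of_succession(0, 1000)
print(estimate, float(estimate))
```

The estimate shrinks as more claims are properly investigated, but it never reaches zero. This matches the point that the prior is very low without being assumed to be zero.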

Even if you don’t have any investigated cases in the nearest category, you can apply comparable cases, belonging to broader generalizations. So, for example, even if some claim C is otherwise totally unique, the prior probability that someone is lying (or mistaken) about claim C can be based on the frequency with which you have observed people to lie in general, or the frequency among people like them claiming comparable things (if you have confirmed they lie about, or are mistaken about, similar things more often than people in general). (This is called choosing a “reference class” for determining a prior probability, on which see Proving History, index.) But the point is, the only way to justify an assumption about prior probability is to reference the frequency of comparable past cases that have been well established in human knowledge (and not just assumptions based on faith or prejudice or whim). Which requires getting out and learning stuff about the world.
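
One simple way to sketch the fallback from a narrow reference class to broader ones is shown below in Python. The class names, counts, and the `min_cases` threshold are all hypothetical, and this “narrowest class with enough well-investigated cases” policy is only an illustration, not a rule stated in the text.

```python
from fractions import Fraction

# Hypothetical reference classes, from narrowest to broadest, each with
# (cases found to be lies or mistakes, cases properly investigated).
reference_classes = [
    ("claims exactly like C", 0, 0),
    ("comparable claims by people like this claimant", 12, 40),
    ("claims by people in general", 50, 1000),
]


def prior_of_error(classes, min_cases=20):
    """Use the narrowest class with at least `min_cases` investigated cases.

    `min_cases` is an arbitrary illustrative threshold.
    """
    for name, erroneous, investigated in classes:
        if investigated >= min_cases:
            return name, Fraction(erroneous, investigated)
    raise ValueError("no reference class has enough investigated cases")


name, prior = prior_of_error(reference_classes)
print(name, prior)
```

Here the narrowest class has no investigated cases at all, so the sketch falls back to the next class. Its error rate (30%) is used instead of the general rate (5%) only because that narrower class has actually been checked.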

As to the second point, if the evidence doesn’t match your claim so well (as is always the case with theism, e.g. the Problem of Evil, or to continue the last analogy, the fact that when miracle claims are properly investigated they never turn out to be genuine), you can invent “excuses” to make your claim fit that evidence better. This means adding assumptions to bolster your claim. But what Bayes’ Theorem teaches us is that every such assumption you add reduces the probability that your claim is true, rather than increasing it (as that excuse was supposed to do). Because adding assumptions to any claim C reduces the prior probability of C, and any reduction in the prior probability of C entails (all else being equal) a reduction in the posterior probability of C, which is simply the probability of C. This is because the prior probability that several assumptions are all true can never exceed the prior probability of any one of them alone (and is lower unless that one already entails the others), a basic fact of cumulative probability, which when overlooked produces the Conjunction Fallacy. (See Proving History, index, “gerrymandering,” for more on this point, and why trying to get a theory C to make the evidence E likely by adding tons of ad hoc assumptions to it doesn’t work.)
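
The conjunction point can be checked numerically. In this Python sketch every number is hypothetical. An ad hoc assumption A is tailored so that C & A makes E certain. Even so, both the prior and the posterior of C & A stay below those of C alone, because C & A can only be true in cases where C is true.

```python
# Hypothetical numbers for a claim C, an ad hoc assumption A, and evidence E.
p_c = 0.5                # prior P(C)
p_a_given_c = 0.1        # P(A | C): how likely the added assumption is, given C
p_e_given_ca = 1.0       # A is tailored so that C & A makes E certain
p_e_given_c_not_a = 0.1  # P(E | C & not-A)
p_e_given_not_c = 0.5    # P(E | not-C)

# Prior of the conjunction: P(C & A) = P(C) * P(A | C) <= P(C).
p_ca = p_c * p_a_given_c

# P(E | C) is a weighted average over A and not-A; then total P(E).
p_e_given_c = p_a_given_c * p_e_given_ca + (1 - p_a_given_c) * p_e_given_c_not_a
p_e = p_c * p_e_given_c + (1 - p_c) * p_e_given_not_c

post_c = p_c * p_e_given_c / p_e     # P(C | E)
post_ca = p_ca * p_e_given_ca / p_e  # P(C & A | E)

print(f"P(C)={p_c:.3f} P(C&A)={p_ca:.3f} P(C|E)={post_c:.3f} P(C&A|E)={post_ca:.3f}")
```

With these numbers, P(C | E) is about 0.275 while P(C & A | E) is about 0.145. Making E certain did not make the bolstered claim more probable than the plain one.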

The amount by which adding an assumption A to claim C reduces the prior probability of C will vary according to many factors, and so to understand how doing that affects the probability of C, and thus affects your knowledge of whether C is true, requires, again, learning Bayesian reasoning–as well as just basic probability theory. For instance, you have to understand what a conjunction fallacy is and how to avoid it. The famous example (here quoting Wikipedia) is this:

Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.

Which is more probable?

1. Linda is a bank teller.

2. Linda is a bank teller and is active in the feminist movement.

People tend to answer that 2 is more probable than 1, owing to the assumption that her description increases the probability that she is a feminist, as indeed it does: Linda as-described is more likely to be a feminist than any randomly selected person from the general population. Indeed, her description also increases the probability that she is a bank teller (relative to the general population), even if by not as much. But the question is whether she is both a feminist and a bank teller. Since not all feminists are bank tellers (nor all bank tellers feminists), no matter what the probability is that Linda as-described is a feminist (let’s say we conclude it’s 90%), the probability that she is both is necessarily less than that. Even if 99% of all women (even feminists) are bank tellers, the probability that Linda would be both is 90% x 99% = 0.90 x 0.99 = 0.891, and 89.1% is less than both 90% and 99%. So the probability of 1 would be 99%. But the probability of 2 would be 89.1%. So 1 is more probable, not 2. Despite Linda’s description.
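
The same arithmetic in a short Python sketch. It uses the text’s own figures (90% and 99%) and its implicit assumption that the 99% rate of bank tellers holds for women whether or not they are feminists, i.e. that the two traits are independent.

```python
p_feminist = 0.90  # P(Linda is a feminist), as assumed in the text
p_teller = 0.99    # P(bank teller), taken to hold for women, feminist or not

# Treating the two as independent: P(teller and feminist) = P(teller) * P(feminist).
p_both = p_teller * p_feminist

print(f"1: {p_teller:.3f}  2: {p_both:.3f}")  # prints "1: 0.990  2: 0.891"
```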

And that’s just one example of a common mistake people make in probability theory. And since all knowledge and beliefs are probabilistic, your mistakes in probability theory guarantee you will have false beliefs. Unless you stop pretending you aren’t dependent on hidden Bayesian reasoning to justify your beliefs. Because you are. And knowing that starts you on the path to correcting your faulty reasoning about the facts, and thus correcting your beliefs.

For more of my introductions to and explanations of Bayes’ Theorem and Bayesian reasoning, see my video Bayes’ Theorem: Lust for Glory, my Bayesian Calculator resource page, and any of my other articles on Bayesianism.

21 Comments

richardcollins

11 years ago

If there really was a god we would all know Bayes theory cold and apply it “religiously”. Thanks for this illuminating essay.

Ben Wright

11 years ago

In reality, whenever you say E is evidence for C, you are literally saying E is more probable on C than on any other possible explanation of E.

I think this formulation is a little misleading – I’d define evidence ‘for’ something as indicating that the posterior probability has increased – the amount it increases by determining the strength of the evidence. So with hypotheses F, G, H, … Z that are equiprobable a priori, if evidence E all but excludes H-Z, leaving only F and G equally probable, then I would call E evidence for both F and G. By your definition, it would not count as evidence for either, as the change in probability is not higher for one than the other.

In this example you still can’t discern F from G, but you have definitely learned something. You would be justified in holding a higher confidence in the accuracy of F or G as a result.

It’s a small quibble, really, based on what we choose to define ‘evidence’ as meaning.

In any case, up the Bayesians!

Richard Carrier

Reply to Ben Wright

11 years ago

I think this formulation is a little misleading – I’d define evidence ‘for’ something as indicating that the posterior probability has increased – the amount it increases by determining the strength of the evidence.

That is what I am saying. I am just saying it one level down in analysis because this is for people new to all this.

But you do rightly explain a point that I also should include, about demarcating different kinds of competing claim. I needed to include a paragraph on that (for more reasons than even you note), and will do so shortly.

Reginald Selkirk

11 years ago

Off-topic: Have you heard of this yet?
Who Died and Made Him Pope?

by Candida Moss
…
This is precisely what a new book by Fordham University theology professor George Demacopoulos proposes. In The Invention of Peter, Demacopoulos argues that Peter never visited the city of Rome, never founded a church there, and was not the first Pope. In fact, the very idea of Peter as the Supreme Pontiff and leader of a worldwide church is a much later idea that took its rise in the ecclesial politics of the fifth century.
…

I would love to read your comments if you feel this falls within your purview.

Richard Carrier

Reply to Reginald Selkirk

11 years ago

That’s actually not news. Quite a lot of mainstream scholars already agree with Demacopoulos and Moss about this. Indeed, I would venture to say that, outside of fundamentalists, this is the broadest consensus view.

But don’t mistake them as saying Peter didn’t exist or wasn’t the first apostle and leader of the original Jerusalem synod (or equivalent).

abcxyz

Reply to Reginald Selkirk

11 years ago

Richard, what are the strongest arguments against Paul mythicism?

Richard Carrier

Reply to abcxyz

11 years ago

I don’t have time to waste writing about that fringe view. I have satisfied myself that it is based on no plausible argumentation at all and therefore requires no further attention. Although it’s possible, there is no evidence to support the conclusion that it is probable. Even what is pretended to be evidence (e.g. the long length of Paul’s letters) is based on poor understanding of that evidence (e.g. we’ve long known that there is abundant evidence Paul’s letters are pastiches of several prior letters that were shorter…so not only can we not show the original letters were abnormally long, one does not forge a letter by stitching together fragments of other letters; this is therefore evidence that there was an authentic body of letters someone later tried to create their own collection from, and if the letters aren’t forged, there goes the claim that Paul didn’t exist to write them). Likewise falls every other argument for the non-existence of Paul. I’ve never seen one that was both logically valid and factually correct.

abcxyz

Reply to Reginald Selkirk

11 years ago

Okay, thanks.

Richard Carrier

11 years ago

Revisions to the Above Essay:

“If some item of evidence E for claim C is just as likely on alternative claim D, then E has zero value as evidence for C.”

Has now become:

“If some item of evidence E for claim C is just as likely on alternative claim D, then E has zero value as evidence for C (if D is the only alternative worth considering)”

And:

“If those two probabilities are the same, then E has no value as evidence.”

Has become:

“If those two probabilities are the same (for all competing claims worth considering), then E has no value as evidence.”

And that paragraph now ends with:

“And in this respect, we can say all the same things if C refers instead to a subset of competing theories (C1, C2, C3). As then E can be evidence for all the theories in C (even good evidence) if all of them make E more likely than every other competing theory worth considering. But still, only then.”

And the parenthesis:

“(the worst ones you can ignore, but only because you examine them enough to confirm this, which often need be no more than collectively and fleetingly: see Proving History, index, “vanishingly small probabilities” and “Ockham’s Razor”)”

Now reads:

“(the worst ones you can ignore, but only because you examine them enough to confirm this, which often need be no more than collectively and fleetingly–you just need to confirm that they make E far too improbable, or that their prior probability is too small to be worth considering on the evidence we have: see Proving History, index, “vanishingly small probabilities” and “Ockham’s Razor”)”

And where I mention gerrymandering, I have added:

“…and why trying to get a theory C to make the evidence E likely by adding tons of ad hoc assumptions to it doesn’t work.”

Brian O

11 years ago

This is a better explanation of the ‘Linda Problem’ than you get elsewhere, but I still think it’s fairly useless as a tool for human affairs.

Firstly, it is a “leading” question, which naturally pulls the respondent in a particular way, bringing the objectivity of the test into doubt.

Secondly, as you break Sets into Subsets into Subsets-of-Subsets (this time regarding “facts” about a particular person), the sizes of those Sets and Subsets naturally become smaller, which really isn’t particularly insightful, even if it is statistically useful. I think most reasonably intelligent people could intuit that the number of firefighters is greater than the number of firefighters with red hair and green eyes, without ever knowing about Bayes or Daniel Kahneman.

Richard Carrier

Reply to Brian O

11 years ago

I’ve seen people make that mistake many times. So evidently, it is a thing. This example is only drawn up to make clear the mistake.

Ceres

11 years ago

Hey Richard Carrier. What do you think of that book “Shattering the Christ Myth”? I hear it’s probably the most detailed treatment of Doherty, Price and the other mythicists out there.
Also, who makes the best case for historicity?

Richard Carrier

Reply to Ceres

11 years ago

Oh that’s terrible. Disorganized collection of straw man arguments, half of which aren’t even relevant to the actual theories they claim to be critiquing. I found it wholly useless.

So far no one makes the best case for historicity. Or even a good one. Which is not because there couldn’t at least be a good one. So I can’t explain this failure.

Ceres

11 years ago

That sucks. I was looking forward to reading something on historicity.
Did you ever do a review of Shattering the Christ Myth? You should make a post.

Richard Carrier

Reply to Ceres

11 years ago

It’s too awful to even be worth the bother. Any mythicist book worth reading already refutes it. Which is just about the most embarrassing level of failure a critique can achieve–when the book it claims to be critiquing doesn’t even need to be altered or expanded to already decisively refute that critique. My book out this June will just crush it further, without even having to address it.

J. Quinton

11 years ago

One important thing I learned about Bayes’ Theorem, in the context of theories not being argued in isolation, is that absence of evidence is indeed evidence of absence. Just as one’s favored explanation and any/all alternatives are exhaustive and thus are complements (i.e. adding up to 100%), the conditional probabilities of the evidence and of its absence are also exhaustive.

For example, the conditional probability Pr(E | H) has the complement Pr(~E | H), and the two have to add up to 100% (if they don’t, you are open to a Dutch book). This knowledge can in turn be applied, using BT, to find out exactly how much the absence of evidence affects your hypothesis, with a mirror-image BT: Pr(H | ~E) = Pr(~E | H) * Pr(H) / Pr(~E). This makes it clear that the refrain “absence of evidence isn’t evidence of absence” is itself a logical fallacy, proven false by BT.

Richard Carrier

Reply to J. Quinton

11 years ago

absence of evidence is indeed evidence of absence

Sometimes yes, sometimes no. I discuss the mathematics of this in Proving History, pp. 117-19.

And I also explain how useful it is to think about P(~e|h), pp. 255-56.

But sometimes a silence is equally expected on h and ~h. And then absence of evidence is not evidence of absence. I discuss examples, including those where there is a differential, but too small to matter, e.g., pp. 219-24.
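
Both sides of this exchange can be illustrated with the mirror-image formula quoted above, Pr(H | ~E) = Pr(~E | H) * Pr(H) / Pr(~E). In this Python sketch the function name and all probabilities are hypothetical. If E is much more expected on H than on ~H, its absence lowers H. If E is equally expected on both, its absence leaves H where it was.

```python
def posterior_after_silence(prior, p_e_given_h, p_e_given_not_h):
    """P(H | ~E): update on the *absence* of evidence E."""
    p_not_e_given_h = 1 - p_e_given_h
    p_not_e_given_not_h = 1 - p_e_given_not_h
    num = p_not_e_given_h * prior
    p_not_e = num + p_not_e_given_not_h * (1 - prior)  # total Pr(~E)
    return num / p_not_e


prior = 0.5
# E would be likely if H were true, so its absence counts against H.
print(round(posterior_after_silence(prior, 0.9, 0.2), 3))
# E is equally expected either way, so its absence changes nothing.
print(round(posterior_after_silence(prior, 0.3, 0.3), 3))
```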

GTR

11 years ago

Sometimes people know the factors but fail to produce a final equation that summarizes them by putting them into the equation, which kind of leaves the whole issue they try to convey unfinished. Example video that has the factors but lacks a final equation:

Paul Jarc

11 years ago

So the probability of 1 would be 99%.

99% is the probability that a randomly chosen feminist is a bank teller, but 1 is the claim that Linda (a likely but not certain feminist) is a bank teller. You haven’t actually nailed down the probability of statement 1.

Richard Carrier

Reply to Paul Jarc

11 years ago

You’re right, it needs to be “women (even feminists).” I have emended the article.
