There are two things one learns from Bayes’ Theorem that are the windows to everything else Bayesian reasoning can ever teach you. And there is a lot it can teach you besides these two things. But here I’m cutting to the chase of the two that are most essential: theories cannot be argued in isolation, and prior assumptions matter.
Theories Cannot Be Argued in Isolation
Any claim about facts is a theory. It is a theory as to how a particular body of evidence came about. Some such theories can be known to near certainty (the sky is mostly and usually blue; the earth revolves around the sun), others to varying degrees of certainty. Those we know to near certainty we usually just call facts (for more on this point see Proving History, index, “fact-theory distinction”). But still all empirical knowledge is ultimately theoretical. And this means if you are claiming something is the case (or is not the case), you cannot argue that claim is true by looking around for evidence that supports it.
That counter-intuitive statement is the first fundamental lesson of Bayes’ Theorem. The only way to argue for a claim is to look at the evidence and compare that claim to alternative claims. If some item of evidence E for claim C is just as likely on alternative claim D, then E has zero value as evidence for C (if D is the only alternative worth considering). In other words, E is then not evidence for C. No matter how much C predicts E, no matter how much E is expected on C, no matter how much E perfectly fits C.
Instead, the only factor that matters is the difference between two probabilities: how likely E is on C, and how likely E is on any other competing claim. If those two probabilities are the same (for all competing claims worth considering), then E has no value as evidence. If E is slightly more likely on C than on any other possible explanation, then E is weak evidence for C, i.e., it only slightly supports C. Only if E is a lot more likely on C than on every other possible explanation is E strong evidence for C. Then, and only then, does E strongly support C. And that is, in fact, what it means to say E is good evidence for C. We can say all the same things if C refers instead to a subset of competing theories (C1, C2, C3): E can then be evidence for all the theories in C (even good evidence) if all of them make E more likely than every other competing theory worth considering. But still, only then.
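To make that comparison concrete, here is a minimal sketch in Python of Bayes' Theorem for a claim C with a single rival D. The function name and every number in it are invented for illustration; nothing here comes from a real case. It only shows how the same prior moves (or fails to move) depending on how much more likely E is on C than on D.

```python
def posterior(prior_c, p_e_given_c, p_e_given_d):
    """Probability of C given E, assuming D is the only rival, so P(D) = 1 - P(C)."""
    numerator = prior_c * p_e_given_c
    denominator = numerator + (1.0 - prior_c) * p_e_given_d
    return numerator / denominator

# Hypothetical likelihoods, all starting from the same 50/50 prior.
no_evidence = posterior(0.5, 0.9, 0.9)    # E just as likely on D as on C
weak_evidence = posterior(0.5, 0.9, 0.8)  # E slightly more likely on C
strong_evidence = posterior(0.5, 0.9, 0.01)  # E far more likely on C

for label, value in [("equal", no_evidence),
                     ("slightly higher on C", weak_evidence),
                     ("much higher on C", strong_evidence)]:
    print(f"{label}: {value:.3f}")
```

In the first case E is expected on C with 90% probability, yet the probability of C does not move from 0.5 at all, because E is expected just as much on D.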
You cannot avoid this. You cannot logically claim E is evidence for C and simultaneously claim you are not or cannot be estimating those two probabilities. Because if you are not (or cannot), then you are lying. Because then you would be saying that you simultaneously do and do not know whether E is evidence for C. And that’s a self-contradiction. In reality, whenever you say E is evidence for C, you are literally saying E is more probable on C than on any other possible explanation of E. And since that is what you are really saying, you can only know it’s true if you checked–in other words, if you actually do know that E is more probable on C than on any other explanation of E you know. Which requires you to have confirmed the difference in those two probabilities. Which requires you to have considered alternative explanations. And that means explanations other than your own.
What it takes to do that then leads you to a cascade of further Bayesian arguments that ultimately rest in your basic experience (your collective observations, including observations of what others have communicated to you). Proving History lays that out (especially in chapter six), and I go deeper in Epistemological End Game (and everything I reference there).
But the fundamental point is that the only way to argue for any conclusion about the facts is to seriously consider alternative explanations of the same facts (which means actually imagining they are true), and compare their ability to explain the evidence with the ability of your conclusion to do so.
And that means you have to do the work of finding out or thinking up all the best alternative explanations available, then taking them seriously and treating them informedly, and then comparing your explanation’s ability to predict the evidence with their ability to predict the evidence. The worst alternatives you can ignore, but only because you have examined them enough to confirm this, which often need be no more than collectively and fleetingly: you just need to confirm that they make E far too improbable, or that their prior probability is too small to be worth considering on the evidence we have (see Proving History, index, “vanishingly small probabilities” and “Ockham’s Razor”; and see also index, “underdetermination,” for those who know what that is and are concerned about it).
Prior Assumptions Matter
Every argument you have ever made, every conclusion you have ever reached, about any fact whatever, has relied on hidden assumptions of prior probability. You might not realize where those assumptions are hidden in your thinking. But they are always there. Literally, always. So if you don’t know that, then you don’t know whether any claim you believe is true. Because knowing that any claim C is true, requires knowing what prior probabilities you are assuming (and whether, of course, they are justified). This is important in at least two key respects.
First, it is important because it means you need to know what your assumptions about prior probability are, and on what they are based. Which you can only do when you understand what a prior probability is, and how to justify one. Which requires learning Bayesian reasoning. Second, it is important because it means any time you add assumptions to make a claim fit the evidence, this often logically reduces the prior probability, with the result that unless you change your assumed priors, your prior assumptions will then become false, and any conclusion you then reach will become unsound. Yet this requires knowing how much any given added assumption lowers the prior probability of your claim or belief. And if you don’t know that, then once again, you can’t ever know whether your claim or belief is true.
As to the first point, I demonstrate with deductive certainty in Proving History that we always require assumptions about prior probability, for every claim we make and every belief we hold (pp. 106-14; and see “prior probability” in the index to get up to speed on what that is and how we calculate it, even when we don’t know we are). In short, Bayes’ Theorem entails that all conclusions about the probability that any fact-claim is true are affected by prior probability. So for any claim C, the probability that C is true is affected by the prior probability that C is true (meaning, prior to considering any evidence E). And yet it can be shown that you are always assuming some prior probability for every claim you make (and thus every claim you believe to be true or are arguing is true). Therefore, the probability that C is true can only ever be known if you know what prior probability you are assuming for it. There is no way around this.
Because even if you think you aren’t assuming anything about the prior probability of any claim C, that in itself actually entails you are assuming something about the prior probability of C. For example, if you claim to be claiming nothing about the prior probability of C, you are de facto claiming the prior probability of C is 0.5 (or 50/50 or 50%). Because the only way you can be assuming the prior is not 0.5, is if you are assuming it is higher or lower than 0.5, which entails you are assuming the prior is high or low. Yet if you are not assuming the prior is high or low, you are assuming it is neither, and therefore you are assuming it’s 0.5. The same analysis follows for any value above or below 0.5, e.g., the mere fact of your claiming not to know if the prior is higher than 0.7, entails you are assuming it is not higher than 0.7. (See Proving History, pp. 83-85, for a full demonstration of what I mean here.)
You can’t escape this by saying you are assuming the prior probability can be anything between 0 and 1 (0% and 100%), because that entails “I am assuming nothing about the prior probability of C,” and that entails “I am assuming the prior probability of C can be anything.” And Bayes’ Theorem then entails the posterior probability of C can be anything–in other words, just simply, the probability of C can be anything. Which means you are saying you do not know, and cannot know, if C is true. At all. Ever. Until you settle on at least some range of possible priors for C that is anything other than “anything whatever.” And once you start doing that, you are making assumptions about the prior. And once you are doing that, you have to know what those assumptions are and why you are embracing those assumptions and not others. Otherwise, you cannot claim to know what you believe or claim is true. Because otherwise, you won’t, and couldn’t.
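A rough numerical sketch of this point, in Python. The likelihoods (0.9 and 0.3) and the prior ranges are hypothetical, chosen only to show the pattern: if the prior is allowed to be nearly anything, so is the posterior, and only a committed range of priors yields a usable conclusion.

```python
def posterior(prior_c, p_e_given_c, p_e_given_not_c):
    """Probability of C given E, for a single claim C against its negation."""
    numerator = prior_c * p_e_given_c
    return numerator / (numerator + (1.0 - prior_c) * p_e_given_not_c)

# Hypothetical evidence: three times more likely if C is true than if it is false.
P_E_GIVEN_C = 0.9
P_E_GIVEN_NOT_C = 0.3

# "The prior can be anything": sweep it from almost 0 to almost 1.
anything_priors = [0.001, 0.25, 0.5, 0.75, 0.999]
anything_posteriors = [posterior(p, P_E_GIVEN_C, P_E_GIVEN_NOT_C)
                       for p in anything_priors]

# A committed (and so defensible or attackable) range of priors.
low = posterior(0.2, P_E_GIVEN_C, P_E_GIVEN_NOT_C)
high = posterior(0.4, P_E_GIVEN_C, P_E_GIVEN_NOT_C)

print("unconstrained prior ->", [round(p, 3) for p in anything_posteriors])
print("prior in [0.2, 0.4] ->", round(low, 3), "to", round(high, 3))
```

With the unconstrained prior the result runs from practically zero to practically one, which is just another way of saying you do not know whether C is true.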
So the second thing Bayes’ Theorem teaches you is that you can’t hide from your assumptions anymore. You have to quantify them, and justify them. In one sense, the prior probability that any claim C is true equals the frequency with which C usually turns out to be true in similar circumstances. And if you build that frequency using “past cases” that rest solely on circular assumptions about what happened, and not on evidence that that in fact happened and not something else, then your conclusion (and thus all your claims and beliefs) will be fallaciously circular as well, resting on un-evidenced assumptions, and thus your conclusions are also nothing more than un-evidenced assumptions. And un-evidenced assumptions aren’t knowledge. This is using faith as a substitute for evidence. And that is a practice doomed to guarantee most of your beliefs are false. (See Sense and Goodness without God, index, “faith.”)
So in building a prior, you must only count what has typically happened in well-proven cases. All cases that haven’t been properly, reliably investigated cannot be counted, because you don’t yet know what those investigations would have turned up, except by reference to what typically happened among properly, reliably investigated cases. And cases that “can’t be” investigated automatically count as cases that haven’t been (so citing the fact that they can’t be investigated does not gain you anything).
This is why miracles have extremely low prior probabilities: not because miracles never happen (we don’t have to assume any such thing), but because in the set of all properly, reliably investigated miracle claims, exactly zero have turned up genuine–all have turned out to be something else entirely (as beautifully enumerated by Tim Minchin in his scathing song Thank You God for Fixing the Cataracts of Sam’s Mom). That does not mean the prior probability is zero, since that set does not include all miracle claims. But it does mean the prior probability that any other miracle claim is genuine conforms to Laplace’s Rule (on which see Proving History, index).
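For readers who want to see what that looks like, Laplace's Rule (the rule of succession) puts the probability at (s + 1) / (n + 2), where s is the number of genuine cases found among n properly investigated ones. The sketch below is in Python; the count of 1,000 investigated claims is a made-up placeholder, not a real tally.

```python
from fractions import Fraction

def rule_of_succession(successes, trials):
    """Laplace's Rule: estimated probability that the next case is a success."""
    return Fraction(successes + 1, trials + 2)

# Hypothetical count of properly investigated miracle claims (an assumption).
n_investigated = 1000
n_genuine = 0

prior_genuine = rule_of_succession(n_genuine, n_investigated)
print(prior_genuine)         # 1/1002
print(float(prior_genuine))  # small, but not zero
```

Note that the result is never exactly zero, however many cases have failed, and that with no investigated cases at all the rule returns 1/2.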
Even if you don’t have any investigated cases in the nearest category, you can apply comparable cases, belonging to broader generalizations. So, for example, even if some claim C is otherwise totally unique, the prior probability that someone is lying (or mistaken) about claim C can be based on the frequency with which you have observed people to lie in general, or the frequency among people like them claiming comparable things (if you have confirmed they lie about, or are mistaken about, similar things more often than people in general). (This is called choosing a “reference class” for determining a prior probability, on which see Proving History, index.) But the point is, the only way to justify an assumption about prior probability is to reference the frequency of comparable past cases that have been well established in human knowledge (and not just assumptions based on faith or prejudice or whim). Which requires getting out and learning stuff about the world.
As to the second point, if the evidence doesn’t match your claim so well (as is always the case with theism, e.g. the Problem of Evil, or to continue the last analogy, the fact that when miracle claims are properly investigated they never turn out to be genuine), you can invent “excuses” to make your claim fit that evidence better. This means adding assumptions to bolster your claim. But what Bayes’ Theorem teaches us is that every such assumption you add reduces the probability that your claim is true, rather than increasing it (as that excuse was supposed to do). Because adding assumptions to any claim C reduces the prior probability of C, and any reduction in the prior probability of C entails a reduction in the posterior probability of C, which is simply the probability of C. This is because the prior probability of several assumptions together can never be higher than the prior probability of any one of them alone, and is lower whenever the added assumptions are less than certain, a basic fact of cumulative probability, which when overlooked produces the Conjunction Fallacy. (See Proving History, index, “gerrymandering,” for more on this point, and why trying to get a theory C to make the evidence E likely by adding tons of ad hoc assumptions to it doesn’t work.)
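Here is one way to sketch that in Python. Everything in it is hypothetical: a claim C, a rival D, an ad hoc excuse A that has only a 10% chance of being true even if C is, and invented likelihoods. The excuse makes the evidence fit perfectly, but the cost in prior probability outweighs the gain.

```python
def posteriors(priors, likelihoods):
    """Bayes' Theorem over a set of mutually exclusive, exhaustive hypotheses."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())
    return {h: joint[h] / total for h in joint}

# All numbers below are invented for illustration.
prior_c = 0.5             # prior for claim C (rival D gets the rest)
p_excuse_given_c = 0.1    # chance the ad hoc assumption A is true, given C

priors = {
    "C with excuse A": prior_c * p_excuse_given_c,            # 0.05
    "C without excuse A": prior_c * (1 - p_excuse_given_c),   # 0.45
    "D": 1 - prior_c,                                         # 0.50
}
likelihoods = {
    "C with excuse A": 1.0,      # the excuse makes E fit perfectly
    "C without excuse A": 0.05,  # without it, E fits C badly
    "D": 0.5,
}

result = posteriors(priors, likelihoods)
posterior_c_overall = result["C with excuse A"] + result["C without excuse A"]

for name, value in result.items():
    print(f"{name}: {value:.3f}")
print(f"C overall (excuse or not): {posterior_c_overall:.3f}")
```

The gerrymandered version of the claim ends up less probable than C was without committing to the excuse, even though it "predicts" the evidence with certainty.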
The amount by which adding an assumption A to claim C reduces the prior probability of C will vary according to many factors, and so to understand how doing that affects the probability of C, and thus affects your knowledge of whether C is true, requires, again, learning Bayesian reasoning–as well as just basic probability theory. For instance, you have to understand what a conjunction fallacy is and how to avoid it. The famous example (here quoting Wikipedia) is this:
Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.
Which is more probable?
1. Linda is a bank teller.
2. Linda is a bank teller and is active in the feminist movement.
People tend to answer that 2 is more probable than 1, owing to the assumption that her description increases the probability that she is a feminist, as indeed it does: Linda as-described is more likely to be a feminist than any randomly selected person from the general population. Indeed, her description also increases the probability that she is a bank teller (relative to the general population), even if by not as much. But the question is whether she is both a feminist and a bank teller. Since not all feminists are bank tellers (nor all bank tellers feminists), no matter what the probability is that Linda as-described is a feminist (let’s say we conclude it’s 90%), the probability that she is both is necessarily less than that. Even if 99% of all women (even feminists) are bank tellers, the probability that Linda would be both is 0.90 × 0.99 = 0.891, and 89.1% is less than both 90% and 99%. So the probability of 1 would be 99%, but the probability of 2 would be 89.1%. So 1 is more probable, not 2, despite Linda’s description.
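The arithmetic can be checked in a few lines of Python. The 90% and 99% figures are the ones assumed in the paragraph above; the variable names are mine. The loop at the end is an added illustration showing that no choice of conditional probability can ever push the conjunction above either of its parts.

```python
# Figures assumed in the text above.
p_feminist = 0.90               # Linda as-described is a feminist
p_teller_given_feminist = 0.99  # a feminist woman is a bank teller
p_teller = 0.99                 # any woman is a bank teller

# Option 2: bank teller AND feminist.
p_both = p_feminist * p_teller_given_feminist

print(f"option 1 (bank teller): {p_teller:.3f}")
print(f"option 2 (both):        {p_both:.3f}")

# Whatever the conditional probability is, the conjunction never exceeds
# the probability of being a feminist alone.
never_exceeds = all(
    p_feminist * (step / 100) <= p_feminist for step in range(101)
)
print("conjunction never exceeds its part:", never_exceeds)
```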
And that’s just one example of a common mistake people make in probability theory. And since all knowledge and beliefs are probabilistic, your mistakes in probability theory guarantee you will have false beliefs. Unless you stop pretending you aren’t dependent on hidden Bayesian reasoning to justify your beliefs. Because you are. And knowing that starts you on the path to correcting your faulty reasoning about the facts, and thus correcting your beliefs.
For more of my introductions to and explanations of Bayes’ Theorem and Bayesian reasoning, see my video Bayes’ Theorem: Lust for Glory, my Bayesian Calculator resource page, and any of my other articles on Bayesianism.