I often encounter people who confuse “Bayesian statistics” with “Bayesian epistemology” or even just “Bayesian reasoning.” I’ll get critics writing me who will assert things like “Bayesian statistics can’t be used on historical data,” or “you can’t do philosophy with Bayesian statistics,” which are both false (there are rare occasions when indeed you can) and not answering anything I ever said. Because “Bayesian statistics” is not “Bayesian reasoning,” much less “Bayesian epistemology.” I have only been an advocate for the latter (though I also agree those scientists advocating the former are right). Yet many people will still take issues they have with Bayesian statistics as evidence against Bayesian reasoning or Bayesian epistemology. This needs to stop. Here is a curative.

My Actual Position

That all historical reasoning is ultimately Bayesian was twice independently discovered and demonstrated under peer review by separate scholars: myself (in Proving History) and Aviezer Tucker (in Our Knowledge of the Past). Anyone who objects to our discovery on the grounds that history doesn’t (or, presumably, couldn’t) employ Bayesian statistics, obviously hasn’t read either of our books, or didn’t pay attention to what those books said, or else doesn’t know what the word statistics means.

I’ve noted before that Kamil Gregor may be one such person. For example, he apparently expects that if you don’t have thousands of data points, you can’t apply Bayesian reasoning to reach any conclusion from the data. False. But the mistake he is making is that he is talking about statistics, not probability theory, and then forgetting he’s doing that, and thus fallaciously equivocating between the two. Bayesian reasoning is a logic, a way of carefully defining and vetting any reasoning you engage in. Statistics is a tool, generally a very complex tool, for extracting information from (usually) large sets of data. These are not the same thing.

Most of the time when people reject Bayesian epistemology, it’s because they don’t understand it. I demonstrated this recently in my survey of examples in Hypothesis: Only Those Who Don’t Really Understand Bayesianism Are Against It. But one form of misunderstanding I don’t cover there is this one: confusing logic and epistemology with statistics. And this error is committed even by people who understand Bayes’ Theorem very well, like Kamil Gregor (see Kamil Gregor on the Historicity of Jesus). But even people who don’t really understand Bayes’ Theorem make this mistake. So the following is for anyone making this mistake.

“Statistics” vs. “Epistemology”

Bayesian statistics is a subset of statistics, and statistics is the science of building complex mathematical models as tools to extract information from (usually) large sets of data. That’s what it is by definition. As such it does not exclude small data sets necessarily; it just tends to because that’s not what it’s for. Only relatively weak inferences can be made from small data sets, and one doesn’t need complex math to show that, or even to show what inferences can indeed be made. Your average professional gambler doesn’t have to write a twenty page maths paper to smartly compute their odds in any given scenario. Nor to decide which is the best car to buy or whether to date a certain person or who to vote for in the next election. Nor even to decide what worldview to believe in, what to believe about themselves or others, and every other thing we need philosophy for. Thus statistics was not built for those tasks. It was specifically built as a high-powered tool to get interesting, and hopefully actionable, information out of large data sets, and in particular sets that meet certain useful criteria (like randomized selection or near completeness).

As such, that tool is very limited. It’s only applicable in those rare cases where those kinds of data sets exist, which are cases rarely relevant to ordinary human life, or if relevant, only so by many stages removed from everyday reasoning. Our government needs such tools, and has such data, and makes better decisions affecting our lives when it uses them. But we don’t need to use them ourselves to enjoy that benefit; we should, rather, lobby for the government to hire well-qualified, full-time experts to do it, because generally, only such people can. Just as we must engage in the division of labor in every other aspect of civilization. We need dedicated experts to write our histories, discover our physics, cure our ailments, build our roads, police our streets, adjudicate our laws, and so on, because none of us can be an expert at all these things, nor have the time to do all those things.

Bayesian epistemology, and Bayesian reasoning more broadly, is not that. It is, rather, a fairly simple logical model of all correct human reasoning. It’s about explaining when and why human inferences are correct or incorrect about anything. It answers the question, “What is it that I am doing when I come to some belief or other when given certain information?” As well as the questions, “Why, or how, or when is that belief I come to justified or warranted?” As such it is all-encompassing. Even scientists using Bayesian statistics are nevertheless also using Bayesian epistemology—whether they know it or not. Just like everyone else. Because after they’ve done all the statistics, they still have to reach a belief, a conclusion, and infer something about the world and how likely they are to be right. And that step is always fundamentally Bayesian—and non-statistical. At that stage, the results of any statistical models they built just become more data, which in turn becomes more evidence or background knowledge in further inferences the scientist makes about what they are studying. “What we found” is given by statistics. “What that means” is given by epistemology.

Ironically, even so-called frequentists are doing this, every time they get a result of rejecting the null hypothesis, and then conclude their hypothesis is true—a conclusion that in no logical way follows from their statistical result…except by covert Bayesian reasoning (see “Why ‘Frequentism’ Is Defective as a Methodology,” the last section of my Hypothesis article). By their own admission (when pressed), all that their frequentist model will have proved was that the probability that some signal they found in the noise was merely a product of random accident was low enough to ignore. Though sometimes not even so much as that, honestly, as they tend to accept disturbingly high probabilities of error on that point, mistakenly thinking “95% certainty” is good, for example, when in fact it’s quite dismal, producing false conclusions at fairly high frequencies (from 1 in 20 to even 1 in 3, as I noted earlier this month in my article on Vegan Propaganda; the arithmetic is sketched after the next paragraph). And really, I suspect scientists only accept these high rates of error because they mostly “find” extremely weak effects, and the only way to make weak effects look strong is to lower your confidence that you are detecting anything at all—and “publish or perish.” No budget? No problem, just run a low-powered study that can’t actually reliably tell us anything about the world. But regardless, even when they have higher confidence levels and large effect sizes, still all they have accomplished is proving “it’s very unlikely this is random.” What they do after that is Bayesian.

For example, frequentists will simply “assume” the prior probability of fraud or error in the production of their result is low enough to ignore; and then simply “assume” other hypotheses don’t exist that explain the same statistical result just as well or even better. How they convince themselves those assumptions are valid is all secretly Bayesian, as any credible defense they made of those assumptions would expose. When scientists choose between models, or which models to test against each other, by hunch or gut, they are engaging a Bayesian epistemology. For any conclusion scientists reach about the nature of the world, ultimately, somewhere, no matter how much statistics were involved, some step of reasoning on which the final assertion depends will have been non-statistically Bayesian, whether they realize that or not.
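Here is that arithmetic as a toy sketch (the base rates of true hypotheses and the study power are hypothetical numbers chosen merely to bracket the “1 in 20 to 1 in 3” range; this is just the standard false discovery calculation, not anything from Gelman & Yao):

```python
# Toy calculation of how often a "statistically significant" finding is false,
# given assumed (hypothetical) base rates of true hypotheses and study power.
def false_discovery_rate(prior_true, power, alpha=0.05):
    """Fraction of significant results that are false positives."""
    true_hits = prior_true * power          # true hypotheses correctly detected
    false_hits = (1 - prior_true) * alpha   # false hypotheses "detected" by chance
    return false_hits / (true_hits + false_hits)

print(round(false_discovery_rate(prior_true=0.5, power=0.8), 2))  # 0.06: roughly 1 in 20
print(round(false_discovery_rate(prior_true=0.1, power=0.8), 2))  # 0.36: roughly 1 in 3
print(round(false_discovery_rate(prior_true=0.1, power=0.3), 2))  # 0.6: even worse when underpowered
```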

It’s Bayes All the Way Down

Indeed, even the human brain itself has evolved into a primitive Bayesian. Already at earlier stages this was happening (e.g. animal visual processing is significantly Bayesian). Our innate, intuitive belief-forming processes, and our perception—the way our brains decide to model the world we live in from sensory data (and even, model ourselves, who we are and what we are thinking and feeling, from internal data)—follows a crude Bayesian formula encoded in our neural systems. Probably because natural selection has been pruning us toward what works. And really, Bayesian epistemology is it. Any other epistemology you try to defend, either can be shown inadequate, or ends up Bayesian the moment you check under the hood.

You can peruse several recent summaries and key examples.

A critique at Scientific American published in 2016 (“Are Brains Bayesian?” by John Horgan) only relates to a critic’s paper published way back in 2012, everything in which has since been refuted by subsequent science further establishing the Bayesian brain hypothesis.

That critic, Jeffrey Bowers, offered really only three objections. His first is that Bayesians claim “the brain employs highly efficient, even ‘optimal’ methods for carrying out cognitive tasks,” when natural selection typically only gets things ‘good enough’ rather than optimal. But he is confusing two different senses of optimal. Bayesians do not say the brain is optimally Bayesian in the sense of perfectly implementing Bayesian reasoning—or indeed, as Bowers seems incorrectly to assume, Bayesian “statistics” (here I think is another example of a scientist confusing the two), which the brain doesn’t do at all. Rather, their actual claim is that by using roughly Bayesian mechanisms the brain approaches the optimal in its consumption of energy and neural resources to achieve tasks, which has been proven (see the summaries and studies linked above). And that is indeed exactly the kind of optimization natural selection trends organisms toward. See, in particular, Sanborn & Chater (above).

In fact, that brains evolved only crude Bayesian processing explains both our wide susceptibility to cognitive errors and our brains’ efficiency at reaching conclusions at all: our brains sacrificed things like “consider all hypotheses” (which would be inordinately time-consuming) for more efficient heuristics (like “availability or familiarity implies a high prior”) that are actually erroneous, but not wholly. Hence the “good enough” function of the brain is still ultimately Bayesian, insofar as that’s what the brain does with the data it collects.

Bowers’ second argument, that “other information-processing models, such as neural networks, can replicate the results of Bayesian models,” ignores the fact that that’s moot: if the brain is implementing any neural network in a way that replicates Bayesian processing of priors and likelihoods, that’s precisely what we mean by the brain being Bayesian. Only if the brain’s neural network processing were breaking those rules (e.g. disregarding priors, not employing empirical priors, not updating priors in rough accordance with likelihood ratios) would the brain not be Bayesian. This allows a lot of room for flawed Bayesian processes to still be recognizably Bayesian; but it didn’t have to turn out this way. We could have found the brain reaching decisions and constructing perceptual models using entirely different processes. We didn’t.

Which finally gets us to Bowers’ last argument, that “neuroscience, contrary to the claims of Bayesians, has provided little or no support for the idea that neurons carry out Bayesian-style processing of information,” which, it turns out, was simply bad prophecy. Study after study has since been published producing exactly that support. This is what the brain does. Which means, it’s what you are doing. All of you. Civilian and scientist alike. Your innate inferences, your gut feelings, your feelings of confidence, your perceptions—all Bayesian. Not “Bayesian statistics,” not “flawlessly Bayesian,” but Bayesian all the same. Understanding this is crucial to understanding the distinction between “Bayesian statistics” and “Bayesian reasoning.” And thus crucial to improving how you think.

Understanding Bayesian Epistemology

As such, all that Bayesian epistemology requires you to understand is a handful of concepts needing little more than sixth grade math: basic probability theory (total probability, conditional probability, sums, multiplication, and so on) and how Bayes’ Theorem taps that to model sound reasoning (a minimal worked example follows the list below). That means understanding and accepting that:

  • Our estimate of posterior odds corresponds to what we feel as confidence that a thing is true, which at high confidence we tend to call a belief;
  • Posterior odds must equal prior odds times the likelihood ratio;
  • All these odds and likelihoods are conditional on background knowledge;
  • The likelihood ratio is the ratio between the likelihoods of the evidence on each of the competing explanations of that evidence;
  • And “evidence” and “background knowledge” must together include all information available to us when doing the math, without remainder.
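Here is the minimal worked example just promised, simply to make those bullet points concrete (the numbers are invented purely for illustration):

```python
# Minimal sketch of the odds form of Bayes' Theorem from the list above:
# posterior odds = prior odds x likelihood ratio, all conditional on background knowledge.
def posterior_probability(prior, p_evidence_if_true, p_evidence_if_false):
    """Return P(h | e, b) given P(h | b) and the two likelihoods of the evidence."""
    prior_odds = prior / (1 - prior)
    likelihood_ratio = p_evidence_if_true / p_evidence_if_false
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Invented example: a claim starts at a 1 in 5 prior, and the evidence is four times
# more expected if the claim is true than if it is false.
print(posterior_probability(prior=0.2, p_evidence_if_true=0.8, p_evidence_if_false=0.2))  # 0.5
```

That is the entire machinery Bayesian epistemology needs; everything else is care in choosing and defending the inputs.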

Each of these facts unpacks a little. But none of them require understanding statistics as such. Or ever even using it. Statistics is useful, when it can be employed competently; but it isn’t necessary, and most of the time, isn’t applicable anyway for want of the right kind or quantity of data. Part of the reason for this is that estimation is more efficient in 99% of human life (see Not Exactly: In Praise of Vagueness or even the article on Estimation at MathIsFun). Asking for exactness is neither necessary nor even wise, as it is incredibly wasteful of time and resources, all to reach no difference in general conclusion—such as “whether I should believe x” for any x. Conclusions or decisions reached a fortiori to within a single percentile are not going to change if you try to reach the same conclusion or decision with a result to six decimal places. “Good enough” is all we need in almost everything. The exceptions, we call science.

I survey how Bayesian epistemology works in practice in the subject of history in Proving History, through the same method of estimation and argument a fortiori, which as such never require “doing statistics,” but always require using Bayes’ Theorem. But one can easily adapt that to any other knowledge endeavor, not just every subject of both life and philosophy, but even science. Since most of what scientists do is covertly intuitive rather than rigorous. They might generate tons of rigorous “results.” But what they end up deciding or doing with those results, always comes down to gut feelings about what must surely be the case, “if this be the result.” The only way to logically justify those gut feelings is with Bayes’ Theorem. Even if you never explicitly work this out, or ever do only on the back of a napkin.

All of which gets me to discussing a recent paper published by actual Bayesians critiquing some flaws or conundrums in Bayesian statistics. That is, at least as practiced—or so I presume; the paper neglects to include any actual or real world examples of any of the things it complains about, though I am assured such things exist, so I’ll just assume they do. (For the record, not giving actual, particular examples justifying your generalizations is bad philosophy. But scientists tend to be bad at philosophy so we’ll forgive them that here.) More importantly, everything they call out is a defect in Bayesian statistics: the building of complex mathematical models and tools for extracting information from (presumably) large and appropriately sampled data sets. One could mistakenly read them as proposing these as defects in Bayesian epistemology; but in fact, none of the problems they call attention to apply to or affect Bayesian epistemology. And exploring why renders this paper helpful for anyone who wants to really understand that difference.

Gelman & Yao

That paper is “Holes in Bayesian Statistics” by Andrew Gelman and Yuling Yao (released 11 Feb 2020). Their abstract is straightforward:

Every philosophy has holes, and it is the responsibility of proponents of a philosophy to point out these problems. Here are a few holes in Bayesian data analysis: (1) the usual rules of conditional probability fail in the quantum realm, (2) flat or weak priors lead to terrible inferences about things we care about, (3) subjective priors are incoherent, (4) Bayes factors fail in the presence of flat or weak priors, (5) for Cantorian reasons we need to check our models, but this destroys the coherence of Bayesian inference.

Some of the problems of Bayesian statistics arise from people trying to do things they shouldn’t be trying to do, but other holes are not so easily patched. In particular, it may be a good idea to avoid flat, weak, or conventional priors, but such advice, if followed, would go against the vast majority of Bayesian practice and requires us to confront the fundamental incoherence of Bayesian inference.

This does not mean that we think Bayesian inference is a bad idea, but it does mean that there is a tension between Bayesian logic and Bayesian workflow which we believe can only be resolved by considering Bayesian logic as a tool, a way of revealing inevitable misfits and incoherences in our model assumptions, rather than as an end in itself.

From that description, they appear to be making assertions about Bayesian epistemology (“philosophy”; “logic” vs. “workflow”; reducing it to a mere “tool” as if it were ever anything else), but all their examples come from Bayesian statistics. Conflating the two is a mistake.

We’ll survey their examples in reverse order, which is really in order from the least to the most interesting points they have to make. But first I should note that with respect to epistemology, I have already addressed all of their concerns in Proving History, particularly in chapters 4 and 6. And many of their proposed solutions to the problems they enumerate are the same as mine, or track similar thinking.

For example, Gelman & Yao conclude “that concerns about the subjectivity of Bayesian prior distributions and likelihoods can be addressed first, by grounding these choices in correspondence to observable reality (i.e., prior data); and, second, by making these choices transparently and with an awareness of contexts,” which is essentially exactly what I argue throughout Proving History, even early on in chapter 3 (pp. 81-85), but also several times in chapters 4 and 6.

They also, like me, dislike the convention of using the terms “subjective” and “objective” in describing probabilities; and though my solution is different than theirs, it serves the same end, in a different context (PH, p. 297 n. 4). They want to distinguish what I actually call arbitrary from empirical probabilities (e.g. we shouldn’t be pulling priors, unexamined, out of our mere gut feelings about things, but anchoring our decisions about prior distributions on something empirically grounded); whereas to understand what probability is doing epistemologically, I note we need to distinguish probabilities as the actual frequencies of things in the world, and probabilities as the frequency of our being correct to assert a given hypothesis as true given the “data” available to us. Though I prefer to say information rather than data, to avoid confusing any scientific technical uses of the word “data,” since Bayesian reasoning must be conditioned on all information available, whether we choose to “call” it data or not.

We must note that distinction, between physical and epistemic probabilities, because that is what probability is actually doing in any epistemology employing it. And, honestly, it’s all scientists ever get anyway: epistemic estimates of actual frequencies; we never know the “actual” frequencies of anything. We can only get increasingly accurate estimates of what those frequencies “probably” are. Even the most powerful statistical tools do no more than that. I do argue in PH that as we gain information, our epistemic probabilities (the only probabilities Bayes’ Theorem actually gives us) approach the “actual” probabilities, as in, the actual true frequencies of things in the world, and with enough information can get close enough that we can “act” like they are the same. Because any discrepancy we could have any confidence remains will at some point be smaller than our already-admitted margins of error. But the two probabilities never end up identical; because we never gain infinite information.
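A quick illustration of that convergence (my own toy simulation, not anything from the paper or my books): as information accumulates, an epistemic estimate of a frequency homes in on the actual frequency, without ever becoming identical to it.

```python
# Toy simulation: an epistemic estimate of a frequency approaches the "actual"
# frequency as information accumulates, without ever becoming identical to it.
# Assumes a uniform (Beta(1,1)) prior purely for simplicity.
import random

random.seed(1)
true_frequency = 0.30   # the actual frequency of some event, unknown to the reasoner
observations = [random.random() < true_frequency for _ in range(10_000)]

for n in (10, 100, 1_000, 10_000):
    hits = sum(observations[:n])
    estimate = (hits + 1) / (n + 2)   # posterior mean under the uniform prior
    print(n, round(estimate, 3))
# The estimate drifts toward 0.30 as data accumulate, but any finite sample leaves
# some gap, which is why margins of error never entirely disappear.
```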

Hence, even our differences here are explained by our different projects. Gelman & Yao are trying to solve a problem in Bayesian statistics. I’m solving a problem in Bayesian epistemology. The distinction matters.

Now to those “holes” they were talking about…

Cantorian Limits?

“Cantor’s corner” is where, somewhere down the line, a model fails to fit the data. As models improve, as they more accurately fit the evidence, this “corner” gets pushed farther and farther out. But often, eventually, you hit it. And that’s actually useful. As Gelman & Yao put it:

Scientific research is all about discovery of the unexpected: to do research, you need to be open to new possibilities, to design experiments to force anomalies, and to learn from them. The sweet spot for any researcher is at Cantor’s corner.

So you want to push things to find that place. Where Newton’s laws broke down is where Einstein’s laws moved in. Where Ptolemy’s geocentric model broke down is where Kepler moved in. And I say Kepler here, not Copernicus, for a reason. The Copernican system was actually a poorer fit to the data than Ptolemy’s, and is a classic example of a bad scientific theory built on aesthetics rather than an actual interest in discovering the true explanation of observations. Ironically, contrary to mythology, Ptolemy was the empiricist, updating his models to fit the data; Copernicus was an idealist, uninterested in empirical reality. It took Kepler to finally do real science with heliocentrism (at least, after antiquity; plenty of real science was done with it in antiquity; medieval Christians just threw it all in the trash so we didn’t get to read any of it).

But of course even Kepler’s model broke down eventually; Einstein had to come to the rescue. And now we have Dark Energy and Dark Matter mucking everything up, the Inflationary Big Bang had to ride in to rescue observations from the failure of straightforward Relativity to explain the flatness or even expansion of the universe, and a lack of a quantum theory of gravity throws into doubt even our ability to fully explain that, and so on. Lots of Cantor’s corners.

How does this fit into a Bayesian epistemology of science? Gelman & Yao worry that because of Cantor’s corners, which you cannot predict but only happen upon by accident, “direct Bayesian inference” is, “ultimately, impossible in that it would require the anticipation of future steps along the diagonal, which by construction cannot be done.” And by construction they mean the constructing of complex mathematical statistical models. Statistics, not epistemology.

They propose a way to throw AI at the task. But this “AI would need to include a module for model criticism…to detect problems which at some point would require ripping out many stitches of modeling before future progress can be made.” In other words, a “go back to the drawing board” type of response to model failure, rather than continually trying to push the same models by adaptation. How would one program that in a Bayesian network? That’s their question, in a sense. Which “reveals another way in which Bayesian inference cannot hope to be coherent in the sense of being expressible as conditional statements from a single joint distribution.” By which they mean, either you incorporate the unknowns somehow into a Bayesian model, or your Bayesian model cannot be fully coherent.

Their solution, such as it is, is to simply admit “there will always be more aspects of reality that have not yet been included in any model.” Epistemologically, this means, all conclusions must be granted as tentative or provisional. Mathematically, this means, we must recognize there is always a nonzero probability, no matter how small, that any belief we hold is false—no matter how certain we are, and no matter how much evidence we have or how good it is. Those who fail to recognize this I have warrantably hailed with the appellation Doofus. But in terms of epistemic modeling, the application of this solution is important. And that requires describing “the modeling problem” altogether, before showing how Bayesian epistemology can solve it (as in, coherently account for it), which I again discuss in Proving History (e.g. index “underdetermination, problem of” and “old evidence, problem of”).

The modeling problem is this:

How do you know you are assigning the correct distribution of initial (or even final) probabilities to competing models when you don’t even know all the models that could turn out to be correct? Are you railroading yourself into believing one model (or a few possible models) correct that performs well, simply by having not thought of a competing model that might perform better? And in particular, if those unknown models have been assigned a zero prior (being given no part of the probability space), how can any evidence ever recover them, since even infinite evidence cannot overcome a zero prior?

Gelman & Yao’s answer is to simply say ‘fuck it’ and admit that sometimes you just have to tear shit up and start over; Bayesian modeling simply won’t tell you when or how to do that. This is true, so far, for the statistical machinery they are talking about. It’s probably not true for what’s actually going on in any scientist’s brain who actually does this. At what point does a scientist decide this needs to be done? How does the scientist assemble and assess what new tacks to replace old ones with? We can throw our hands up and say it’s unfathomable; the brain just churns this stuff out, who knows where it comes from. Intuition and creativity, you know, and all that whatnot. “It’s a mystery,” like a secular theologian apologizing for God.

But really, is it actually that mysterious from an epistemological perspective? We have a lot of historical documentation from scientists about how and why they came to revolutionary new ideas about things, from Archimedes’ bath to Einstein’s imagined clock-and-flashlight. The psychology of creativity has gone even beyond science for examples of just what the boundaries and causal inspirations are for flights of creative genius (see summaries at The Psychology of Creativity and The Neuroscience of Creativity; and the latest article on it at Neuroscience News). It’s pretty obvious it’s not magic. It is a black box, insofar as the precise neural machinery is inaccessible presently, and varies with every single brain. But the overall observation is pretty clear: our brains collect pattern data, randomly mix it, and test it in conceptual space, until one or more results pops into conscious attention, and we evaluate it, and if intrigued, we build it out with further conceptual testing (consciously now), and eventually, if it gets through all that and we are a scientist, we figure out a way to empirically test it. And so goes the entire history of science—particularly of its so-called “revolutionary paradigm shifts.”

That random churn is the most replicable because it isn’t statistical; yes, it is limited by causal inputs (which means richer environments and exposure to more ideas can assist it: collecting more data, as it were), but is random beyond that. It isn’t forming beliefs or making decisions, so Bayes’ Theorem isn’t relevant to it. Bayesian epistemology only becomes relevant at the point of evaluating an idea that thus comes to us from that random process—do we spend more time on that idea or ditch it? And again at the point where, after that, we conclude it’s worth developing an empirical test for. Both stages of evaluation will be Bayesian. As in, how our brain, our intuition, weighs the merits of the new idea, will follow what we know to be Bayesian neural processing: ideas that push against extremely low priors will have a harder time passing muster (“No, I don’t think gremlins or sunspots will have crashed Flight 370”), beyond maybe getting diverted into a science fiction novel; while ideas that evidential likelihoods already favor will have an easier time (“Pilot malfeasance does explain the data surrounding the Flight 370 better than most else we can come up with”); and eventually when the posterior probability gets high enough in our brain’s eye (which our brain is not measuring with actual numbers but an analog of neural signal strengths), we say “Yes, let’s try that.”

Gelman & Yao are right that no brobdingnagian statistical machinery can do this for them. Except once we are building AI and actually designing our own neural churn for it. But for now, “Bayesian statistics” isn’t going to solve this. But Bayesian epistemology will. Or really, it already has. Because Bayesian machinery in our cognitive systems is already causing this process to grind along, and always has; it drove Archimedes, Ptolemy, Kepler, Newton, Einstein, Lavoisier, Pasteur. And if we were so inclined we could even sketch out some simple equations on a napkin modeling why our brains are telling us some new idea is, in any given case, worth considering, or even worth testing. It wouldn’t be highly precise; it would just meet the threshold of explaining what actually happened: at a certain rough level of probability, we say yes to things. And that “rough probability” is built with a crude application of prior odds times likelihood ratios. And there really isn’t any other way it’s done; nor is there any other way it could be done that would work as well. It’s about as optimized as it can get. Maybe an AI could house a larger database of random things to mix and play with in the churn (in other words, handle much larger data sets to tinker-toy new ideas from), and do so faster, and with less crude (and thus less error-prone) calculations. But in the end it will just be doing the same thing we already are: doing a rough Bayesian analysis to ascertain what random outputs of that churn are worth further time, and acting accordingly.

But this leaves one thing unexplained: how does this work logically? If these “new ideas” had really been assigned a prior probability of zero (since the other, old hypotheses considered were previously assigned all the probability space—even, we might presume, by this hypothetical AI), how can Bayesian reasoning, even on a rough scale, get us moving on it? For no amount of evidence, no likelihood ratio, can turn a zero prior into any nonzero prior. The answer lies in recognizing what a Bayesian model is actually describing. Every hypothesis is inclusive of infinitely many variants of that hypothesis. So in fact, all hypotheses are included in the probability space, even ones you’ve never thought of before.

This is most obvious when you simplify the probability space into a binary “hypothesis is true” vs. “hypothesis is false.” Obviously “hypothesis is false” by definition includes every other possible hypothesis, known or otherwise, that you are not already including under what you’ve defined as the hypothesis to be tested. And those unknown hypotheses have no further effect on the probabilities entered, because they do not put anything into the background knowledge that conditionally affects those probabilities (other than what impact there is from “no information”).

When you instead try to divide up the hypothesis space among all known hypotheses (using the extended long form of Bayes’ Theorem, for example: Proving History, p. 284), a common mistake can be to not include a place-holder for “all other possible hypotheses,” which means “all other logically possible explanations of the evidence not otherwise defined.” It is in principle okay to do that when your background knowledge establishes the conditional probability of such unknowns is small enough to ignore. And typically it is. It’s extremely unlikely something an expert hasn’t thought of yet is explaining observations, except when that is exactly what experts are concluding. Usually, they have several very likely possibilities to consider. And when they don’t, then they do indeed account for that. Either way, the math will coherently work out, if you’ve described the situation correctly.
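For reference, that looks something like this (my own notation, with the place-holder written out explicitly; the exact formulation in Proving History may differ):

```latex
% Extended form of Bayes' Theorem over mutually exclusive hypotheses h_1 ... h_n,
% with an explicit place-holder h_other for "all other logically possible explanations."
P(h_1 \mid e, b) =
  \frac{P(h_1 \mid b)\, P(e \mid h_1, b)}
       {\sum_{i=1}^{n} P(h_i \mid b)\, P(e \mid h_i, b) \;+\; P(h_{\mathrm{other}} \mid b)\, P(e \mid h_{\mathrm{other}}, b)}
```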

For instance, in the field of history, accuracy beyond a single percentile in epistemic probability is usually unachievable, so theory sets that have a collective prior probability well below that will have no effect on your math. You can consequently act like they don’t exist—until such time as your background knowledge pertaining to that changes, then you update. But this is just a simplification. As such, it “looks” incoherent, on paper. Just as Gelman & Yao say. But since the simplification is actually just hiding an extra term “under the fold” as it were (the same way many mathematicians leave off “b” for background knowledge in every term of a Bayesian equation, on the assumption that it’s “obvious” it’s there so doesn’t need to be stated), it isn’t actually incoherent.

In logical reality, that place-holder remains hidden in the equation, a sort of “hypothesis-epsilon” with a vanishingly small prior and no appreciable advantage in likelihood. Which means it will emerge from there as soon as evidence starts boosting it and bringing it to the fore (hence my point about claims of the supernatural as an example atheists deal with a lot). Thus, in practice, unknown hypotheses do not start with zero priors, but vanishingly small ones. Thus on standard Bayesian reasoning evidence can restore them to warranted attention, because those priors will change as soon as new information is added, since all priors are conditional on the available information. Even just “discovering” a hypothesis, by itself, can add an extraordinary amount of information not previously held, and thus can have an extraordinary effect on the prior probability distribution.
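A toy numerical version of that point (all the priors and likelihoods here are invented for illustration): a place-holder hypothesis parked at a vanishingly small prior is still recoverable the moment the evidence starts strongly favoring it, whereas a zero prior never would be.

```python
# Toy illustration of the "hypothesis-epsilon" point: a catch-all hypothesis parked
# at a vanishingly small (but nonzero) prior can still be brought to the fore by
# evidence, whereas a zero prior could not. All numbers are invented.
priors = {"h1": 0.6, "h2": 0.3999, "epsilon": 0.0001}      # must sum to 1
likelihoods = {"h1": 0.001, "h2": 0.001, "epsilon": 0.9}   # P(new evidence | h, b)

joint = {h: priors[h] * likelihoods[h] for h in priors}
total = sum(joint.values())
posteriors = {h: joint[h] / total for h in joint}

for h, p in sorted(posteriors.items()):
    print(h, round(p, 3))
# epsilon climbs from a 0.0001 prior to roughly an 8% posterior on this evidence;
# had its prior been exactly zero, no evidence whatever could have moved it.
```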

As I wrote in Proving History (pp. 276-77):

We can never know all logically possible theories that can explain [the available evidence] and thus we can never know what their relative priors would be, yet they must all sum to 1, which would seem to leave us in a bind. But we are always and only warranted in believing what we know and what is logically entailed by what we know. Thus we don’t need to know all possible theories or all their priors. Hence, just as I explained in chapter 3 (page 86), [Bayes’ Theorem] actually solves this problem of underdetermination:

P(h|b) is by definition a probability conditional on b, and b only contains the hypotheses we know; ergo, since P(h|b) is not conditional on hypotheses we don’t know, hypotheses we don’t know have no effect on P(h|b). Until we actually discover a theory we didn’t think of before: only then might that new information warrant a revision of our knowledge. But that’s exactly what anyone would have expected. Hence it presents no problem. The possibility that that would happen was already mathematically accounted for in the measure of our uncertainty in assigning P(h|b), that is, our confidence level and interval for P(h|b).

In other words, [with our margins of error] we already acknowledged some nonzero probability that there is some conclusion-changing hypothesis we hadn’t thought of yet. Thus that we find one does not contradict our earlier assertion that there was none, because we only made that assertion in terms of probability, and since [Bayes’ Theorem] only tells us what we are warranted in believing with the information we have, and we didn’t have that information then, its old conclusion is not contradicted by the new one, merely replaced by it (i.e., the new conclusion does not logically contradict the old one because the content of b or e is not the same between the two equations, hence they remain consistent). So there is no problem of logical consistency, either.

This is how I solve for Bayesian epistemology what Gelman & Yao call the problem of Cantorian limits. You’ll note it tracks their own solution for Bayesian statistics, and in fact implements it in the actual logic of an epistemology.

Failure of Bayes Factors?

Some attempts made in Bayesian statistics to argue one model is better than another, using Bayes factor analysis, rely on invalid approaches. As Gelman & Yao put it, “there is nothing wrong with the Bayes factor from a mathematical perspective. The problem, when it comes, arises from the application.” In other words, how you use it in a complex statistical model. This is really of little interest to epistemology. They note that “weak or noninformative priors can often be justified on the grounds that, with strong data, inferences for quantities of interest are not seriously affected by the details of the prior—but this is not the case for the marginal likelihood.” So in the latter case you have to be more careful how you assign priors. This really should be obvious. But even then, “Bayes factors” only “run into trouble when used to compare discrete probabilities of continuous models,” which is a problem that only arises when building particularly complicated mathematical machinery. Straightforward Bayesian epistemology never does this, so it never faces this problem.
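To see the kind of thing they mean, here is a minimal sketch using a standard normal-mean setup of my own choosing (not an example from their paper): making the prior ever more “uninformative” barely changes the posterior estimate of the effect, but it changes the marginal likelihood, and hence the Bayes factor, without limit.

```python
# Sketch of why marginal likelihoods (and hence Bayes factors) are sensitive to vague
# priors even when posterior estimates are not. Hypothetical setup: an observed sample
# mean of 2.0 with standard error 1.0; the null says the effect is 0, the alternative
# puts a Normal(0, tau^2) prior on the effect.
from statistics import NormalDist

y_bar, se = 2.0, 1.0
for tau in (1, 10, 100, 1000):
    post_mean = (tau**2 / (tau**2 + se**2)) * y_bar                    # posterior mean of the effect
    marginal_alt = NormalDist(0, (se**2 + tau**2) ** 0.5).pdf(y_bar)   # P(data | alternative)
    marginal_null = NormalDist(0, se).pdf(y_bar)                       # P(data | null)
    print(tau, round(post_mean, 2), round(marginal_null / marginal_alt, 1))
# For any reasonably wide prior the posterior mean sits near 2.0, but the Bayes factor
# in favor of the null keeps growing without limit as the prior is made more "uninformative."
```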

Nevertheless, the closest analog I can think of is when people produce invalid Bayesian descriptions of what they claim to be doing as far as inferring what to believe given a body of information. In particular, the errors of not constructing mutually exclusive hypotheses or not defining an exhaustive probability space. For example, when someone tries to compare “Jesus’s body was stolen” with “Jesus rose from the dead” in order to argue the latter has in result a higher posterior probability because the former requires the ad hoc supposition of multiple hallucinations of a risen Jesus, whereas Jesus rising from the grave explains both a tomb being empty and all appearances to near 100% certainty without that additional hypothesis. Never mind the hidden flaws in this analysis for now, like “God exists” and “God resurrects people from the dead” being far less probable “ad hoc suppositions,” additional hypotheses even less credible than serial hallucination (see Then He Appeared to Over Five Hundred Brethren at Once!), or the fact that “there was an empty tomb” is actually an (already improbable) hypothesis, not itself data to be explained (see Why Did Mark Invent an Empty Tomb?). Rather, I am referring to the transparent flaw that “Jesus’s body was stolen” is neither exclusive of his being restored to life (naturally or supernaturally) nor exhaustive of all alternatives to “he was raised from the dead.”

If you’re going to do a Bayesian analysis of the resurrection claim, even just to ascertain the most credible secular explanation of the historical data, you need to build a coherent model of all mutually exclusive alternatives. And that will be more complex, or more all-encompassing, than some simple binary model like “either stolen or raised,” and getting more complicated or encompassing will produce a higher risk that you aren’t distributing the probability space coherently among the options—if you aren’t careful. So be careful. I have only a few things to say on this sort of thing in Proving History, since I assumed it would be too obvious to require much attention. But if you want to see a Bayesian approach done correctly (on the existence of God, for example), even with “a weak or uninformative prior,” see my chapter on Arguments from Design in The End of Christianity. If you want to see one done incorrectly, see my article on Swinburne & Unwin.

Incoherence of Subjective Priors?

Gelman & Yao complain that their fellow “Bayesians have sometimes taken the position that their models should not be externally questioned because they represent subjective belief or personal probability.” I agree that’s stupid. That Bayesians do this is usually a myth perpetrated by anti-Bayesians; I’ve yet to see a real Bayesian actually doing this. But I must assume Gelman & Yao have. I have a lot to say on the misnomer of subjective priors in Proving History (index, “subjective priors, problem of”). I discussed it again most recently in my Hypothesis article. The criticism usually confuses two different things: what is usually meant by subjective priors, as in priors based on the data available to you, and arbitrary priors, as in priors you just make up or inexplicably feel are right but can’t justify with any data. The latter are never justifiable, and relying on them is just bad reasoning: garbage in, garbage out is all that will get you.

Gelman & Yao seem to have some problem with arbitrary priors violating logical consistency (which would not surprise me, as they typically aren’t valid to begin with), but they never spell out exactly what they mean, and since they give no actual examples, I’m at a loss to infer just what they are getting at. Still, they do note that:

One virtue of the enforced consistency of Bayesian inference is that it can go in both directions. Start with a prior and a data model, get posterior inferences, and if these inferences don’t make sense, this implies they violate some aspect of your prior understanding, and you can go back and see what went wrong in your model of the world.

Indeed, that’s useful: you can “try out” a prior just to see if observations call your prior into question, rather than using your arbitrary prior to dictate what you should conclude from observations. The latter error is what concerns them, and I warn against it as well in Proving History. They go on to explain that:

Our problem is with the doctrine of subjective priors, in which one’s prior is considered unrefutable because it is said by definition to represent one’s subjective information. We are much more comfortable thinking about the prior (and, for that matter, the likelihood) as the product of assumptions rather than as a subjective belief.

I wholly concur, and this pretty much describes my approach in Proving History (see index, “prior probability”). Alas, again, I don’t know who these Bayesians are who are claiming their priors are “unrefutable.” Every Bayesian analysis I know affirms quite the contrary. So there isn’t much more I can say on the point.

Unreasonably Strong Inferences from Weak Priors?

We can do a lot by just assuming a weak or neutral prior of 50/50 to save time and grief, because the evidence is so strong it wouldn’t matter where we set the prior, since the posterior will vary but little. But when the evidence is also weak (which is what produces the “marginal likelihoods” Gelman & Yao warned about), we can end up with strange situations like a posterior probability range of 45-65%, where we cannot say for sure what the posterior is, only that it’s most likely (let’s say, to a 95% certainty) between 45 and 65 percent. But 45% is below 50%. So included in the range is the “probably not” condition. And yet, most of the range is in the “probably is” condition. Someone might take that as grounds to “bet” the true posterior is above 50% and thus “bet” we have a “probably is” condition. It can be shown mathematically, however, that if someone did that, they’d lose that bet quite a lot of the time.
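A toy simulation of that last claim (assuming, purely for illustration, that our uncertainty about the posterior is spread uniformly over the reported 45-65% range):

```python
# Toy simulation: if all we really know is that the posterior lies somewhere around
# 45-65% (modeled here, purely for illustration, as uniform over that range), how often
# is the hypothesis actually false when we bet "probably true"?
import random

random.seed(2)
trials = 100_000
losses = 0
for _ in range(trials):
    posterior = random.uniform(0.45, 0.65)          # our uncertain estimate of P(h)
    hypothesis_true = random.random() < posterior   # how the world turns out
    if not hypothesis_true:
        losses += 1
print(losses / trials)  # about 0.45: the "probably true" bet loses nearly half the time
```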

“In practice,” Gelman & Yao admit, “Bayesian statisticians deal with this problem by ignoring it.” They will essentially treat such weak results as “noisy and beneath notice, rather than focusing on the idea that this noisy result has, according to the model, identified” the posterior to “most likely” be favorable toward a model or hypothesis. Which is indeed how this problem tends to be dealt with. So really, Gelman & Yao have no problem here to report. Not in practice, that is. Bayesians, by their own admission, are correctly intuiting the error and avoiding it. But one could instead approach this from the perspective of epistemology, and ask, “Why is that a correct way to respond to such circumstances?” Yes, Bayesians throw a plank over this hole and walk over it. But what would actually fill the hole?

“One might say,” Gelman & Yao admit, that this is not really a “hole in Bayesian statistics” but rather it’s “just [an] implementation error to use priors that are too vague.” But outside science, in other words, in the realm of Bayesian epistemology altogether, you often have to use vague priors, because, quite simply, that correctly describes the state of information available to you. That usually doesn’t happen in science, because usually vague things never pass peer review, or even win abductive consideration. Scientists only want to study things for which they can get good data. Which is fine for them—like high schools that get to pick the best students. But the other students still need somewhere to learn, and actually, are more in need of that better school! In reality, most decisions and beliefs have to be made on less than scientific data. Most by far. And honestly, usually these end up being the most important decisions of all (like who to vote for—something scientists, who need budgets, might notice is rather crucial to their enterprise). So scientists retreating behind their nice cherry-picked halls of “things we have lots of good data for” is not entirely helpful to the rest of us who have to make decisions and form beliefs without that luxury.

Gelman & Yao eventually propose to solve this problem for Bayesian statistics (in other words, for scientists) by simply making sure your description actually conforms to reality and, when it doesn’t, fixing your description. In this case, “if its implications under hypothetical repeated sampling seem undesirable,” i.e. a result that when trusted leads instead to repeated failure under replication, then “we can take [and] consider this as a prior predictive check” that “reveals additional beliefs that we have which are inconsistent with our assumed model.” In other words, it must be that we believe things are true about a system that we haven’t incorporated in our description of that system, or have even chosen a description directly contradictory to what we actually believe to be the case. It is no surprise such a practice will generate incongruent results.

Likewise, when we observe these failures after early results are encouraging but poorly supported, Gelman & Yao note these are simply “the consequences of a refusal to model a parameter” in the system being described. In particular, at the epistemic level, that means inserting the information that ‘this result has a problematic margin of error’ into our background knowledge when making actual decisions based on the results. Thus admitting your result is only tentative (“the yes condition could be somewhere around 4 in 5 odds but we’re not sure yet”), not exact (“the odds are exactly 4 in 5 it’s in the yes condition”). Once again, their point is, if you aren’t doing this, then your chosen description is wrong. Fix it.
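For what such a prior predictive check looks like in practice, here is a minimal sketch (the model and the numbers are hypothetical, not from their paper):

```python
# Minimal sketch of a prior predictive check (hypothetical model and numbers): draw
# parameters from the prior, simulate data, and ask whether the data actually observed
# look like anything the assumed model would plausibly generate.
import random

random.seed(3)
n, observed_successes = 50, 48          # e.g. 48 "yes" answers out of 50 respondents

def simulate_once():
    theta = random.uniform(0.0, 0.2)    # the assumed prior: the true rate is below 20%
    return sum(random.random() < theta for _ in range(n))

draws = [simulate_once() for _ in range(10_000)]
tail = sum(d >= observed_successes for d in draws) / len(draws)
print(tail)
# ~0.0: the observed data sit far outside anything this prior predicts, which signals
# that the prior (or the model) misdescribes what we actually believe about the system.
```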

I address this issue in Proving History by repeatedly explaining the importance of framing all your inputs and outputs with margins of error, and allowing them to be quite large when the data are quite few (see index, “margin of error”), and relying as much as possible on argument a fortiori (see index, “a fortiori”). In the end, a result that overlaps uncertainty (like “45% to 65%”) should simply be resolved with agnosticism. In effect, it’s a “no result”: we just don’t know whether a hypothesis with that posterior probability is true or false.

As I wrote in 2012:

Indeed, “not knowing” is an especially common end result in the field of ancient history. [Bayes’ Theorem] indicates such agnosticism when it gives a result of exactly 0.5 or near enough as to make little difference in our confidence. [It] also indicates agnosticism when a result using both margins of error spans the 50% mark. If you assign what you can defend to be the maximum and minimum probabilities for each variable in [Bayes’ Theorem], you will get a conclusion that likewise spans a minimum and maximum. Such a result might be, for example, “45% to 60%,” which would indicate that you don’t know whether the probability is 45% (and hence, “probably false”) or 60% (and hence, “probably true”) or anywhere in between.

Such a result would indicate agnosticism is warranted, with a very slight lean toward “true,” not only because the probability of it being false is at best still small (at most only 55%, when we would feel more confident with more than 90%), but also because the amount of this result’s margin falling in the “false” range is a third of that falling in the “true” range. Since reducing confidence level would narrow the error margin, a lower confidence would thus move the result entirely into the “true” range—but it would still be a very low probability (e.g., a 52% chance of being true hardly instills much confidence), at a very low confidence level (certainly lower than warrants our confidence).

Proving History, pp. 87-88

This is essentially the solution of Gelman & Yao.

A larger problem their point captures, though, is that of model choice. In philosophy we atheists deal with examples of this problem when we approach claims of the supernatural, where we have an extremely fringe model being proposed. How do we assign a prior probability to it? We can’t even, as Gelman & Yao put it, “simply insist on realistic priors,” because “it is not possible to capture all features of reality in a statistical model” that way. Rather, we use general error margins and a fortiori reasoning. In short, epistemically, we don’t need to know the actual prior probability that a God exists who magically “cured the cataracts in Sam’s mum” (for which Tim Minchin pretty thoroughly explores the known hypothesis space). All we do need to know is that it is extraordinarily low (and yes, we can produce empirical, data-based demonstrations of that: see my articles on Extraordinary Claims and Methodological Naturalism) and that no evidence produces a likelihood ratio that supports it anyway (see my article on Bayesian Counter-Apologetics).

But if we had the latter, a total body of evidence that actually was more likely on the supernatural hypothesis than some other, then we would only have to ask, “Is it weak or strong?”; and if weak, we know that evidence could never overcome the extraordinarily low prior (even Coincidence would remain a more likely explanation in such a case), but if it were strong enough to itself be extraordinary, then we’d have something to consider that would warrant changing our mind. Alas, we observe, we don’t live in that world (see Naturalism Is Not an Axiom of the Sciences but a Conclusion of Them). But that’s why we don’t believe we do.

If we did observe ourselves in such a world, however, then the evidence we’d have would be extraordinarily improbable unless such a God existed, and the prior would never have trended downward to the extraordinarily improbable in the first place; and even had it done so, it would eventually be overcome by the evidence we now have, and so we should then conclude it’s most likely that God did indeed exist—even if we could still be wrong about his precise attributes (e.g. we might never be able to know with confidence a God was truly omniscient, but we could know he was extraordinarily powerful; likewise, moral; and so on). A good example of applying a fortiori reasoning this way to overcome the fringe model problem and the problem of assigning too high an early confidence in a model still relatively poorly supported is given by Efraim Wallach’s Bayesian analysis of how and why the current consensus rejecting the patriarchal narrative in the Bible as historical is valid. I give several more examples in Proving History, particularly in chapters 3 and 6.

Quantum Physics Disproves Probability Theory?

Finally, Gelman & Yao argue that in some conditions, some Bayesian models of quantum phenomena are incoherent. Particularly in trying to make predictions in “double slit experiments” with a statistical model. Such a scenario, they claim, “violates the rule of Bayesian statistics, by which a probability distribution is updated by conditioning on new information.” But they soon point out that “we can rescue probability theory, and Bayesian inference, by including the measurement step in the conditioning,” that is, by actually including in our background knowledge such basic information as “which slots are open and closed” and “the positions of any detectors.” But that’s so obvious I do not understand why they think this needs to be explained. What Bayesians are omitting crucial background information when conditioning the probabilities in their models? That’s straightforwardly invalid. Anyone doing that is simply a bad Bayesian.
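For the record, that conditioning is just textbook two-slit arithmetic. Here is a minimal sketch of it (my own illustration, with arbitrary units): the distribution you should predict differs depending on whether a which-path measurement is part of the setup, which is exactly why the setup belongs in what you condition on.

```python
# Toy two-slit arithmetic (standard textbook formulas, arbitrary units): the predicted
# distribution depends on whether a which-path detector is part of the setup, so the
# setup itself has to be among the things you condition on.
import cmath, math

def intensity(phase_difference, which_path_detector):
    amp1, amp2 = 1.0, cmath.exp(1j * phase_difference)   # amplitudes from the two slits
    if which_path_detector:
        return abs(amp1) ** 2 + abs(amp2) ** 2           # probabilities add: no interference
    return abs(amp1 + amp2) ** 2                         # amplitudes add: interference fringes

for phase in (0.0, math.pi / 2, math.pi):
    print(round(intensity(phase, False), 2), round(intensity(phase, True), 2))
# Without the detector: 4.0, 2.0, 0.0 (fringes). With it: 2.0, 2.0, 2.0 (no fringes).
```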

Gelman & Yao separately complain that “Quantum superposition is not merely probabilistic (Bayesian) uncertainty,” or else, “if it is, we require a more complex probability theory that allows for quantum entanglement.” That’s again too obvious to have warranted the bother of saying. Of course any Bayesian model you intend to build of a quantum system must correctly distribute the probability space to account for superposed states. Who doesn’t know that, such that not doing it has become a “problem”?

Then Gelman & Yao bizarrely assert:

The second challenge that the uncertainty principle poses for Bayesian statistics is that when we apply probability theory to analyze macroscopic data such as experimental measurements in the biological and physical sciences, or surveys, experiments, and observational studies in the social sciences, we routinely assume a joint distribution and we routinely treat the act of measurement as a direct application of conditional probability. If classical probability theory needs to be generalized to apply to quantum mechanics, then it makes us wonder if it should be generalized for applications in political science, economics, psychometrics, astronomy, and so forth.

No, it really doesn’t. Now, if we started observing the kinds of things we see at the quantum level, like superposition and nonlocality and tunneling and so on, at the macro level—if political scientists, economists, psychometricians, astronomers, and so forth were actually encountering such phenomena—then, sure, we’d have to rebuild our statistical models to correctly describe that stuff, just as we would quantum scale stuff. But until that happens, no, there is no point in “wondering” about it. We observe that such phenomena don’t happen at that scale. So we don’t need to model it. End of story.

They admit “it’s not clear if there are any practical uses to this idea”; for example, they ask, “would it make sense to use ‘two-slit-type’ models in psychometrics, to capture the idea that asking one question affects the response to others?” The answer is, well, sort of, yes. That’s actually something like what psychometricians are already doing. They have whole textbook chapters on the biasing and influencing of one question on others, including question order, wording, and all sorts of things. They often build models that account for this. One might say, “But, well, those models don’t map directly onto what’s happening in double-slit photon experiments,” but that’s because humans aren’t photons and questionnaires aren’t subatomic slats. You conform your model to what you are observing.

Though there is a sense in which you could do all statistics using quantum mechanics: the same sense in which you could do all biology, even psychology and sociology, using subatomic physics; since, after all, all macro phenomena are the aggregate effect of gazillions of quantum interactions, and even macro events technically have quantum wave functions— indeed there is one for the entirety of the whole universe. But that would simply be massively inefficient, and thus pointless. As I already noted 15 years ago in Sense and Goodness without God:

[S]ociology can be reduced to psychology, psychology to biology, biology to chemistry, and chemistry to physics. So, theoretically, all of sociology and psychology can be described entirely by physics. … [But t]hat psychology reduces to biology does not mean we can do without psychology and talk only biology, or turn all research in psychology over to biologists—and thence, by logical progression, to physicists.

The most obvious reason (as I go on to explain) is that it would be extraordinarily arduous to talk about, say, economics in terms of the entire causal system of atoms that manifests human biology and brains and activity. A physicist simply isn’t humanly capable of doing that. Biology can only be conducted at the level of abstraction developed for it. Yes, in theory, every book in biology could be “translated” into a book in pure physics, and ultimately pure mathematics, with not a jot of English or any other human language. But no one is capable of doing that; nor of reading the result so as to comprehend it. So “rewriting” economics in terms of quantum wave functions, though theoretically doable, is completely useless.

But what Gelman & Yao mean is more bizarre than even that ridiculous idea. They are asking if, perhaps somehow, superposition and other weird quantum phenomena might even exist at the scale of biology or sociology itself. Which is an empirically refutable speculation, and as such, just bad philosophy. It’s like asking whether there could evolve a bacterium the size of the earth that might eat us and “should we worry about that.” No, we shouldn’t. Move on.

Correctly Describing Quantum Systems

I think perhaps Gelman & Yao have fallen victim to the description problem: mathematics is simply a language for describing things (as I explain by analogy in How Can Morals Be Both Invented and True?). In fact mathematics differs from other languages in only two respects: component simplicity and lack of ambiguity. Component simplicity, as in every mathematical “word” references an extremely simple construct or function (as the difference between “42” or “factorial” and “a duck” or “love” makes clear); and lack of ambiguity, as in every mathematical “word” lacks valence (each means one and only one thing, and that precisely). Again, “love” can mean a hundred different things, many not even precisely understood, as can even “duck”: not only is it both a verb and a noun with no relation in meaning, but even the noun “duck” can reference an animal, a toy, a tool, or any specific one of a million different instantiations of any of those; whereas “42” and “factorial” each always mean one simple, precise abstraction, however adaptable. But once we account for those differences, math is just language. And like all language, all math ever does is describe.

When a mathematical tool is used to extract information from data, for example, it only works if the tool correctly describes the data, and whatever else is presumed to be the case (e.g. that the data was randomly sampled rather than cherry picked with an agenda; and so on). In cases of models, the model must map onto reality somehow, and do so correctly, or else eventually the model will fail to correctly describe what you are observing, which is the system you thought you had been correctly describing with that mathematical model (see All Godless Universes Are Mathematical). But the point here is that when a mathematical model fails to describe what happens in a system it purports to be a description of, your description is simply incorrect. You’ve incorrectly described the system. You must revise your description, revise your math.
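
A trivial toy example (with invented numbers) makes the point: the sample mean is a perfectly valid estimator, but only under the premise that the sample was randomly drawn; hand it a cherry-picked sample and it goes badly wrong, not because the math failed, but because the description of how the data were obtained was false.

```python
import numpy as np

# Same estimator, different premises. Population parameters and sample sizes
# are arbitrary; only the contrast matters.
rng = np.random.default_rng(0)
population = rng.normal(loc=50, scale=10, size=100_000)

random_sample = rng.choice(population, size=200, replace=False)
cherry_picked = np.sort(population)[-200:]  # the 200 largest values

print("true mean:         ", round(float(population.mean()), 1))
print("random-sample mean:", round(float(random_sample.mean()), 1))  # close to the truth
print("cherry-picked mean:", round(float(cherry_picked.mean()), 1))  # far off: the sampling premise was violated
```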

There is nothing wrong with the math as such when that happens; it is not rendered incoherent by not cohering with observation—you have not, for example, “refuted probability theory.” To the contrary, what is incoherent is your choice of description and the thing to be described. Mathematics, like all language, is artificial, and as such analytical. Its conclusions are always necessarily true given the premises. And math’s component simplicity and lack of ambiguity are what enable us to prove that, and thus be assured of it (see my discussion of the epistemology of mathematics in Sense and Goodness, index, “mathematics”). Which is what gives math its utility. So if there is an incongruity between the output of a mathematical analysis and the observed system being analyzed, the mistake is in the premises. You’ve simply incorrectly stated them. There is literally no other possible thing that can have happened. Which is useful to know. Not useful to deny.

Just to be thorough, I’ll note one fringe exception to this point: there is, of course, always some nonzero epistemic probability we’ve screwed up our proof of the validity of a mathematical construct, even of something as simple as 1+1=2; it’s just that the probability of that happening is typically so vanishingly small as to be worth disregarding altogether—until such time as someone discovers the error (see Proving History, p. 297 n. 5).

But back to Gelman & Yao. Their eventual solution to what they think is the quantum mechanical problem, is to say:

One way to rescue this situation and make it amenable to standard Bayesian analysis is to fully respect the uncertainty principle and only apply probability models to what is measured in the experiment. Thus, if both slits are open and there are no detectors at the slits, we do not express [any variable] x as a random variable at all. In a predictive paradigm, inference should only be on observable quantities.

Quite true. They note “this is related to the argument of Dawid” that “it can be [a] mistake to model the joint distribution of quantities such as potential outcomes that can never be jointly observed,” which illustrates the problem of incorrect description. If two outcomes cannot be jointly observed, your probability model had better correctly represent that fact; otherwise it is not correctly describing the system at all.

This is similar to how people, even expert mathematicians, will screw up the Monty Hall Problem, by applying an incorrect description of the actual probability distribution as soon as one “slit” is opened. Of course the Monty Hall Problem is not quantum mechanical. But it is analogous in one respect: Monty’s opening of one of the three doors literally changes everything as to the probability distribution of what’s behind the other two doors. Most people (sometimes even well-qualified mathematicians) do not update their model to account for this, and thus solve the problem incorrectly. They start with “the prize has one in three odds of being behind the door I selected” and correctly infer “therefore the prize has two in three odds of being behind one of the other doors,” and from that conclude that Monty’s opening one of those doors to show you it’s empty can’t possibly change those original odds. After all, you didn’t change anything, and the prize hasn’t been moved, so why should you change your choice of door? It’s always 1 in 3, right?

In actuality, in the imagined scenario, Monty’s choice is restricted by information—he will never show you a door with the prize behind it, only one without; and he will never (at this point) open the door you chose, only one you didn’t. So the choice he then makes has now actually given you information you didn’t have when you first chose. Which changes everything. You should immediately abandon your original choice and pick the door you didn’t choose. Because, as it happens, the odds the prize is behind it are now 2 in 3. If that doesn’t seem right to you, you will now understand the significance of the analogy: when our assumptions about the statistical description of what’s happening are incorrect, then our results won’t match observations. This is not a fault in the math. It’s a fault in our premises. We simply haven’t correctly described mathematically the scenario we are actually facing. And we won’t get a match with observation until we do. And in fact, iterated enough times, “always switch doors” in the Monty Hall Problem will get you the prize two out of every three times.
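
You can verify that yourself with a quick Monte Carlo simulation (the trial count below is arbitrary; any large number shows the same result):

```python
import random

def play(switch: bool) -> bool:
    """Play one round of the Monty Hall game; return True if the player wins."""
    doors = [0, 1, 2]
    prize = random.choice(doors)
    pick = random.choice(doors)
    # Monty opens a door that is neither the player's pick nor the prize.
    opened = random.choice([d for d in doors if d != pick and d != prize])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize

trials = 100_000
print("stay:  ", sum(play(False) for _ in range(trials)) / trials)  # ~0.33
print("switch:", sum(play(True) for _ in range(trials)) / trials)   # ~0.67
```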

Likewise, any incongruity between Bayes’ Theorem and quantum mechanical systems will never be the fault of Bayes’ Theorem, or probability theory. It will only ever be the fault of your having entered the wrong premises: you simply didn’t describe the probability space correctly. And Gelman & Yao offer us one way to do it correctly. Which is indeed by tying it to physical reality: e.g. “quantum superposition is not the same thing as probabilistic averaging over uncertainty,” so any mathematical description (i.e. mathematical “model”) that assumes those are the same thing will fail to describe observations of that system. Revise your description. Just as they conclude, “quantum mechanics can be perfectly described by Bayesian probability theory on Hilbert space given the appropriate definition of the measurable space.” Exactly.

Which means, there really is no problem here to discuss. Other than the problem of all problems: get your damned description right. Fail at that, and all else is hosed.

Conclusion

Gelman & Yao note several things Bayesian statisticians have to look out for. This is of interest to people using Bayesian statistics. But that is not Bayesian epistemology. The latter is more universal and fundamental, describing what scientists are doing even when they think they aren’t doing math. Their every inference, their every argument to any conclusion, is always fundamentally Bayesian; or else fundamentally invalid. Hence none of the “holes” in Bayesian statistics Gelman & Yao find are holes in Bayesian epistemology or logic. All the problems they note either don’t exist outside the rarefied task of constructing complex mathematical models, or have already been solved by Bayesian epistemology. And understanding the latter requires not confusing it with the former.
