Comments on: Two Bayesian Fallacies

By: Richard Carrier

Richard Carrier — Tue, 28 Jan 2014 18:34:07 +0000

Bayes’ Rule assumes that we are certain about our evidence.

Note that that is just like using Newton's equations to predict rate of fall, even though that will almost always produce a false result (see my discussion here). It's an approximation, not an exactitude, because exactitude would require far more work for no useful gain. If you really wanted to, the probability of evidence being incorrect can be incorporated into Bayes' Theorem (using a standard model of total probability). It's just really a lot of work for no useful gain (especially when we are arguing a fortiori, see Proving History, index "a fortiori").
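
A minimal sketch of how the probability that the evidence is incorrect might be folded in by total probability. It weights P(h | e) and P(h | not e) by how likely the evidence report is to be right, which is often called Jeffrey conditionalization. Whether this is the exact model meant above is an assumption, and all function names and numbers are hypothetical, chosen only for illustration.

```python
def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """Bayes' Theorem for a binary hypothesis: returns P(h | e)."""
    numerator = prior_h * p_e_given_h
    return numerator / (numerator + (1 - prior_h) * p_e_given_not_h)


def posterior_uncertain_evidence(prior_h, p_e_given_h, p_e_given_not_h, p_e_correct):
    """Total probability over 'the evidence is as reported' vs. 'it is not'.

    p_e_correct is an assumed probability that e really is the case.
    """
    post_if_e = posterior(prior_h, p_e_given_h, p_e_given_not_h)
    # If e is false, condition on not-e instead, using the complementary likelihoods.
    post_if_not_e = posterior(prior_h, 1 - p_e_given_h, 1 - p_e_given_not_h)
    return p_e_correct * post_if_e + (1 - p_e_correct) * post_if_not_e


# Made-up inputs: treating e as certain vs. as 99% reliable.
certain = posterior(0.5, 0.9, 0.2)
hedged = posterior_uncertain_evidence(0.5, 0.9, 0.2, 0.99)
print(certain, hedged)
```

With highly reliable evidence the two results differ only slightly, which is the sense in which treating the evidence as certain is an approximation.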

And that is a serious problem for Bayesian accounts of confirmation.

No, it isn't. And if you were up on Bayesian literature, you'd know it isn't. For why it's no problem at all (and the scholarship) see Proving History, pp. 277-80, where I discuss exactly the same example you bring up. (You also don't seem to know how Bayes' Theorem works. That Mercury's orbit was known in b is simply no different from it being observed in e, and one can move items between b and e however you want, as long as the two sets don't overlap. I discuss this several times in Proving History, index "demarcation". The probability that e if h is not the probability that e is observed, it's the probability that e would be observed if h is true. Before Einstein, the probability that that e would be observed on any then known theory was very small; when Einstein came along, his theory then entailed that the probability of that e was remarkably high; thus, it became evidence for his theory--one merely has to control for the retrofitting fallacy, but that didn't apply in that one example, which is notably why that's the example everyone uses: it's so rarely the case that retrofitting can't be the cause of a retrodiction, that there are few examples to choose from. I explain in Proving History, pp. 277-80, why retrofitting wasn't a causal factor in that case, as has been noted by other experts, whom I cite there.) Oh, and BTW, "Old Evidence, Problem of" is even in the index to Proving History. So don't think you are somehow surprising me with this.
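
The claim that items can be moved between b and e without changing the result can be illustrated with a small sketch. It assumes two pieces of evidence, e1 and e2, that are conditionally independent given each hypothesis. That simplification, the names, and the numbers are all hypothetical.

```python
def update(prior, likelihoods):
    """One Bayesian update.

    prior:       dict mapping hypothesis -> probability
    likelihoods: dict mapping hypothesis -> P(evidence | hypothesis)
    """
    unnormalized = {h: prior[h] * likelihoods[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


prior = {"h": 0.3, "not_h": 0.7}
e1 = {"h": 0.8, "not_h": 0.4}   # P(e1 | hypothesis), made-up values
e2 = {"h": 0.6, "not_h": 0.1}   # P(e2 | hypothesis), made-up values

# Route 1: e1 sits in background knowledge b, e2 is the new evidence e.
e1_in_b = update(update(prior, e1), e2)
# Route 2: e2 sits in b, e1 is the new evidence.
e2_in_b = update(update(prior, e2), e1)
# Route 3: both items are treated as evidence at once (joint likelihood).
both_in_e = update(prior, {h: e1[h] * e2[h] for h in prior})

print(e1_in_b["h"], e2_in_b["h"], both_in_e["h"])
```

All three routes give the same posterior for h, because every item is counted exactly once wherever it is placed.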

By: Jonathan Livengood

Jonathan Livengood — Mon, 27 Jan 2014 12:25:09 +0000

In reply to iainmartel. Richard, I think you're not getting Iain's points. Let me see if I can make the second one clearer, since if Antony meant to be raising the problem of old evidence, it really is a somewhat serious problem.

Bayesians of a pretty minimal stripe typically adopt two constraints on rational belief. First, they claim that rational degrees of belief (at any given time) must obey the axioms of probability. And second, they claim that rational degrees of belief are to be updated, given evidence, using Bayes' Rule (not to be confused with Bayes' Theorem) of Conditionalization, which says:

Pr_new(h) = Pr_old(h | e)

In words: When I update my credences on the basis of some evidence e, the new credence that I assign to each hypothesis h in my hypothesis space must be equal to my old conditional credence in h given e. That is, I'm moving from one distribution over the hypothesis space to a new one, and Bayes' Rule tells me how to make the move. (Note that whereas Bayes' Rule is controversial among epistemologists, Bayes' Theorem is not controversial. Even benighted frequentists accept Bayes' Theorem. Everyone who accepts the usual probability axioms, the usual definition of conditional probability, and classical logic has to accept Bayes' Theorem!)

Now, consider what your credence for e should be after learning that e is the case. Obviously it is one. We're just updating:

Pr_new(e) = Pr_old(e | e) = 1

In other words, Bayes' Rule assumes that we are certain about our evidence.

We're not quite to the problem yet. We need an historical example to motivate the problem. The example that Clark Glymour used when he introduced the problem of old evidence is Einstein's use of the theory of general relativity to explain the anomalous advance of the perihelion of Mercury.

At the time that Einstein was working out the details of his general theory, it was well-known (and had been well-known for a long time) that Newton's theory did not correctly predict Mercury's orbit. Moreover, it was well-known what Mercury's orbit looked like. And that is a serious problem for Bayesian accounts of confirmation. Naively, the fact that Einstein's theory gave the right answer for the orbit of Mercury helped to confirm the theory. But if the Bayesian picture is correct, then fitting the orbit of Mercury should not have changed the (subjective) probability that the theory was true. It should not have affected anyone's rational credences. The reason is that the evidence -- the shape of Mercury's orbit -- was already known. It was already in the background beliefs.

Naively, one would think that Pr_new(GTR) = Pr_old(GTR | M) > Pr_old(GTR), where M is the observed orbit of Mercury. But since M was already known, Pr_new(GTR) = Pr_old(GTR | M) = Pr_old(GTR).

In the linked essay above, Glymour considers some ways one might try to escape the problem of old evidence. To take the example he likes best, you could treat the evidence as the fact that the theory predicts the known-in-advance observation. That is, the evidence would be a structural relation between a theory and an observation, rather than just an observation. But there is a lurking worry here that, as Glymour puts it, the degrees of belief become epiphenomena, while the structural relation does all the real work.
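
The arithmetic behind the old-evidence worry can be sketched numerically. The two joint distributions over (GTR, M) below are made up for illustration. The only structural assumption is that in the second one M already has probability 1.

```python
def marginal_h(joint, h_value):
    """Pr(GTR = h_value) from a joint distribution keyed by (gtr, m) truth values."""
    return sum(p for (gtr, m), p in joint.items() if gtr == h_value)


def conditional_h_given_m(joint, h_value, m_value):
    """Pr(GTR = h_value | M = m_value), by the definition of conditional probability."""
    p_m = sum(p for (gtr, m), p in joint.items() if m == m_value)
    return joint.get((h_value, m_value), 0.0) / p_m


# Case 1: M is not yet known, so Pr_old(M) < 1 and conditioning on M raises Pr(GTR).
m_uncertain = {(True, True): 0.09, (True, False): 0.01,
               (False, True): 0.09, (False, False): 0.81}

# Case 2: M is old evidence, so all probability mass already sits on M = True.
m_known = {(True, True): 0.10, (True, False): 0.0,
           (False, True): 0.90, (False, False): 0.0}

for label, joint in [("M uncertain", m_uncertain), ("M already known", m_known)]:
    print(label, marginal_h(joint, True), conditional_h_given_m(joint, True, True))
```

In the second case the conditional and unconditional credences coincide, which is the equality Pr_old(GTR | M) = Pr_old(GTR) stated above.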

By: Elle87

Elle87 — Wed, 10 Jul 2013 21:06:12 +0000

I’m posting this for no particular reason, I just thought you may find it funny 🙂

http://xkcd.com/1236/

By: Richard Carrier

Richard Carrier — Mon, 03 Jun 2013 18:38:14 +0000

In reply to Robert Allen. That's essentially what I said ("Except it wouldn’t be 50/50 if we had none of that evidence, as if every other object we encountered in the universe is made of cheese." ... in other words, if half of all objects randomly encountered were made of cheese, then a prior of 50/50 would be warranted, so conversely if we adopted an artificial zero knowledge position and asserted a prior of 50/50 for cheese we would be asserting that half of all objects randomly encountered will be made of cheese, which cannot validly be asserted on a state of zero knowledge). Of course, on actual background knowledge it would not be an equal probability for all materials. We have a lot of background evidence regarding the frequencies of different materials in natural space objects, which gives us better than equal chances for all possible materials (and extremely low chances for absurdities like cheese). Even if we step back and assume a position of, say, an ancient astronomer, where they lacked data on space objects, there still would not be an equal distribution, since inferences could be made based on appearance and causality and what materials were at least then known (not as reliable as inferences we can make now, but far more reliable than just randomly assigning all possible materials the same probability; case in point: many ancient astronomers correctly inferred the moon was made of earth, which is well nigh statistically impossible if all materials had equal probability, so they clearly were using a more reliable standard of inference than that). If we posit a zero knowledge position, then you wouldn't model the problem as "made of cheese or not" because in a state of zero knowledge you've never heard of cheese. 
A person in that state could not make any assertion at all about what the moon was made of, until they started gathering more and more knowledge (hence the rise from cave men thinking it's a light in the sky to ancient astronomers concluding it's a rock of some kind to moderns being able to deduce ever-more accurate conclusions as access to data increases). But to be charitable to the commenter's original point, this essentially was their point (even if they got the math wrong, their idea was correct).

By: Richard Carrier

Richard Carrier — Mon, 03 Jun 2013 18:27:11 +0000

In reply to Jon Drake. AP calculus, a college-level statistics course, a semester of electronics engineering, a year serving as a sonar technician at sea, a Ph.D. in the history of science, and several years of advanced reading and discussion with experts in the philosophy of mathematics generally and Bayesian reasoning specifically. Since my use of BT requires nothing more than sixth grade math, I really don't even need those qualifications. But in any case, my book (Proving History: Bayes's Theorem and the Quest for the Historical Jesus) passed peer review by a professor of mathematics. And I have similarly published under peer review work in statistics and probability before that (Richard Carrier, “The Argument from Biogenesis: Probabilities Against a Natural Origin of Life,” Biology and Philosophy 19.5 [Nov 2004]: 739-64). Rather than obsess over qualifications, though, why not actually address the arguments I make?

By: Richard Carrier

Richard Carrier — Mon, 03 Jun 2013 18:16:47 +0000

In reply to Jon Drake. The first question rather is: What will you use instead? Whatever it is, either it will conform to BT or it will violate it. If the latter, your conclusion will be logically invalid (PH, pp. 106-18). So the second question is: How much does it bother you that you won't know when your reasoning is logically invalid, because you won't know when it conforms to BT? If that doesn't bother you much, then I can't help you. You have a much bigger problem to solve first. But if it bothers you enough, you should try harder to understand the logic of BT. Since it requires no more than sixth grade math, I doubt it is beyond your ability to understand it if you try. Note, however, that you don't have to understand this specific post to apply BT. This post relates to high level meta-challenges to the underlying logic and epistemology of BT. Just as you don't need to understand philosophy of science to be a good scientist, you don't need to understand this meta-level stuff to apply Bayesian reasoning. Emphasis on need to. You certainly can, IMO. But that would require effort, which introduces the question of practicality (is it worth the bother). And that gets us back to the same cycle of questions: Does it bother you that you don't understand why the two objections to Bayesian reasoning I rebut here are incorrect? If not, then you needn't worry about it. You are then okay with the conclusion that they are incorrect and you can just move on to something more important. But if it does bother you, then you need to make more of an effort to understand why these two objections to Bayesian reasoning are wrong. And I think you could. Certainly if you did your best and then asked questions here to get more information about what is still stumping you. Progress would be inevitable I think.

By: Richard Carrier

Richard Carrier — Mon, 03 Jun 2013 16:53:51 +0000

I was asked off-thread what I thought of two remarks about this post on Reddit. I don't participate on Reddit, because it's an unregulated cesspool and I don't believe anything actually productive can ever get done there. Indeed, anything that opens with the sentence "Bullshit article is bullshit" (as one of the comments I was asked about does) reads more like something written by a child than by anyone worth taking seriously. But for the adults in the room: (1) Regarding a remark concerning arbitrary priors and Kolmogorov complexity, this is a classic example of overthinking a problem, and illustrates the difference between an armchair thinker and someone who actually uses Bayes' Theorem in real-world applications (where it has been so thoroughly proven superior to any other epistemology, one has to immediately question anyone who claims it can't possibly work, just like someone who claims evolution can't possibly work because "reasons"). In principle iterative Bayesianism solves the problem of arbitrary priors, since all priors are based on the same data, and all data has to be swept up in the equation eventually anyway, since b + e = all available knowledge without remainder. So it doesn't matter which prior you start with. The question rather is whether we can complete a Bayesian iteration in every case. And the answer is no; in many cases we have to ballpark it. But that is a problem for all epistemologies, not just Bayesianism (see Epistemological End Game). I address both facts in my book, in detail (for example, Proving History, pp. 229-56). The bottom line is that the selection of reference classes (hence priors) is not all that arbitrary (and indeed, any reference class you don't use falls back into the evidence and thus affects the posterior probability anyway: as I show, again, on pp. 229-56).
What this critic also seems to be unaware of is the use of a fortiori reasoning with large margins of error, a technique that solves most problems involving the selection of probabilities in BT, and that is the technique I lay out in my book Proving History (index, "a fortiori"). I further show that all other valid epistemologies can be described by BT, thus all other valid epistemologies are modeled by and thus reduce to Bayesian epistemology. All the problems and limitations of the one are represented in the other, in a logical 1:1 correspondence. So no criticism of Bayesian epistemology can be valid that doesn't also take down every other epistemology worth having. So you may as well just roll up your sleeves and solve the problem, whatever you claim it is. And in my experience, BT makes doing this a lot easier, because it shows the correct logical relations between evidence and sound belief. Note that at no point do I have to appeal to anything as needlessly complex as resorting to Kolmogorov complexities to define parameters in BT. That one can do this only shows that what we are doing with BT has an ontological ground, and that without remainder. But one almost never has to actually do that to know that--just as one doesn't have to actually translate this comment into binary code to know that it can be done. To carry that analogy further, anyone who is well enough familiar with such a procedure doesn't have to actually translate this comment into binary code to know roughly how much data space that would consume, and generally one doesn't ever need to know that beyond roughly, e.g. our data storage and transmission speeds are so large now that anything in the Kb range is insignificant when it comes to calculating times and storage limits, so we usually round to the Mb or maybe the 100s of Kb...
"Kolmogorov-style" precision is simply useless, although someone interested in reducing the information in this comment to its smallest possible bit length might find that an amusing puzzle, the answer is generally of no use to anyone (except maybe cryptologists and engineers who are faced with solving extreme limitations in communications, such as how to increase the efficiency of ELF transmissions for submarines). (2) It was suggested that John Pollock's paper "Problems for Bayesian Epistemology" refutes me, although that isn't true for this blog post, and is certainly not true for my book, which actually refutes him, at length. Which tells me whoever posted that comment hasn't read my book. Indeed, all the "problems" Pollock refers to are mooted by my demonstration that "degrees of belief" reduce to frequencies (Proving History, pp. 265-80, although the preceding section is important to the point as well: pp. 257-65), and thus probability calculus applies by all the same proofs ever developed (and so all his objections have nothing to apply to in my application of BT). I provide a more formal refutation of Pollock's entire thesis on pp. 106-14 (and relevant to understanding that is what is shown in the preceding section again, pp. 97-106), where I prove that all epistemologies necessarily reduce to BT (so any problems with Bayesian epistemology are equally problematic for any other epistemology you care to name...provided that epistemology has any logical validity; but any epistemologies that violate logic should obviously be rejected). 
Pollock is correct about one thing, though: all we can get out of Bayesian reasoning is warrant ("warranted belief"), not "knowledge" in the hyper-specific sense of justified true belief, except insofar as it is probabilistic belief, because we can only have justified true belief that "given the information available to me at time t, the epistemic probability that h is true is P at t" (and not that "h is true," full stop, the impossible dream of too many a philosopher these days). But it is not hard to show that all epistemologies suffer the same problem, and Bayesianism only exposes the problem in greater clarity. Basically, any epistemology that denies the probabilistic nature of all knowledge claims (and thus refuses to acknowledge that anything, literally anything, we claim to be "knowledge" could be false...i.e. it has some nonzero probability of being false) is an epistemology no human being could ever actually deploy (and indeed, even a god could not: PH, p. 331, n. 41). So holding out for such an epistemology is foolish.
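
One way to read "a fortiori reasoning with large margins of error" is sketched below. Each input to Bayes' Theorem is given as a range, the values least favorable to h are plugged in, and the result is treated as a floor on the posterior. This reading, the function names, and the ranges are assumptions for illustration, not the procedure as laid out in Proving History.

```python
def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """Bayes' Theorem for a binary hypothesis: returns P(h | e)."""
    numerator = prior_h * p_e_given_h
    return numerator / (numerator + (1 - prior_h) * p_e_given_not_h)


def a_fortiori_lower_bound(prior_range, p_e_h_range, p_e_not_h_range):
    """Posterior computed with every input pushed against h.

    P(h | e) rises with the prior and with P(e | h), and falls as P(e | not h)
    rises, so the least favorable corner is (lowest, lowest, highest).
    """
    return posterior(min(prior_range), min(p_e_h_range), max(p_e_not_h_range))


# Hypothetical margins of error on each estimate.
floor = a_fortiori_lower_bound(
    prior_range=(0.2, 0.5),
    p_e_h_range=(0.7, 0.95),
    p_e_not_h_range=(0.01, 0.1),
)
print(floor)
```

If the floor already exceeds the threshold of interest (say 0.5), the conclusion holds for every combination of values inside the stated ranges, so no precise estimates are needed.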

By: Robert Allen

Robert Allen — Sun, 02 Jun 2013 15:10:33 +0000

Richard,
Just wanted to point out that 50/50 is not the correct prior probability of the moon being made of cheese even with no background knowledge. If it were, then it could similarly be claimed to be 50% for being made of pasta, porcelain, or wood, which of course adds up to more than 100%. Instead, the correct approach, with no background knowledge, would be to assign equal probabilities to all conceivable moon materials. With thousands of possible materials, and no reason to assign additional likelihood to cheese (no background knowledge), we would have a very low probability for cheese. Thus, with no background, the correct assessment would be that any guess regarding the moon’s composition is extremely unlikely. Maybe I’m just nitpicking here.
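
The arithmetic of this objection can be sketched in a few lines. The list of materials is a made-up stand-in for "all conceivable moon materials", and the uniform prior is simply the proposal described above, not a claim about what the right prior is.

```python
def naive_total(materials, per_material=0.5):
    """Total probability if every material is separately given a 50% prior."""
    return per_material * len(materials)


def uniform_prior(materials):
    """Equal prior for each mutually exclusive material; sums to 1."""
    share = 1.0 / len(materials)
    return {material: share for material in materials}


materials = ["cheese", "pasta", "porcelain", "wood", "rock"]

print(naive_total(materials))                # exceeds 1, so it is not a valid distribution
print(uniform_prior(materials)["cheese"])    # shrinks as the list of materials grows
```

With a thousand candidate materials instead of five, the uniform prior for cheese would drop to 1/1000.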

By: Jon Drake

Jon Drake — Sun, 02 Jun 2013 14:28:42 +0000

Richard, it would also help if you could clarify your academic qualifications in mathematics. I can’t seem to find much on these.

A lot of people on our campus have asked about this, so if you think it is going away because you block discussion of it, you are mistaken.

By: Jon Drake

Jon Drake — Sun, 02 Jun 2013 14:25:23 +0000

Richard, there are those of us who have tried to understand your use of the theorem as described above but who just don’t get it, and probably don’t have the time to spend on it.

Should we then not use it, and hold back on it, or should we just accept it based on your experience and education?