The most important advice you could ever get for becoming a reliable critical thinker comes down to the following three tips, each of which depends on probabilistic reasoning. You might want to take my online course in Critical Thinking for the 21st Century to really dive into this, and ask me all the questions that come up for you too. But here I’ll start with a little primer on it. I’ll follow the three main tips with some more general advice on how to reason better about probability. Because in real life everything boils down to a probability. And anyone who does not know that, or understand what it entails, will not think reliably about their own beliefs, or anyone else’s.

  1. Of any claim or belief you think is true, or that you feel committed to or need or want to be true, ask yourself: “How do I know if I’m wrong?” Being able to answer that question, knowing you need to answer it, and the answer itself are all invaluable. You cannot know if you are right if you don’t even know how you’d know you were wrong.
  2. The most reliable way to prove a claim or belief true is by trying to prove it false. Failing to prove it false increases the probability of it being true, and does so the more unlikely it is that those attempts to prove it false would fail—unless the belief or claim were true. The scientific method was a crucial discovery in human history because it is almost entirely based on exactly this fact.
  3. A good way to simplify the task of trying to falsify a claim is to find the best opponent of that belief or claim and investigate the truth or validity of their case. They will likely have done a great deal of the work already, so all you have to do is logic-test and fact-check it.

For that last point, a “good” opponent is one who is informed, reasonable, and honest. So as soon as you confirm someone hasn’t been honest, or often relies on fallacious reasoning, or often demonstrates a fundamental ignorance of the pertinent subjects, you should stop listening to them. Go find someone better.

I do realize this advice can’t help the genuinely delusional; they will falsely believe any opponent is dishonest, fallacious, or ignorant regardless of all evidence otherwise, so as to avoid ever confronting what they say. Reason and evidence will have no effect on them, so advice to follow reason and evidence won’t either. The advice in this article can only help the sane.

Once you’ve found a good critic, so defined, it can be most helpful to build a personal relationship with them, or otherwise cultivate a charitable, sympathetic, patient dialogue with them, if either is available (it often won’t be; we all have limited time), and then make it your point to learn as much as possible, rather than reducing your interaction to combative debate. The best way to do this: instead of “refuting” them, aim to understand why they believe as they do. Then you can test for yourself the merits of their reasons, which you will then more clearly and correctly understand. This produces a good falsification test; combative debate, by contrast, tends toward rationalization and rhetoric, which makes for a bad falsification test. And you can’t verify your beliefs with a bad test.

Likewise, if you set about attempting to prove yourself wrong in a way already engineered to fail (e.g. you only go after straw men, you only apply tests that wouldn’t often reveal you were wrong even if you were, and so on), you are not really following these principles, but avoiding them. Surviving a falsification test only ups the probability your idea is right if that falsification test was genuinely hard for a false belief to pass. In fact, the degree to which you can be certain you are right about something is directly proportional to how likely it is that your tests would have exposed you as wrong, if you had been. That is, in fact, the only way to reliably increase the probability of any claim or belief.

These three tips all focus on one key task: looking for evidence that’s unlikely to exist, unless the claim or belief is true; or evidence that’s likely to exist, unless the claim or belief is false. Thus trying hard to find evidence you should expect to exist if it’s false, and not finding it, is the best evidence to look for. Because that’s unlikely…unless the claim or belief is true. Not impossible. But improbable. Which is why understanding probability is always crucial. Because it’s always going to be about that.
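
To see how that cashes out in numbers, here is a minimal sketch in Python, with purely hypothetical probabilities of my own choosing: it compares what surviving a genuinely hard falsification test does to a claim’s probability with what surviving an easy test does.

```python
# Hypothetical illustration: how much surviving a falsification test should
# raise our confidence depends on how unlikely a false claim was to survive it.
prior = 0.5              # how probable we thought the claim was beforehand
p_pass_if_true = 0.95    # a true claim would almost always survive the test
p_pass_if_false = 0.10   # a false claim would rarely survive a genuinely hard test

# Probability the claim is true, given that it survived the hard test
posterior_hard = (prior * p_pass_if_true) / (
    prior * p_pass_if_true + (1 - prior) * p_pass_if_false
)
print(round(posterior_hard, 3))   # ~0.905: surviving a hard test raises the probability a lot

# Contrast with an easy test that even false claims usually pass
p_pass_if_false_easy = 0.90
posterior_easy = (prior * p_pass_if_true) / (
    prior * p_pass_if_true + (1 - prior) * p_pass_if_false_easy
)
print(round(posterior_easy, 3))   # ~0.514: surviving an easy test proves almost nothing
```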

So let’s talk about probability.

Backstory

Theories of probability were first developed in ancient Greece and Rome, but were largely forgotten in the Middle Ages and had to be more or less reinvented in the 17th century. Ancient theories of probability were not developed from dice games as some think, but from divination and lottery systems (some similar to scrabble games today, others more like the lottery systems developed in Athens for selecting jurors, for which they even built mechanical randomizing machines). Also contrary to what you might often hear, the Greek numerical system was not a hindrance to the development of sophisticated mathematical theories, and they achieved some remarkable advances (see my discussions and bibliographies on ancient mathematics in The Scientist in the Early Roman Empire).

Ancient probability theories were based on a developed theory of permutations and combinatorics (the study of counting, combinations, and ratios among them), which we know from hints in extant texts (where they report solutions to problems in combinatorics that are correct, a feat only possible if they had mastered the underlying theory) and from the 20th-century recovery of one lost treatise on combinatorics by Archimedes—which had been erased by medieval Christians and covered over with hymns to God, but has now been fully reconstructed using the particle accelerator at Stanford, which, combined with computerized axial tomography, mapped the location of all the iron atoms in the codex, from which the lost ink could be recovered. (Science beats religion for the win.)

But we know the ancient Greeks and Romans wrote a lot more about probability theory and its foundations, and even used it in some of their philosophy. It’s just that every treatise they wrote on it was tossed in the trash in the Middle Ages, with the exception of one, which was erased (and even that was one of the earliest and most primitive exercises, not representative of subsequent advances). For examples of evidence and discussion on this see my book (cited above) and Russo’s Forgotten Revolution, pp. 281-82. Many other examples can be found, e.g., in places like Cicero, On the Nature of the Gods 2.93, which briefly mentions a scrabble-game method of running probability experiments.

They knew that the fundamental basis of probability amounted to knowing the ratios among possibilities and combinations of possibilities, and that ultimately it was about frequency, expected or actual.

Arithmetic

All empirical arguments are arguments over probabilities. And probabilities are mathematical. So it is not possible to think or argue soundly over probabilities without a basic command of mathematical reasoning. This means all fields of empirical knowledge are fundamentally mathematical. No one gets a pass. Even history is all about the probabilities of things—probabilities regarding what happened, what existed, what caused it. Therefore even history is fundamentally mathematical. I illustrate this extensively in my book Proving History, and the underlying point was independently confirmed by Aviezer Tucker. This means critical thinking even about history requires mathematical knowledge (though not solely; I discuss lots of resulting heuristics and methods in doing history in my monthly online course on Historical Methods for Everyone). Nothing so advanced as formal statistics is necessary; but a basic command of probability theory is (see my article Bayesian Statistics vs. Bayesian Epistemology for what that difference is and entails).

I fully sympathize with the fact that many of us have forgotten how to do a lot of that basic stuff. Most of it really is no more advanced than sixth grade level in American schools. If you need help there, almost everything you would need to know is taught entertainingly well in Danica McKellar’s book Math Doesn’t Suck (2008). Highly recommended for future reading. For now, there are some basic free primers online that will jog your memory and provide some tips on how to calculate and multiply probabilities, such as Decimals, Fractions, and Percentages at Math Is Fun, and Introduction to Probability at Drexel Math Forum.
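
If you just want to check that you remember the basics those primers cover, here is a quick Python sketch (with illustrative numbers of my own) of converting between fractions, decimals, and percentages, and of multiplying the probabilities of independent events.

```python
from fractions import Fraction

# Converting between fractions, decimals, and percentages
p = Fraction(3, 10)            # 3 out of 10
print(float(p))                # 0.3  (decimal form)
print(f"{float(p):.0%}")       # 30%  (percentage form)

# Multiplying probabilities of independent events:
# the chance of two independent 30%-likely events both happening
both = p * p
print(both, float(both))       # 9/100 0.09
```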

Anything more advanced than what’s covered there isn’t necessary, though it can often still be put to good use. So mathematicians have a lot to offer the humanities, if they’d take a more collaborative interest in it; which, I admit, would require people in the humanities to want to talk to them, and Fear of Math often prevents this. Still, not everyone needs to be a mathematician. Just a competent high school graduate.

Probability

Probability is understood in a number of different ways.

Probability as Logical Possibility: For a quick basic-level tutorial in how probability derives from solutions in combinatorics, explore the web tree at CoolMath. You don’t need to master the subject. But you should have some impression of what it is and what it involves. You should note that this definition of probability actually reduces to the next one, “frequency,” and thus isn’t actually a different definition; it just differs by working with “ideal” frequencies rather than frequencies measured in the real world. If you want to explore further how those two things differ and connect, see my discussion in Proving History, pp. 257-65.
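
For a concrete picture of what deriving probability from combinatorics looks like, here is a minimal Python sketch of the standard two-dice example (my own illustration, not taken from CoolMath): the ideal frequency of an outcome is just the ratio of the ways it can happen to all the equally possible ways anything can happen.

```python
from itertools import product
from fractions import Fraction

# Probability as an ideal frequency derived from counting possibilities:
# enumerate every equally likely outcome of rolling two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))          # 36 combinations
sevens = [roll for roll in outcomes if sum(roll) == 7]   # 6 of them sum to 7

p_seven = Fraction(len(sevens), len(outcomes))
print(p_seven, float(p_seven))   # 1/6, roughly 0.167
```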

Ultimately, most probability reasoning in practice is based on guessing at ideal frequencies, using hypothetical models rather than real world surveys. But hypothetical models can be more or less accurate (and thus we can strive to make them as accurate as we need or can manage), and ideal frequencies can be close enough to real frequencies for most practical reasoning in daily life. Particularly if you use a fortiori frequencies: guesses as to actual frequencies which are so far against the evidence you do have that you can be sure that whatever the real frequencies are, they will be larger (or, depending on what direction you are testing, smaller) than that.
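
Here is a small Python sketch, with hypothetical numbers of my own, of how an a fortiori frequency works: argue from a rate you are sure is lower than the truth, and if the conclusion still follows, then whatever the real frequency is, the real probability can only be higher.

```python
# A fortiori reasoning with hypothetical numbers: suppose the evidence supports
# an event rate of at least 1 in 20, but we argue from a deliberately low 1 in 100.
trials = 200
for rate in (1 / 20, 1 / 100):
    p_at_least_once = 1 - (1 - rate) ** trials   # chance of at least one occurrence
    print(f"rate {rate:.2f}: {p_at_least_once:.3f}")
# rate 0.05: 1.000  (on the evidence-supported rate)
# rate 0.01: 0.866  (even on the far-too-low estimate the conclusion still holds)
```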

Probability as Frequency: It is my considered conclusion that all definitions and theories of probability reduce to this. Not everyone agrees. But I am fairly certain that all a probability is, is an estimate of the frequency of something (“How often does that happen?” or, even more precisely, “How often does or will x happen, out of y number of tries/spatial-zones/time-periods/etc.?”). The only question that really differs from case to case and application to application is how you answer one crucial question, “Frequency of what?” What are you measuring, when you state a probability? Often this is not adequately answered. It’s wise to answer it.

Probability as Propensity: This is actually just another way of talking about frequency. This time it is a hypothetical frequency rather than an actual one (as I discuss, again, in Proving History, pp. 257-65). Rolling a single die once in the whole of history has a probability of turning up a “1” equal to the frequency with which that die, in the same intended circumstances, would roll a 1 across a hypothetical set of infinite rolls. It’s still a frequency, but we have to use a hypothetical model rather than a real-world measurement.

Most science works this way (we apply hypothetical models to predict the behavior of the real world, rather than, for example, constantly re-testing whether gravity works the same everywhere and changing the gravitational constant every single time because a different measurement is made every single time). And we do this in most probability estimating in real life, too. And this can indeed differ from the “logical possibility” application I just noted above: instead of using an “ideal frequency”—which assumes, for example, a six-sided die always rolls each number exactly equally often—we could, if we wanted, use a measured frequency. For example, maybe the die we are talking about rolls a 1 slightly more often, and we know this because we tested it with a bunch of earlier rolls. From that data we built a hypothetical model of infinite future rolls of that specific die, rolls that haven’t yet been made and may never be, but we can now more reliably predict what they would be from our more accurate model of its bias.
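
A minimal Python sketch of that idea, using a made-up bias: measure the frequency of a 1 from a batch of “earlier rolls,” then use that measured frequency, rather than the ideal 1/6, as your model for future rolls of that specific die.

```python
import random
from collections import Counter

random.seed(1)

# A die with a slight (made-up) bias toward rolling 1
faces = [1, 2, 3, 4, 5, 6]
weights = [0.20, 0.16, 0.16, 0.16, 0.16, 0.16]

# "Earlier rolls": measure the frequency of each face empirically...
measured = Counter(random.choices(faces, weights=weights, k=10_000))
p1_measured = measured[1] / 10_000

# ...then use that measured frequency as the model for future rolls,
# instead of the ideal 1/6 a fair die would give.
print(round(p1_measured, 3), round(1 / 6, 3))   # roughly 0.2 vs 0.167
```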

Probability as Degree of Belief: This is also, I conclude, just another frequency measurement, and thus reduces again to the same one definition of probability as a measure of frequency. Only now, we are talking about the frequency of being right, given a certain amount of information. For example, if you predict a 30% chance of rain, and, given the information you have, it rains on 30% of the days you make that same prediction (actually, or in a hypothetical extension of the same conditions), then the frequency with which you are “right” to say “it will rain” is 30% (or 3 times out of 10), and you are simply stating that plainly (and thus admitting to the uncertainty). So it is again just a frequency.
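
Here is a quick Python simulation of that calibration idea (a hypothetical setup of my own): if the conditions under which you announce “30% chance of rain” really do produce rain 30% of the time, then “it will rain” turns out to be right on about 30% of those days.

```python
import random

random.seed(2)

# On every day you announce "30% chance of rain", suppose it actually rains
# with a 30% frequency under those conditions.
days_with_that_forecast = 10_000
rained = sum(random.random() < 0.30 for _ in range(days_with_that_forecast))

# The fraction of those days on which "it will rain" turned out to be right
print(round(rained / days_with_that_forecast, 3))   # close to 0.30
```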

Ultimately this is an attempt to estimate some actual frequency (in that example case, the frequency with which it rains, given certain available data). In other words, our “degree of belief” is sound when the frequency of an event (the frequency we are claiming) is the same as our degree of belief. Consequently, when we have good data on either frequency, we can simply substitute it for the other. The two are interchangeable in that respect (the two being “the frequency of x” and “the frequency of our being right about x”). For more on demonstrating these points, see my discussion in Proving History, pp. 265-80.

But of course, often we don’t have good enough data, so we can only state the known frequency, which will be in error when measured against reality, and then correct it as we gather more data proving otherwise. This is called “epistemic probability,” the probability of a belief being true, which can be simply restated as “the frequency of such a belief being true.” Which frequency (which probability), like the rain prediction, approaches the true probability as our access to relevant information increases.
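
That convergence is easy to see in a simulation. A minimal Python sketch, with a made-up “true” rain frequency of 30%: the estimated frequency wobbles when you have little data and settles toward the true value as your information grows.

```python
import random

random.seed(3)

# Estimate an unknown rain frequency (here secretly 0.30) from more and more days
true_rate = 0.30
observations = [random.random() < true_rate for _ in range(100_000)]

for n in (10, 100, 1_000, 10_000, 100_000):
    estimate = sum(observations[:n]) / n
    print(n, round(estimate, 3))
# The estimates wobble at small n and settle near 0.30 as n grows.
```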

There are also two different kinds of frequencies measured this way: the frequency with which we would be right to say “x will occur” (“There is a 30% chance it will rain today” means “There is a 30% chance I would be right to say it will rain today”), and the frequency with which we are right to say that (“There is a 95% chance that there is a 30% chance it will rain today,” which would mean “There is a 95% chance I would be right to say there is a 30% chance I would be right to say it will rain today”). And usually that requires stating the latter as a range, e.g., not “30%” but something like “25-35%.” In technical terms these are called the confidence level (CL) and the confidence interval (CI). The CL is the probability that the CI includes the true frequency of something. And here again, “the probability that” means “the frequency with which.” In both cases.

In various precise ways, without adding more data, increasing the CL increases (widens) the CI, while narrowing the CI entails decreasing the CL. So, for example, if there is a 90% chance that the true frequency of something (e.g., it raining in certain known conditions) is somewhere between 20% and 40%, you will not know whether the true frequency (the probability) is 20% or 40%, or 30% or 35% etc., only that, whatever it is, it is somewhere between 20% and 40%. And even then, only 90% of the time. If you want to be more certain, say 99% certain and not just 90% certain (since 90% means 1 in 10 times you’ll be wrong, and that’s not very reliable for many purposes; whereas 99% means you’ll be wrong only 1 in 100 times, which is ten times better), then the CI will necessarily become even wider (maybe, for instance, 5% to 60%, instead of 20% to 40%), unless you gather a lot more information, allowing you to make a much better estimate. The latter is the occupation of statistical science. Which can only be applied when lots of data are reliably available; which is not commonly the case in history or even ordinary life, for example. So we must often make do with wide margins of error. That’s just the way it is.
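
To make the CL/CI trade-off concrete, here is a Python sketch using the standard normal approximation for a proportion, on hypothetical data (say it rained on 30 of the last 100 relevantly similar days): raising the confidence level widens the interval.

```python
from math import sqrt

# Normal-approximation confidence interval for a proportion (hypothetical data:
# it rained on 30 of the last 100 relevantly similar days).
successes, n = 30, 100
p_hat = successes / n
se = sqrt(p_hat * (1 - p_hat) / n)   # standard error of the estimated proportion

for label, z in (("90% CL", 1.645), ("99% CL", 2.576)):
    low, high = p_hat - z * se, p_hat + z * se
    print(label, f"{low:.0%} to {high:.0%}")
# 90% CL: roughly 22% to 38%
# 99% CL: roughly 18% to 42% (more confidence, wider interval)
```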

Moreover, we usually want confidence intervals we can state with such a high confidence level (like, say, 99.99%) that we don’t have to worry about the CL at all. It’s just that that often requires admitting a lot of uncertainty (a wide margin of error—like, say, admitting that there is a 20% to 40% chance of it raining, and not actually with certainty exactly a 30% chance). Whereas accepting smaller CL’s to get narrower CI’s just trades uncertainty from one to the other. Like, say, instead of requiring a CL of 99.99%, we could accept one of just 99%, but all that means is accepting the fact that, whatever the CI is you are estimating and thus getting to narrow down (like, say, now we get a range of 29.9% to 30.1%, which rounds off to just 30%, allowing you to just say “30%”), you will be wrong about that 1 out of every 100 times (as opposed to being wrong only 1 out of every 10,000 times in the case of a 99.99% CL). But being wrong 1 out of every 100 times would mean being wrong more than 3 times a year if you are making daily predictions of something, like whether it will rain the next day. So moving those numbers around doesn’t really help much. Your uncertainty, whatever it is, never really goes away.

Restating probabilities as degrees of belief doesn’t escape any of these consequences, which is why I bring them up. If any of this is confusing (and I fully confess it may be to most beginners), it is the sort of thing I am available to discuss extensively and answer all questions about in my online course on Critical Thinking. Indeed, this article will be required reading for that course, so you are already ahead of the curve. Of course in that course I spend equal time on two other necessary domains: the cognitive science of human error, and traditional logic and fallacy-detection. But knowing some of the basics of how probability works is crucial to all other critical thinking skills—not least because most claims in pseudoscience, politics, society, and everywhere else are made using “statistical” assertions, and you need to know how to question those, or use them correctly (I give many examples and lessons on that point in particular in Dumb Vegan Propaganda: A Lesson in Critical Thinking and Return of the Sex Police: A Renewed Abuse of Science to Outlaw Porn, and in other articles further referenced therein).

Bayesian Probability: You might often hear about how “Bayesian” probability is “different” from and somehow “opposed” to “Frequentist” probability. This is all misleadingly framed. What is really meant by “Bayesian” probability in this comparison is the “Degrees of Belief” definition I just discussed, which I just showed is again just another frequency of something. And in actual practice, most Bayesian reasoning doesn’t even use that subjective model of probability but uses straightforward measured frequencies.

Indeed, in actual practice, even the so-called “subjective” model is just an estimation of the objective model in the absence of concrete access to it—so these are not even fundamentally different. Subjective modeling is what all humans do almost all the time, because they have to. We almost never have access to the scale and quality of data backing so-called objective models. So we have to guess at them as best we can. We have to do this. Most human beliefs and decisions require it—in daily life, in history, in romance, in politics and economics, in every aspect of our existence, precisely for want of better information, which we almost never have. So there is no point in complaining about this. We are in fact solving the problem with it, thereby preventing ignorance and paralysis, by getting a better grip on how to deal with all this unavoidable uncertainty.

But I cover the subject of Bayesian reasoning well enough already in other articles (for example, two good places to start on that are What Is Bayes’ Theorem & How Do You Use It? and Hypothesis: Only Those Who Don’t Really Understand Bayesianism Are Against It). Here I’ll move on to more general things about probability.

Error

Science has found a large number of ways people fail at probability reasoning, and what it takes to make them better at it. It’s always helpful to know how you might commonly err in making probability judgments, and how others might be erring, too, in their attempts to convince you of some fact or other (even experts). See, for example, my article Critical Thinking as a Function of Math Literacy. A lot of cognitive biases innate to all human beings are forms of probability error (from “frequency illusion,” “regression bias,” “optimism bias,” and the “gambler’s fallacy” and “Berkson’s paradox,” to conjunction fallacies, including the subadditivity effect, and various ambiguity effects and outright probability neglect).

The key is realizing how often you are actually making and relying on probability judgments. We are often not aware we are doing that because we don’t usually use words like “probability” or “odds” or “frequency” or such terms. We instead talk about something being “usual” or “weird” or “exceptional” or “strange” or “commonplace” or “normal” or “expected” or “unexpected” or “plausible” or “implausible” and so on. These are all statements of probability. Once you realize that, you can start to question what the underlying probability assumptions within them are, and whether they are sound. In other words, you can stop hiding these assumptions, and instead examine them, criticize them. Hence, think critically about them.

For example, what does “normal” actually mean? Think about it. What do you mean when you use the word? How frequent must something be (hence what must its probability be) to count as “normal” in your use of the term? And does the answer vary by subject? For example, do you mean something different by “normal” in different contexts? And do other people who use the word “normal” mean something different than you do? Might that cause confusion? Almost certainly, given that we aren’t programmed at the factory, so each of us won’t be calibrating a word like “normal” to exactly the same frequency—some people would count as “normal” a thing that occurs 9 out of 10 times, while others would require it to be more frequent than that to count as “normal.” You yourself might count as “normal” a thing that occurs 9 out of 10 times in one context, but require it to be more frequent than that to count as “normal” in another context. And you might hedge from time to time on how low the frequency can be and still count as “normal.” Is 8 out of 10 times enough? What about 6 out of 10? And yet there is an enormous difference between 6 out of 10 and 9 out of 10, or even 99 out of 100 for that matter—yet you or others might at one time or another use the word “normal” for all of those frequencies. That can lead to all manner of logical and communication errors. Especially if you start to assume something that happens 6 out of 10 times is happening 99 out of 100 times because both frequencies are referred to as “normal” (or “usual” or “expected” or “typical” or “common” etc.).

Many social prejudices, for example, derive from something like that latter error, e.g. taking something, x, that’s true only slightly more often for one group than another, and then reasoning as though x is true of almost all members of the first group and almost no members of the second group. For example, it is “normal” that women are shorter than men, but in fact that isn’t very useful in predicting whether the next woman you meet will be shorter than the next man you meet, because regardless of the averages quite a lot of women are taller than men. And often prejudices are based on variances even smaller than that.

An example of what I mean appears in the infamous Bell Curve study, which really only found a 5-point difference in IQ scores between black and white populations (when controlling for factors like wealth etc.). This was then touted as proving black people are dumber than white people—when in fact the margin of error alone for most IQ tests is greater than 5 points. So in fact, even were that difference real (and there are reasons to suspect it actually wasn’t, but that’s a different point), it is so small as to be wholly irrelevant in practice. To say someone with an IQ of 135 is “dumber” than someone with an IQ of 140 is vacuous. It means nothing, not only because both are exceptionally intelligent, but more importantly because there would be effectively no observable difference in their intelligence in daily life, nor any meaningful difference in career success, task success, learning ability, or anything else that matters. Thus to say “white people normally have higher IQs than black people” based on white people having on average 5 more IQ points (merely purported to be attributable to genetic differences) will too easily mislead people into thinking a significant difference in IQs is being stated, when in fact the difference is insignificant, smaller even than the margin of error in IQ tests. In fact, this is so small a variance that you can score 5 points “dumber” than even yourself just by taking two IQ tests on different days of the week!
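
To put a number on how little such a gap predicts about individuals, here is a minimal Python sketch assuming only a 5-point gap in group averages and the conventional IQ standard deviation of 15 within each group (my illustrative assumption, not a figure from the study): the chance that a randomly chosen member of the higher-averaging group outscores a randomly chosen member of the other is barely better than a coin flip.

```python
from math import erf, sqrt

# How little a 5-point gap in average scores predicts about individuals,
# assuming a within-group standard deviation of 15 (the conventional IQ scale).
mean_gap, sd = 5, 15

# Pick one person at random from each group; the difference of their scores is
# approximately normal with mean 5 and standard deviation sqrt(2) * 15.
z = mean_gap / (sd * sqrt(2))
p_higher = 0.5 * (1 + erf(z / sqrt(2)))   # standard normal CDF at z
print(round(p_higher, 3))                 # ~0.593, barely better than a coin flip
```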

Thus, we need to examine the probability assumptions behind such common judgment words as “normal” or “usual” or “rare” or “expected” or “bizarre” or “implausible” and so on. Because all those words (and more) conceal probability judgments. And those judgments require critical examination to be trusted, or even useful.

Logic

As a philosopher, I find one of the greatest benefits I have ever gotten from standard syllogistic reasoning is learning how hard it is to model a reasoning process with it. The mere attempt to do it reveals all manner of complex assumptions in your reasoning that you weren’t aware of. Reasoning out why you believe something often seems simple. Until you actually try to map it out. And you can learn a lot from that.

This is where trying to model your reasoning with deductive syllogisms can be educational in more ways than one. Most especially by revealing why this almost never can be done. Because most arguments are not deductive but inductive, and as such are arguments about probability, which requires a probability calculus. Standard deductive syllogisms represent reality as binary, True or False; they can function only with two probabilities, 0 and 1. But almost nothing is ever known to those probabilities, or ever could be. And you can’t just “fix” this in some seemingly obvious way, like assigning a probability to every premise and multiplying them together to get the probability of the conclusion, as that creates the paradox of “dwindling probabilities”: the more premises you add (which means, the more evidence you add), the less probable the conclusion becomes. And our intuition that there must be something invalid about that can itself be proved valid.

When we try to “fix” this by generating a logically valid way to deal with premises only known to a probability of being true, to get the actual probability of a conclusion then being true, what we end up with is Bayes’ Theorem—because that’s exactly what Thomas Bayes found out when he tried to do this. And so far as I can tell, no one has since found any other route to fix deductive reasoning into a working inductive method. As I explain in Proving History, every other attempt is either logically invalid, or simply reduces to Bayes’ Theorem—which is essentially a “syllogism” for deriving a final probability from three input probabilities (by analogy the “premises” or premise-sets). But I won’t go into that here; see links above. Regardless of what you think the way is to “fix” deductive logic to work with almost the entirety of human reality, which is inductive, you still must come up with a way to do it; and that means one that is logically valid and sound. Otherwise, 99.9999% of the time, deductive logic is useless.
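
The contrast is easy to see in a small Python sketch with hypothetical numbers: multiplying premise probabilities makes the conclusion less probable the more premises you add, whereas a Bayesian update raises the probability with each piece of evidence that is more expected if the hypothesis is true than if it is false.

```python
# The "dwindling probabilities" problem vs. Bayes' Theorem (hypothetical numbers).

# Naive approach: treat each added piece of evidence as another premise and
# multiply its probability into the conclusion, so adding evidence only lowers it.
premise_prob = 0.9
for n in (1, 3, 5, 10):
    print(n, round(premise_prob ** n, 3))   # 0.9, 0.729, 0.59, 0.349

# Bayes' Theorem instead: each piece of evidence that is more expected if the
# hypothesis is true than if it is false raises the posterior probability.
prior = 0.5
p_e_if_true, p_e_if_false = 0.9, 0.5
posterior = prior
for n in range(1, 6):
    posterior = (posterior * p_e_if_true) / (
        posterior * p_e_if_true + (1 - posterior) * p_e_if_false
    )
    print(n, round(posterior, 3))           # climbs toward 1 as evidence accumulates
```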

So to be a good critical thinker you need to recognize how obsolete the deductive syllogism actually is. This tool of analytical reasoning was invented in Greece over two thousand years ago as the first attempt to build a software patch for the brain’s inborn failure to reason well. It is a way to use some of the abilities of your higher cognitive functions to run a “test” on the conclusions of its lower cognitive functions (a distinction known as Type 1 and Type 2 reasoning). But it has a lot of limitations and is actually quite primitive, compared to what we can do today.

It is notable, therefore, that this primitive tool is so popular among Christian apologists. They rarely-to-never employ the more advanced tools we now have (and even when they do, they botch them). Take note. Because relying on standard deductive syllogistic reasoning corners you into a host of problems as a critical thinker. Hence over-reliance on this tool actually makes it harder, not easier, to be a good critical thinker. Nevertheless, standard syllogistic reasoning has its uses—often as a way to sketch out how to move on to a more developed tool when you need it; but also as a way to map out how your (or someone else’s) brain is actually arriving at a conclusion (or isn’t), so as to detect the ways that that reasoning process could be going wrong. And also, of course, like all the reasoning skills we should learn and make use of, when used correctly and skillfully, standard syllogistic reasoning can be a handy tool for analyzing information, because it can bypass interfering assumptions and faulty intuitions, as well as (when used properly) expose them.

Conclusion

I am re-launching my one-month online course on Critical Thinking this September 1. Follow that link for description and registration. I am also offering seven other courses next month, each of which involves critical thinking skills in some particular way as well. But what makes my course on Critical Thinking distinctive is that though I do cover standard syllogistic logic (and formal and informal fallacies and the like), I focus two thirds of the course on two other things absolutely vital to good critical thinking in the 21st century: cognitive biases (which affect us all, so we all need skills to evade their control over us); and Bayesian reasoning—which is just a fancy way of saying “probabilistic reasoning.” Which I’ve discovered is so crucial to understand properly that I now think no critical thinking toolkit can function without it.

I’ve here given a lot of starter-advice on that. But it all extends from the three central guidelines you should learn to master and always follow: (1) of any claim or belief, learn how to seriously ask and seriously answer the question, “How would I know if I was wrong?”; (2) test that question by attempting to prove yourself wrong (and that means in a way that would actually work), not by attempting to prove yourself right; and (3) stop tackling straw men and weak opposition, and seriously engage with strong, competent critics of your claim or belief, with an aim not to “rebut” them but to understand why they believe what they do. These three principles, carried out together, become a powerful tool for drawing your belief system closer to reality and reliability. But all three require accepting and understanding how probability works in all your judgments. And in anyone else’s.
