Stylometrics (or “stylometry”) is the study of authorial style. To determine common authorship (or conversely, forgery) it looks at obvious things like preferred idioms, grammar, vocabulary, and valencing (how words are used, what they mean), and things harder or even impossible for an author to keep track of (like average sentence length, or the frequency of using certain words or combinations of words, especially words so routine an author rarely pays attention to them, like pronouns, articles, and particles). This can be done quantitatively or qualitatively, in other words, statistically or nonstatistically. And of course averages and frequencies are usually sussed statistically (but not always) while preference trends are usually sussed nonstatistically (but not always). And as with every method (from standard logic to Bayesian), it can be grifted. Countless bogus stylometric studies exist, and usually the more complex and obscure the method, the more likely it’s a con. So you have to be extremely wary of these things. It helps to be able to tell when something is being gamed, and when it’s legit.
The most common method is nonstat. For examples of this, see Bart Ehrman’s Forgery and Counter-Forgery and my discussion of the nonmathematical stylometrics of the long ending of Mark in Hitler Homer Bible Christ (pp. 249–59); likewise the whole literature on the nonmathematical stylometry of the Testimonium Flavianum in Josephus (see Josephus on Jesus? Why You Can’t Cite Opinions Before 2014). A new academic attempt to “rehabilitate” the Testimonium Flavianum on that score has just recently come out that I will critique separately. It doesn’t use math. But today I want to cover some examples of the mathematical kind so you will have some idea of what to look for in evaluating them. I will cover one bad example, one good example, and one example that’s absolute garbage (spoiler: I’m talking about Christ Before Jesus).
I will hereafter call nonmathematical stylometry “standard” and the other kind “statistical.” And I will run this survey in two parts: today I’ll discuss the most recent statistical stylometrics on Paul’s Epistles; and in Part 2 I will discuss some recent weird attempts to do this for the works of Galen and Pliny the Younger’s letter on the Christians, which make for good cautionary tales.
Example: The Shakespeare Case
Bayesian stylometry has been used to try and solve the “dispute” over who wrote the works of Shakespeare. But I have yet to see an honest example. Usually a study is rigged to avoid controls or even an ability to get an undesired result. If you catch someone saying “Shakespeare used this word one time; Edward used it all the time; therefore Edward wrote Shakespeare,” they are conning you. That observation actually argues against Edward being the author—and it just takes a minute to figure that out. So you have to attune yourself to sussing out what someone is actually claiming (buried under the technical jargon and convoluted descriptions or weird language), and then apply a standard logical vetting to it (just like anything else).
That includes remembering to look for things a study is leaving out. Because the most effective way to lie is to force the reader into “your” rigged frame so they don’t stop to ask if they are being framed. For example, if a study purports to get a final probability of Shakespeare’s authorship, yet claims there is no contemporary evidence of Shakespeare’s authorship to consider (and therefore they never include any in their determination of that probability), they are lying. And their results are thus useless. But if they claim only to be determining a single data point (the probability that Shakespeare wrote his opus only given stylometry), and admit additional data could change the overall probability that Shakespeare wrote it, you might be looking at more honest operators (that depends on the merits of their remaining methods).
Bayes’ Theorem is just like Aristotelian logic: you can use it to invent any conclusion you want, if you hide the fact that your premises are bogus and your datasets rigged, incomplete, or incorrectly framed. Think, Kalam Cosmological Argument. 100% valid logic. 100% bullshit. By sneaking in bogus premises and hidden inferences, a sound method is used to create the fake appearance of a real result. No one thinks “logic” is therefore useless or a sham and should be abandoned. To the contrary, it is to be used more. But we have to keep our eye out for when it is being used disingenuously to create the fake appearance of a legit result. Likewise anything in probabilistic logic (See Crank Bayesians: Swinburne & Unwin and Crank Bayesianism: William Lane Craig Edition, or Daniel Bonevac’s Bayesian Argument for Miracles, and contrast those with, e.g., A Test of Bayesian History: Efraim Wallach on Old Testament Studies or A Bayesian Analysis of Susannah Rees’s Ishtar-in-the-Manosphere Thesis.)
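The point about leaving evidence out can be made concrete with a toy calculation (all numbers here are invented for illustration; this is not any actual study's math):

```python
# Toy illustration (all numbers invented): Bayes' Theorem honestly applied
# vs. the same theorem with an inconvenient evidence term simply left out.

def posterior(prior, likelihood_h, likelihood_not_h):
    """P(H|E) from a prior and the likelihoods of the evidence E on H and not-H."""
    joint_h = prior * likelihood_h
    joint_not_h = (1 - prior) * likelihood_not_h
    return joint_h / (joint_h + joint_not_h)

# Suppose stylometry alone is mildly against Shakespeare's authorship...
p = posterior(prior=0.5, likelihood_h=0.4, likelihood_not_h=0.6)
print(round(p, 2))  # 0.4

# ...but the documentary evidence (title pages, contemporary attributions)
# is strongly for it. Updating on that as well reverses the verdict:
p = posterior(prior=p, likelihood_h=0.9, likelihood_not_h=0.1)
print(round(p, 2))  # 0.86

# Report only the first number and you have "used Bayes' Theorem" to
# manufacture a conclusion the total evidence does not support.
```

The theorem is valid in both runs; the con is entirely in which evidence terms you allow into the calculation.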
Example: Paul’s Epistles
Ehrman’s Forgery and Counter-Forgery is the best survey of this question to date. Generally six of Paul’s Epistles have been agreed, on standard stylometry, to be authentic. A seventh, Philemon, has always been recognized to be too short and off-topic for that conclusion to be secure, though it tended to be thrown in as “also authentic” for want of a contrary case, but a recent study challenges that conclusion (see Did Paul Write Philemon?). But the others are obvious forgeries (for context see How Do We Know the Apostle Paul Wrote His Epistles in the 50s A.D.?). The Deutero-Paulines (Colossians and Ephesians) probably share common authorship with each other but deviate from Paul with a more flowery, almost ridiculously simplistic style, and use concepts and words too differently for them to be anything he would even endorse, even though they are in his school of thought. They could date to the late first century, as they seem unaware of the Gospels yet are unlikely to have gained purchase while Paul was alive. The Pastorals (1 & 2 Timothy and Titus) also probably share common authorship with each other, and deviate from all the above in a more terse and technical style, again using concepts and words too differently. They “sound like” early second century Christian discourse—in fact their style resembles that of Luke-Acts enough to lead some scholars to suspect they may share the same or a similar author. 2 Thessalonians was deliberately written to sound like Paul, but its author still made enough mistakes to give it away. Hebrews does not claim to be by Paul, but its style also is not Pauline, though it resembles Paul more than the others do. It is likely by a student of his (either literally or conceptually), since it comes from his sect and likely dates to within years of his death (just like 1 Clement: see How We Can Know 1 Clement Was Actually Written in the 60s AD).
I should caution against a recent apologetic that these are all “authentic” because Paul let someone else write them (for the most competent attempt at this see Paul and First-Century Letter Writing: Secretaries, Composition and Collection by E. Randolph Richards), based on the idea that Paul names co-authors in some of them. But he doesn’t really. This idea misunderstands what a greeting is, and what function it serves in these administrative homilies. When Paul names someone else in them, he never says they are co-authors. They are just with him and asking to send their regards in his letter. When Paul writes “Paul … and our brother Sosthenes” send “grace and peace” to “the church of God in Corinth” (1 Cor. 1:1–3) or “Paul … and Timothy our brother” send “grace and peace” to “the church of God in Corinth” (and “all his holy people throughout Achaia,” 2 Corinthians 1:1–2) or “Paul, Silas and Timothy” send “grace and peace” to “the church of the Thessalonians” (1 Thess. 1:1) or “Paul and Timothy … to all God’s holy people … at Philippi” (“together with the overseers and deacons,” Philippians 1:1), all he is indicating is that those named people want to be remembered to the recipients.
Hence when Paul wrote to the Galatians he says “Paul … and all the brothers with me” send “grace and peace” to “the churches in Galatia” (Gal. 1:1–3). The exact same formula. And yet, obviously “all the brothers” with him did not “co-write” that letter, or even scribe it. They were just with Paul when he wrote it and asked to be included in his greeting (just as we see for Timothy and several others in Romans 16:21–22). So all those other named persons may have had no more involvement in the matter than that. And even if they did, it will have been in a different capacity: either (1) as a scribe (like Tertius in Romans 16:21–22), who would only be taking dictation; they would not put words in Paul’s mouth, and certainly wouldn’t construct elaborate rhetoric on his behalf (nor would Paul trust them to); or (2) as the messenger delivering the letter (as can be inferred for Silas or Timothy in the case of 1 Thess. 3). In other words, the second or third named person may be the person actually delivering the letter, and thus the greeting serves as a kind of authentication or introduction (proving the letter is really from Paul or that Paul endorses its messenger as legit). Or, again, they may simply be the one who penned what Paul said (as we can see was happening in Gal. 6:11), not speaking for him. Calligraphy was laborious, hard on the hands, and its own skill, so authors rarely scribed their own letters.
Of course, regardless of all that, the presence of named companions on four of the six authentic letters did not affect their stylistic consistency. So this theory is already a non-starter. But on top of that, the suspect letters’ deviations are not just in style, but theology and beliefs, which could not be explained by co-authors at all. Paul would never have endorsed the Pastorals, for example, as they argue contrary to his own closely-held positions (e.g. their misogyny is rabidly anti-Pauline; Paul himself was more egalitarian and took an entirely contrary stance). And he would not likely put his name on the Deuteropaulines because their rhetoric and language is too “first-year” for him to hang his reputation on.
The Savoy Study
There are two recent legit stylometric studies of the Pauline Epistles. The worse of those is by Jacques Savoy, “Authorship of Pauline Epistles Revisited,” published in the Journal of the Association for Information Science and Technology 70 (2019). He used a mix of computerized methods (in a way that looks more like fishing for a result), which produced weaker and more inconsistent results than his wording suggests. He seems intent on proving a ten-letter hypothesis, and skews his methods and results as much as possible toward that result, which renders the whole study less objective.
Even his carefully worded conclusion captures this if you pay attention:
Even if this study is unable to reveal the true author of all epistles, we were able to clearly identify three groups. The first homogeneous stylistic cluster regroups Romans, 1 and 2 Corinthians, and Galatians. … In the second group, one can find two letters (Colossians and Ephesians) probably authored by the same person. The third cluster corresponds to 1 and 2 Thessalonians, with an indication that both letters might have been written by the same author. These three clusters could have been written by a unique author, or by two or three distinct persons. …
However, the verification results indicate a clear link between 2 Corinthians and 1 Thessalonians but the latter does not have a clear stylistic relationship with Romans, 1 Corinthians, and Galatians. Similarly, the Letter to the Philippians has some relationship with the Galatians, but not with the other three. …
The Pastorals were nowhere near in his results (“it seems that the pastoral letters have been authored by a distinct person”), so he didn’t even address them further. And yet even his statements on the rest are over-confident, because when you look at his actual numbers and graphs (e.g. pp. 8, 9), his method was finding weird results, like a close match for 1 Peter as Pauline, which of course is absurd (not only because 1 Peter does not even purport to be by Paul, but also because its style is, even at sight, obviously wildly different from anything in Paul). This suggests Savoy is not properly testing his results against controls, and thus is over-confident in his method’s ability to even do what he wants.
Hence his outcome diagram (on p. 9) is a bit of a mess:
[Figure: Savoy’s outcome diagram, p. 9.]
The principal clue that something is wrong here is when you look at how Savoy is translating mathematical results into an English description of what’s going on. He is making bold assertions about results that are actually such close calls that they might be inside any realistic margin of error. Which means his results are much weaker than he frames them.
For example, look at his table on p. 7:
[Table: Savoy’s distance table, p. 7.]
Here, by aggregating counts of the frequency of tokens (like articles and conjunctions), he finds Romans and 2 Corinthians have a statistical “distance” of 0.2931, while for Philippians and Colossians it’s 0.2999, a difference of well under a hundredth, while 1 and 2 Timothy (which certainly share an author) rate a “distance” to each other of 0.2890, so same-author pairs and different-author pairs land within mere thousandths of each other. This indicates his method cannot actually do what he wants. And if you read the study carefully, you will catch an admission of all this (“the results of the previous experiments do not present clear attributions,” p. 9, a stealthy way of saying the entire first half of his paper is useless and its results should be ignored).
So Savoy changes tack (and lurches for some other method). And here the problem may be that Savoy is over-excited by the character-count n-gram method, whereby one simply counts the frequency of fixed sequences of letters wholly apart from any syntax, which has not been properly tested on ancient Greek and is unlikely to work very well on it, because unlike the modern languages this method has been well tested on (see Keselj et al.), ancient Greek lacks consistent word order, and is heavily inflected (verbs and even nouns and adjectives change form radically based on gender and function). Katarina Laken, whose study we will examine next, adds as well that:
In Greek, dependencies can stand relatively far away from their heads compared to English dependencies, because the inflection often clarifies the relation between the two. Therefore, units that belong together will often not be captured in the token n-grams.
And certainly not in character n-grams. In other words, since character-counting will miss almost all the actual structure in ancient Greek due to all these factors that distinguish it from modern languages, Savoy’s approach is not likely to work. (Put a pin in that. Because this will come up in my next article.) I am also worried that Savoy implies that he ran his test on (an unstated) English translation, not the Greek (p. 10), which at worst tanks his entire result but at best signals he is not doing a good job of even describing his method so we can vet it. But assuming Savoy didn’t make that mistake, the “fuzziness” of his results may be due to the inadequacy of this method for the purpose.
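For readers unfamiliar with the technique, here is a sketch of character n-gram profiling in the style of Keselj et al. (the texts are crude invented transliterations, and this is only an illustration of the general method, not Savoy’s actual pipeline):

```python
from collections import Counter

def ngram_profile(text, n=3):
    """Relative frequency of every letter n-gram, syntax entirely ignored."""
    text = text.replace(" ", "_")  # keep word boundaries as a single symbol
    grams = Counter(text[i:i+n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()}

def dissimilarity(p, q):
    """Keselj-style relative distance, summed over the union of grams."""
    return sum(
        ((p.get(g, 0) - q.get(g, 0)) / ((p.get(g, 0) + q.get(g, 0)) / 2)) ** 2
        for g in set(p) | set(q)
    )

a = ngram_profile("en archei en o logos")
b = ngram_profile("o logos en en archei")  # the same words, merely reordered

# Identical words in a different order already yield a nonzero "distance,"
# because n-grams spanning word boundaries change with word order:
print(dissimilarity(a, a) == 0, dissimilarity(a, b) > 0)
```

That last line is the problem in miniature: a language with free word order (like ancient Greek) generates “stylistic differences” on this metric that are not stylistic at all.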
And yet even with all those caveats, his character n-gram application still does “see” that “Colossians and Ephesians” are their own thing, while the “Pastorals” are even more so, and that “Romans, 1 and 2 Corinthians, and Galatians” are their own thing. Questions only persist regarding Philippians, and 1 and 2 Thessalonians. Per his concluding diagram (on p. 10):
[Figure: Savoy’s concluding diagram, p. 10.]
Savoy’s method does show an (albeit weaker) connection of both Philippians and 1 Thessalonians with the Romans cluster, and that 2 Thess. is more distant than 1 Thess. Which supports the mainstream consensus. The problem here is that 2 Thessalonians is a deliberate emulation of 1 Thessalonians (as discussed by Ehrman), and Savoy’s method might not be able to tell when one letter is directly based on another and attempting to look like it. This is the problem with just “counting strings of letters” without regard for syntax. We’ll see shortly how to get around that.
So the Savoy study is professional and sincere, but it is pretty lousy. Its description of methodology and identification of data is poor. Its assumptions are sometimes dubious, and methods questionable. Savoy is overstating what his results are actually saying mathematically. And he even “venue shops” methodologies after admitting the one he first tried doesn’t work all that well (but doesn’t get around to admitting that neither does the other). But even when we wade through all that, his results still support the standard mainstream consensus: Pastorals are by a different author; Deuteros are by yet another author; and 2 Thessalonians is a riff on 1 Thessalonians. The only challenging result is its divergent finding for Philippians (which we will revisit).
The Laken Study
The very best study of late is a Radboud University bachelor’s thesis in linguistics by Katarina Laken, An Authorship Study on the Letters of Saint Paul (2018). Savoy did not cite this, but that could be because he didn’t notice it (theses get databased differently) or because of the inordinately long publication pipeline in the humanities and soft sciences (where it can take a year or more for a submitted article to get through peer review and be published). Still, Laken published before Savoy and thus could not have been influenced by him. Even more importantly, Laken doesn’t jigger her methods to get a result she wants, and her method is simple and well designed, so it has a low risk of confounding and other correlation fallacies.
Laken also uses an n-gram method, but she knows that character-count won’t work so well on ancient Greek and thus also deploys syntactic-lexical n-gram techniques (and indeed finds those much more reliable), which is already promising compared to Savoy. And Laken checks various methods against each other, but unlike Savoy, who employed multiple methods seemingly in an attempt to validate his preferred thesis (and getting dissatisfying results he had to spin), Laken has no preferred thesis and is really only interested in finding out the answer, so she chooses several methods to corroborate the results of each. And unlike Savoy she does a better job of explaining her methods, data, and the connection between them and her stated results. She’s also funny (“I will disregard any possible divine interventions and assume that the Holy Spirit, whether it inspired the letters or not, has no distinctive style that could interfere with my results,” p. 4). And she knows what she’s talking about (she is aware that 2 Thessalonians was attempting to mimic 1 Thessalonians and therefore that her method needs to take that into account, and she offers smart suggestions as to how to do that). These are all good signs.
Her central result confirms the mainstream consensus (with Philemon and Titus excluded for their brevity, and with the longest letters broken into halves or quarters, hence “Heb1” and “Heb2” are the first and second halves of Hebrews):
[Figure: Laken’s central results plot (syntax vs. general features).]
This kills Savoy’s ten-epistle hypothesis (Colossians, Ephesians, and 2 Thessalonians are clearly non-Pauline here). Yet, in agreement with Savoy’s shakier findings, 1 Thessalonians and Philippians do appear to be the most divergent in syntax, but still below the inclusion line, and are well within the general features comparison, while 2 Thessalonians is wildly outside it. The Pastorals and Deuteros fail on both metrics (well outside the zero-line in both).
So we see why Savoy’s fuzzier method got an ambiguous finding for Philippians: it is the most different of Paul’s letters, but it nevertheless strongly matches his style in general features (better even than half the core Paulines do) and still trends Pauline even on syntax: still below midline and very close to 1 Thessalonians—as we should expect, for those two letters share a greater emphasis on Paul’s experiences of persecution, and more personal matters and fewer theological disputes, than any of the other Epistles, almost making them a different genre. Compare that with Hebrews, which we see orbiting the Paulines here, placing it among the most Pauline non-Paulines. Yet it is definitely far orbit—as in, it isn’t in the middle but on the outside of the Pauline grouping, by one or another metric (Heb1 is exactly midline on syntax and Heb2 is approaching it on general features), so it is credibly not by Paul but could well be by a student (literally or conceptually).
Illustrating that is the fact that the most Pauline of all the non-Paulines is…James. Which does not even purport to be by Paul, nor is it even from his sect (James is pro-Torah). Yet that is also on the outer orbit, and yet closer (and by both metrics) than Hebrews. If we didn’t know better, we’d have classed this as authentically by Paul. So this shows how different authors can (rarely) be close in style, and yet we can still expect they will be on the outer fringes of that metric, so we can still credit them as different people when we do know better. This only amplifies the results for the six forgeries, which are hell and gone from being that similar to Paul on this plot, and thus decisively forged. The fact that 2 Thessalonians only falls in-range (and only slightly) on one of the two metrics on this graph (while being wildly out on the other) confirms the thesis that its author was not Paul but was trying to sound like Paul: hence the syntactic agreement is high (1 and 2 Thess. are close on that axis) but their general-features agreement is extremely low (they could hardly be farther apart on that axis).
Laken runs a bunch of other plots that show various techniques incapable of predicting authorship at all (some similar to the fuzziness of Savoy), and then discusses the difference between the metrics that worked, and those that didn’t. She gives examples, as well, of distinctions between the texts that might not relate to authorial style. For example, that Romans uses a lot of “greetings” is a feature of that letter (in ch. 16), which may have been edited out of other letters (and only the first one in the dossier kept) or distinctive of Romans for being the only community Paul had not yet visited (but clearly knew members of), so that would not be a good feature to emphasize in comparing authorship. But since Laken ran hundreds of features for a statistical aggregate, these kinds of things will have been washed out by the much larger number of other features that do relate to style. For example, her program identified the use of alla, “but,” as distinctive—and not in its frequency of use, but the syntax and semantics of its use (i.e. when Paul chooses to use that word is distinctive).
In the end Laken finds that the most useful metrics are the ones hardest for forgers to emulate (or even know they have to), like preferences regarding the use of particles like de in Greek that have extremely variable meaning (they function more like punctuation than as words per se), or what Laken calls “vocabulary richness,” which relates to an author’s repertoire (how many words they know well and thus are ready to employ). This is one of many features where 2 Thessalonians gives itself away: by its author not knowing as many words (or not being as comfortable with using as many words) as Paul. This could also be why James sounds so Pauline: its author may have a richer vocabulary, creating more opportunities for overlapping metrics. One could expand on Laken’s work by looking into what actually drives the plotting of James close to Paul so as to extract markers that distinguish James from Paul, and then testing the other letters for those Pauline features.
Laken herself concludes with some good advice for future study. For example, that these models will improve if researchers systematically remove all quotations of other sources from these texts first. In Paul that means mainly quotations—though not his own paraphrases—of scripture, but also sectarian material (like the hymns and creeds) suspected of predating him (or deriving from “visions” and thus arriving in a different voice). Laken also hints at ways of revising the n-gram method to better fit ancient inflected languages by not relying on strict adjacency, but building n-grams instead using logical rather than physical adjacency. For example, in Greek and Latin it is possible to put a noun at the start of the sentence and five or eight words later place the adjective modifying that noun. This is precisely the kind of weird thing the n-gram method misses because it’s built for modern languages where that would never happen. Likewise, “Greek tends to have longer words than English,” so a character n-gram length should be increased. And so on.
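The “logical adjacency” idea can be sketched concretely: instead of pairing each word with its linear neighbor, pair it with its grammatical head. The toy sentence and its hand-made dependency annotation below are my own invention (not any real treebank), but they show how a hyperbaton like megalēn … echei … charan (“great … has … joy”) is captured by dependency pairs and missed by linear ones:

```python
# Each entry is (word, index of its grammatical head; -1 marks the root).
# "megalen o anthropos echei charan" = "the man has great joy", with the
# adjective "megalen" displaced far from its noun "charan" (hyperbaton).
sentence = [
    ("megalen",   4),  # "great", modifies "charan"
    ("o",         2),  # article, modifies "anthropos"
    ("anthropos", 3),  # "man", subject of "echei"
    ("echei",    -1),  # "has", the root verb
    ("charan",    3),  # "joy", object of "echei"
]

def linear_bigrams(sent):
    """N-grams by physical adjacency, as in standard token n-gram methods."""
    words = [w for w, _ in sent]
    return list(zip(words, words[1:]))

def dependency_bigrams(sent):
    """N-grams by logical adjacency: each (head, dependent) pair."""
    return [(sent[h][0], w) for w, h in sent if h >= 0]

print(linear_bigrams(sentence))      # never pairs "megalen" with "charan"
print(dependency_bigrams(sentence))  # includes ("charan", "megalen")
```

A method built this way sees the noun-adjective link no matter how many words intervene, which is exactly what a heavily inflected, free-word-order language requires.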
The Britt-Wingo “Study”
Which brings us to the garbage-dump that is the amateur (and probably crank if not outright fraudulent) “study” by Matthew Britt and Jaaron Wingo in Christ Before Jesus: Evidence for the Second-Century Origins of Jesus. This is not a peer-reviewed study (unlike Savoy or Laken or any of the predecessors they in turn cite), but a crappy amateur slog that I suspect is self-published (because its publisher, Cooper & Samuels, does not exist, and publishes no other book than this). This one is epically bad—like, the worst—to the point of leaving me suspecting it might actually be a grift, an attempt to glean money out of the mythicist market with deliberate bullshit meant to look impressive but that is actually bogus (and at minimal expense: just look at the travesty that is their website; or if that changes after I publish this: just look at the travesty that was their website). As such it reminds me a lot of James Valliant’s Bogus Theory of a Roman Invention of Christianity.
As just the first piece of evidence for that, their book’s description uses the trick wording of “Christ Before Jesus presents a first-of-its-kind analysis using proven, peer-reviewed mathematics and software which reveals a second-century origin for the books of the New Testament.” Did you catch the trick? They did not say their study was peer reviewed. They said they used a “method” that was. Which looks like trying to mislead people into mistaking them as having said their study was peer reviewed (it wasn’t; nor would it ever have passed review in the form presented). So this whole enterprise looks shady and dishonest to me. Or else batshit crazy, but unless this is folie à deux or these two guys don’t exist (and it’s just one guy), a con is more likely. (For less shady and bullshitty defenses of their thesis, see Was the Entire New Testament Forged in the Second Century?)
Their stylometric chapters are so poorly written and their discussion of methodology so obscurantist I did not waste time vetting it all. It is almost all rambling speculation anyway, so you have to skim to find the scant few actual discussions of their stylometry. But I examined enough to form the following opinions.
Its authors do not appear to have any relevant credentials. And the book has no academic quality. It is written like pop-market gee-whiz with minimal footnotes or legitimately formulated procedures. It states formulas, but never shows or samples what raw inputs they gave them, or where they got those inputs from, or how any of their raw data were generated, or even what their actual mathematical conclusions were (their graphs all lack numbers—that’s right: there are no “metrics” in their stylometrics). There is simply nothing “we can check” (despite their claiming we can). They also never explain how any of this relates to chronology (the “peer reviewed” method they claim to be “using” does not answer that).
Statistical stylometry is an extremely difficult and error-prone methodology (precisely because of its complexity), so it really needs peer review. This is one of the reasons I suspect Laken’s study was so much better than Savoy’s—and why Savoy’s is so much better than whatever this is. Laken had named peer reviewers as actual advisers who contributed a lot of time to ensuring the quality of her approach and output, because their reputation was on the line, thus ensuring a high quality study (at least by undergraduate standards). Savoy probably improved under his reviewers, too, but they won’t have devoted that much time to it (and clearly didn’t, as there are defects in his study that are a bit sloppy for even casual peer reviewers to miss). By contrast, Britt and Wingo act like coked-up goats in a china shop, with no guardrails, supervision, or controls.
Despite their verbal claims, when I look at what little they offer in math and graphs (as opposed to bluster), I cannot discern what they are actually doing or why it is valid. They refer to and loosely describe valid methodologies, but they never give an example of how specifically they are applying those methods to the Greek text, they just “claim they did.” Indeed, I cannot discern how they ran their alleged math on the actual Greek text. They never even say what their source was for any Greek or Latin text or how it was tagged (contrast Laken’s detailed attention to this point). They just show a bunch of nonmathematized “distance trees,” but not what parameters they used to generate those (and without the numbers, their reliability or even meaning cannot be checked).
Unlike Laken, whose entire study can be reproduced by what she provides (and to some extent also Savoy, though with more difficulty), Britt and Wingo essentially render this impossible. I have no idea what they actually did or on what specific data. What words did they count? And how did they demarcate “words” in inflected Greek? The only example they give of their work is done on an English translation, not the actual Greek. They claim they used the Greek for everything they gave no examples for, but how am I to believe that? Because at no point did I get any impression they even know Greek. Of course if all they really used was English, then they are actually running stylo on modern translating committees, not Paul. Which completely invalidates their results. English will largely conceal all the stylometrics of the original Greek, and replace it with the stylistic preferences of modern translators (as I noted already, but for more context see From Homer to Frontinus: Biased Translation Is Not Unique to Biblical Studies).
This is why it is a problem when I encounter them not even knowing how to tag or parse Greek. So how could they have run any pertinent n-grams on it? They are never clear whether or when or how they ran mere character n-grams or of what length (the problem with which I’ve already noted), or any other kind of n-gram (they never say what kind they used to generate any of their graphs, or what parameters they set), or how the results differed for different settings (contrast Laken’s approach). Which is enough to make this amateur hour. They also engage in dubious rhetoric like boasting of a 96% “accuracy” without explaining what that is measuring (96% of “what”?).
Which is why their seeming ignorance of Greek especially bothers me. For example, they claim that Luke 8:11-15 does not use the word “he.” But in Greek the word “he” is implicit in the inflection of a verb. Greek does not require the explicit pronoun. Moreover, it can represent that pronoun with many more words than in English, including the plural. Because, remember, the word “he” is just the singular of “they.” And what about “him” and “his,” and thus “them” and “their”? It’s all the same word in Greek, just inflected; just as “is” is the same word as “are,” and they seem to be aware that “is” and “are” are the same word. But they don’t provide this information for the pronoun. So what do they mean when they say Luke does not use the word “he” there? What are they counting? I can extract many instances of “he” across those verses, depending on what I count. So what are they counting? I can’t even tell if they know what they are doing here, much less whether it’s correct.
For example, at one point they are comparing versions of a saying between Matthew and Luke, where Matthew says “his heart” and Luke says “their heart,” differing only in whether it's singular or plural. So, is this a stylistic choice? Does Luke “use the plural” more often than Matthew? Or the other way around? Or neither? They never check, so they can't say. This is also the same word as “her” and “it,” differing only in inflection. So how are they telling these apart in Greek? Especially since sometimes these share the same form in Greek. For example, “it” and “him” in the accusative case are identical orthographically in the Greek. And because Greek is gendered, sometimes “he” literally means “it.” For example, logos, “word,” is a masculine word and thus is literally referred to by Matthew with the masculine singular pronoun “he” (which only gets translated into English as “it,” because we gender words differently). So are they counting that? Or not counting that? Should they or shouldn't they? You'll never find out. Another example of this suspicious behavior is that they claim a stylometric fit establishing that Irenaeus wrote (our) Luke-Acts, but show no graph illustrating this, and evidently don't know that almost all of Irenaeus survives only in a medieval Latin translation (and thus can't be stylo'd in Greek).
Instead of explaining any of this, they say strange things like that Matthew uses “he” ten times in 13:18-23—but that isn’t true if they are counting the actual pronoun (and not verbal inflections, different words like relative or demonstrative pronouns, or definite articles as substitutions for these, a common practice in Greek). Hence in the NIV, the word “he” appears zero times there. Not “ten.” On a hunch I looked into whether they are just running their numbers on the King James translation, and, well. Yeah. But being as charitable as I can be, and assuming they meant to include “his” (hence “his heart,” though including that would make the count by the KJV more than ten, so who knows WTF they are doing here), their claim seems to amount to saying that Matthew recasts the parable he inherited from Mark in the plural into the singular—while Luke maintains the plural. But their resulting graph shows Matthew closer to Mark than Luke, when, without explanation, it’s the other way around in the example they chose to discuss: Luke was more consistently faithful to Mark (or vice versa on their theory of Marcionite origins of the Gospel) when retaining the plural here (“they/theirs” instead of “he/his”). Luke (as often) does blend Matthew with Mark here, but then he still consistently renders what he gets from Matthew back into Mark’s plural. So Britt and Wingo are not making any coherent sense here. And never mind that this has literally fuck all to do with what century any of these books were written in, or in what order.
Even when they give anything resembling distance metrics (in a manner vaguely like Savoy's), they never actually say anything numerically, much less explain how they derived any of their relative measures (what n-gram settings did they use to generate any of the scant few graphs in the book?). So when they claim to plot the Greek, there is no way to know the sensitivity of their demarcations (once again, there are no “metrics” in their stylometrics; contrast Savoy, who did this correctly, allowing us to detect the problematic hairline sensitivity of his results). It all looks like junk analysis to me. This is most evident when their graphs don't even support their assertions. For example, they claim Matthew and Mark chronologically follow (what they call) “Marcion-Luke,” but no graph they present shows that, nor do they give any real evidence for this (stylometric or otherwise). Their method couldn't do that anyway. All that statistical distance metrics can do is discern common authorship, not chronology or sequence. They never explain how, then, they get “sequence” out of it (other than by telling porkies like ‘Irenaeus shares enough Greek style with Luke-Acts to prove he wrote it’, a claim they never graph and which is literally impossible to establish). And that claim is bullshit anyway. There is no credible stylometry here.
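For contrast, here is what an actual reproducible distance metric looks like: a minimal sketch of Burrows' Delta, the standard measure in the field (mean absolute difference of z-scored function-word frequencies). This is emphatically not Britt and Wingo's method, which they never disclose; the three tiny “texts” are invented unaccented Greek snippets for illustration only. Note what the number can and cannot tell you: texts A1 and A2 (same word frequencies) score a distance of zero from each other, while B scores higher, which speaks to similarity of style, and says nothing whatsoever about which text came first:

```python
import statistics
from collections import Counter

def relative_freqs(tokens, vocab):
    """Relative frequency of each vocab word in a token list."""
    c = Counter(tokens)
    n = len(tokens)
    return {w: c[w] / n for w in vocab}

def burrows_delta(a, b, means, stds, vocab):
    """Burrows' Delta: mean |z(a) - z(b)| over the chosen vocabulary."""
    total = 0.0
    for w in vocab:
        if stds[w] > 0:
            total += abs((a[w] - means[w]) / stds[w] - (b[w] - means[w]) / stds[w])
    return total / len(vocab)

# Invented illustrative snippets (unaccented), not real samples of any author:
corpus = {
    "A1": "ο δε ειπεν αυτοις ο λογος του θεου".split(),
    "A2": "ο λογος ο του θεου ειπεν δε αυτοις".split(),
    "B":  "και εγενετο εν ταις ημεραις και ιδου και ηλθεν".split(),
}
all_tokens = [t for toks in corpus.values() for t in toks]
vocab = [w for w, _ in Counter(all_tokens).most_common(8)]
profiles = {k: relative_freqs(v, vocab) for k, v in corpus.items()}
means = {w: statistics.mean(p[w] for p in profiles.values()) for w in vocab}
stds = {w: statistics.pstdev(p[w] for p in profiles.values()) for w in vocab}

print(round(burrows_delta(profiles["A1"], profiles["A2"], means, stds, vocab), 3))
print(round(burrows_delta(profiles["A1"], profiles["B"], means, stds, vocab), 3))
```

Every step here (vocabulary choice, corpus means, z-scoring) is a parameter a real study must report, because each one changes the output. And even done perfectly, the output is a similarity score, not a timeline.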
So I have to conclude these authors are either wildly naive and have no actual idea what they are doing, or they are trying to pretend they do and this is all just a con for sales. And though my suspicions lean toward the latter, it doesn’t matter. Because the results are garbage either way. So who cares? By contrast, Savoy can be critiqued because he is a professional who knows what he is doing, and presents enough of “what” he is doing that we can evaluate it. These jokers don’t even give us that. While Laken is in another galaxy altogether, pretty much giving us the gold standard for what we are supposed to be doing. And it corroborates the modern mainstream consensus about these letters.
Next, in Part 2, I'll look at some horrid peer-reviewed studies, albeit on more tangential subjects.