A couple of weird peer-reviewed studies recently came out arguing from statistical stylometry that (1) Galen (the famed second-century Greco-Roman medical scientist) did not write most of the works attributed to him (which would be a major thing, because we have some sixty books from Galen, far more than survive from any other ancient scientist), and (2) Pliny the Younger’s famed letter on the Christians is so heavily interpolated as to be of no historical use. In my opinion the Galen study is crank, and its passing peer review is a scandal (which sometimes happens). The Pliny study I think is just inept, and its passing peer review is merely a disappointment.
Today I’ll discuss these. I already explained stylometry and surveyed some examples in Statistical Stylometrics: The Good, the Bad, and the Horrid (Part 1: Paul). So I won’t revisit those details here. Just note that I will be referencing a lot of what I said there, so you may have to read it before understanding what’s going on here. But you can always just plod ahead and see. Note that the following two papers passed peer review (and so did Savoy’s from last week), which means peer review is not all that reliable for papers like this. Keep this in mind, since I hear Britt and Wingo (my “horrid” selection from last week) are trying to get their results in Christ Before Jesus through peer review, and there are lessons here about how that could go, and what you’ll still have to do to vet them (if peer reviewers let you down, as they did in all these cases).
I’ll start with Pliny.
Did Pliny Not Write What We Have about Christians?
This claim was made by Enrico Tuccinardi in a 2017 issue of Digital Scholarship in the Humanities (see “An Application of a Profile-Based Method for Authorship Verification: Investigating the Authenticity of Pliny the Younger’s Letter to Trajan Concerning the Christians”). He argues that the letter shows strong deviations from Pliny’s usual style, yet not enough to class this text as a forgery, and therefore it must have undergone extensive (ostensibly later Christian) editing of some kind, rendering the extant text historically useless. There is a good summary explanation of what Tuccinardi did and claimed at Vridar. And there is already a decent brief critique of it by Larry Hurtado.
Hurtado points out that a previous (nonstatistical) study from the 1980s (cited by Tuccinardi) raises concerns that Tuccinardi seems to have misunderstood, perhaps for want of credentials in line with Hurtado’s own skillset: understanding ancient languages, authors, and genres. I have not been able to ascertain what credentials Tuccinardi even has. His publications seem to range rather widely, but mostly around the subject of computerized stylometry, not ancient rhetoric. The study Hurtado is calling attention to is Federico Gamberini, Stylistic Theory and Practice in the Younger Pliny, which demonstrates Pliny is particularly known for large style variation across his letters. And this is a problem Tuccinardi kind of handwaves off in a way a non-expert reading his study might miss. After reading it myself (as well as some of his other publications) I have to conclude Tuccinardi is a bit of an eccentric but not a crank. He knows the computational methods well, but he seems out of his depth regarding ancient language, literature, and rhetoric. I think sharper peer reviewers would have rejected his study not for being bogus, but for technical flaws that would have required a re-do.
As Hurtado explains:
[C]ontrary to Tuccinardi (pp. 6-7), his data don’t of themselves “suggest” interpolations. His data surely indicate that letter 10.96 has a distinctive n-gram pattern in comparison with his composite book 10 profile. But the suggestion of interpolations (and that suggestion only) is from Tuccinardi, not the data. For, assuming the validity of his data, they are compatible with at least four hypotheses: (1) the letter is a forgery, (2) there are interpolations that corrupt its stylistic character, (3) Pliny’s stylistic profile is varied, and letter 10.96 simply exhibits that, and so/or (4) the method may be inadequate for the task and need some further tuning.
It is thus significant that Tuccinardi’s study used no actual controls. He thinks he did, by using the letters of Cicero and Seneca as a control, but that is not the actual control variable he needs to employ. It’s worth including. But it’s not what we need. I suspect his peer reviewers were not classicists and thus did not detect this, which is why it got passed when it shouldn’t have. Pliny varies his style tremendously especially in respect to genre, which is why Tuccinardi says he only created an “aggregate profile” of book 10 of Pliny’s correspondence, which contains all his state correspondence (mainly with emperor Trajan). This is distinct from all his personal correspondence in books 1 through 9. But Tuccinardi’s reasoning is backwards: letter 10.96 is more like the letters in Books 1–9, not less. So he is testing it against the wrong base text.
This is an immediate and enormous problem that single-handedly invalidates all of Tuccinardi’s results. Because letter 10.96 (I’ll call it Tuccinardi’s “disputed” letter) contains an unusual amount of personal thoughts and reflections compared to other letters in book 10. It therefore actually more resembles letters in books 1 through 9. Indeed, 10.96 deviates considerably from most of book 10 in being in a rhetorical style—most of the other letters there are in curt administrative Latin, with a comparative minimum of rhetoric or elaboration. They are, in effect, business letters designed to be as brief and to-the-point as possible. That couldn’t be more different from this letter, which is unusually long, elaborate, rhetoric-laden, and on a very unusual subject—indeed, a unique one: Christianity; indeed, unique even more generally: religious disputes; indeed, very few letters in Pliny are even on religion. Likewise there are few to no letters about interrogations or even investigations. You’ll notice it is the only long letter in Book 10, and thus appears to be in a genre unlike most of that volume, and more like letters in the other volumes (e.g., just from the previous three volumes—I didn’t even check volumes 1 through 5 because 6 through 9 already show my point by themselves—compare: 6.2, 6.16, 6.20, 6.29, 6.31, 6.33; 7.6, 7.9, 7.17, 7.19, 7.24, 7.27, 7.33; 8.6, 8.14, 8.18, 8.20, 8.23 and 24; 9.13, 9.26, 9.33—none of which have anything comparable in volume 10 except 10.96).
So we really need to see the disputed letter’s profile with respect to the previous volumes. And even that would not be adequate. Because individual letters vary in genre. So what we really need to see is the disputed letter’s comparison to all other letters of similar genre or subject (or, indeed, length). In other words, Tuccinardi should be doing what Savoy and Laken did: test all the letters for distance metrics and see if any distinct groups form (is 10.96 then an outlier, or solidly in the center mass?). But one could especially test any of the other long (and thus most comparable) letters. Like Pliny’s letters to Tacitus on his father’s death, or his letters to his scientist friend Licinius Sura about his personal experiences with ghosts, or any other letters delving into religion, for example. But even just cherry-picking letters will not be a correct method. Because then you might miss important comparands. So you need to plot all of Pliny’s letters, individually, so we can see how many outliers there are, and thus whether letter 10.96 is even strange—or just one of many similar outliers owing to genre variation (or length or subject variation), not author variation. We could then look at all the letters that fall in that outlier group and see whether they share features explaining this.
In other words, Tuccinardi was supposed to run as a control each and every other letter of Pliny. Rather than just test this letter against “an average profile for book 10” we need to see what happens when we test any individual letter that way. Do they all end up outliers? Or a lot? Or some? For example, I predict 10.96 will be an outlier with respect to all the brief administrative letters of book 10 (which are most of book 10 by far, hence why its “average profile” looks like those letters, not this one), but not so comparatively strange with respect to all the long letters on personal or related subjects. At the very least, I need to see this test done. It is the only relevant control here.
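The missing control is easy to sketch. Here is a toy Python version (the mini-“letters” are invented stand-ins for Pliny’s Latin, and a simple character-trigram Manhattan distance is only a stand-in for Tuccinardi’s actual profile method): treat every letter in turn as the “suspect,” profile all the others, and see how far out each one lands.

```python
from collections import Counter

def char_ngram_freqs(text, n=3):
    """Relative character n-gram frequencies of a text."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()}

def distance(p, q):
    """Manhattan distance between two frequency profiles."""
    return sum(abs(p.get(k, 0) - q.get(k, 0)) for k in set(p) | set(q))

def leave_one_out_scores(letters, n=3):
    """Each letter's distance from a profile of all the others.
    This is the control: if many authentic letters score as high as
    the 'suspect,' a high score proves nothing about authorship."""
    return {
        name: distance(char_ngram_freqs(text, n),
                       char_ngram_freqs("".join(
                           t for k, t in letters.items() if k != name), n))
        for name, text in letters.items()
    }

# Toy corpus: short administrative notes plus one long 'personal' letter.
letters = {
    "10.1": "please approve the aqueduct funds at once sir",
    "10.2": "please approve the theater funds at once sir",
    "10.3": "please approve the harbor funds at once sir",
    "10.96": "i have never attended trials of christians and i am unsure "
             "what is customary to punish or to investigate in such cases",
}
scores = leave_one_out_scores(letters)
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

In this toy, 10.96 lands as the biggest outlier purely because its subject differs, which is exactly the confound at issue: the score alone cannot distinguish “different author” from “different genre or subject.” Only by scoring every letter can we see whether such deviance is even unusual for the author.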
That’s fatal. So I could drop the mic here. But it gets worse.
Tuccinardi uses only a character n-gram method. And if you read my Part 1, you’ll know why that’s a problem. Latin, like Greek, is not sufficiently like modern languages for that method to work. It will inevitably generate a huge error rate by missing most of the actual structure and patterns of the language. This is because (quoting my last article) Latin, like Greek, “lacks consistent word order, and is heavily inflected (verbs and even nouns and adjectives change form radically based on gender and function),” which are not subject to an author’s control, and “dependencies can stand relatively far away” because “the inflection often clarifies the relation between” so that “units that belong together will often not be captured in the token n-grams” and certainly not in character n-grams (for example, “it is possible to put a noun at the start of the sentence and five or eight words later place the adjective modifying that noun” unlike “modern languages where that would never happen”), hence logical rather than physical adjacency must be employed for these languages, and token and syntax n-grams must be used rather than mere character counts. I discussed all this last time, following Katarina Laken’s study on Paul.
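A minimal illustration of the problem (the Latin sentence below is my own invented example, not from Pliny): two equally grammatical orderings of the same words, with the adjective magnam standing either next to or far from its noun urbem, produce identical token counts but different character n-gram profiles, even though no authorial choice of style distinguishes them.

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Multiset of character n-grams in a text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

# The same Latin sentence in two equally valid word orders: the
# adjective "magnam" can stand far from its noun "urbem".
a = "magnam imperator post bellum urbem condidit"
b = "imperator post bellum magnam urbem condidit"

tok_a, tok_b = Counter(a.split()), Counter(b.split())
char_a, char_b = char_ngrams(a), char_ngrams(b)

print(tok_a == tok_b)    # identical words, identical token counts
print(char_a == char_b)  # but the character n-gram profiles differ
```

This is why logical rather than physical adjacency matters for Latin and Greek: a character-level method registers a stylistic “difference” here that the author never made.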
In short, Tuccinardi is behaving more like the sloppy Jacques Savoy than the savvy Katarina Laken (to draw comparisons from last week’s article on this). And this is precisely the sort of thing that could have been found out by running a proper control (as noted above). As just a few examples of how this can corrupt his results: this is the only letter in which Pliny discusses Christians, which would explain why it uses CHRIST- character sequences nine times, and why this will create an incongruous pattern to the rest of book 10 (or even all books, 1 through 9); likewise this may be the only letter containing long lists of criminal accusations and criminal-court terminology; this letter builds a particular argument contrasting past, present, and future events, which may drive a peculiar pattern of grammatical inflections that could explain its deviation from other letters in character sequences; and because of its unique and unusual subject matter, it contains a large number of rare words (for Pliny) but for no reason to do with the author. This includes words (and hence character sequences) nowhere else found in book 10, like sexus, paenitentiae, latrocinia, adulteria, superstitionem, quamlibet, robustioribus, etc. (plus many more words found only once elsewhere in book 10). These will also “drive out” (by displacing) many more common words Pliny might otherwise have used in other groups of letters (e.g. many more letters are on construction projects and will themselves be littered with content that we expect to be missing here), so Tuccinardi’s subsequent claim (which I’ll discuss shortly) that unique features won’t affect his analysis is false.
And all of these defects become more destructive to character n-gram reliability the shorter the text (and 10.96 is a very short text, creating enormous distortion in frequency metrics—leaving far fewer opportunities for Pliny to introduce words or forms common elsewhere, and making the frequency of words “in this one letter” look much larger than they actually are in Plinian practice). This becomes even worse when we notice Tuccinardi “dinked” his database by “split[ting Book 10 minus letter 96] into 15 fragments…each one approximately of the same size as” 96 (circa 3,000 characters). He seems not to realize this created a serious problem: the standard formulas for title, intro or greeting, and outro or request, are replicated many times within each block, but appear (obviously) only once in 10.96. This makes those features look “common” in the profile text but “rare” in the disputed text—but only because of this artificial meddling by Tuccinardi. It’s like concluding, “Usually Pliny’s letters have five introductions; this one has only one; therefore it’s not by Pliny,” when in fact all his letters have only one. Tuccinardi’s method is thus inherently error-generating.
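The artifact is easy to simulate (the greeting formula below is a generic stand-in, not Pliny’s exact wording): glue many short letters into one roughly 3,000-character block, as Tuccinardi did, and the opening formula occurs many times per comparison unit, while a single long letter of the same size contains it once.

```python
GREETING = "C. Plinius Traiano Imperatori."

def make_letter(body_len):
    """A toy letter: one opening formula, a body, one closing."""
    return GREETING + " " + "x" * body_len + " Vale."

# Book-10 style: ten short administrative letters glued into one
# ~3,000-character comparison block (mimicking Tuccinardi's chunking).
block = " ".join(make_letter(250) for _ in range(10))

# The 'suspect': a single long letter of roughly the same total size.
suspect = make_letter(2900)

print(block.count(GREETING), suspect.count(GREETING))  # many vs. one
```

Any profile built from such blocks will rate the formula “common” in Pliny and “rare” in the suspect letter, producing exactly the spurious deviation described above.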
This is why Tuccinardi’s results are useless. He needs to drop the inept character-count technique and instead use a token-tagged Latin text that he can run token and syntax n-grams on (like Laken did with Paul’s Greek). And he needs to test this on every letter (treating every letter as suspect) in order to generate a frequency of deviancy across Pliny, because of variatio, the Latin tradition of deliberately varying style, particularly as suits changes in genre, subject, and purpose. In other words, we need to see if his method even works. And the only way to do that is to test every letter (put every one in the dock)—or at least the long ones, or perhaps blocks of letters again but minus their titles and first and last sentences—and see what the method generates for it, so we can know whether this degree of variatio is even unusual for Pliny in the first place.
This would basically resemble what Laken did with Paul. We’d then see all the letters of Pliny on a plot, and which ones (and how many) are outliers, and how far out. Only then could we start making claims like Tuccinardi’s—assuming the results go his way (I suspect they won’t).
These are sufficient reasons to reject his results as unusable. But there are a few other problems I should discuss, just to be complete:
There Is a Multiple Comparisons Fallacy Here:
First, Tuccinardi kind of hides some failures. If you look closely, the distribution curve in Model 1 (see page 443) shows the disputed letter no more deviant than one of his fifteen chunks of other letters from Book 10 (and barely more deviant than a second). Model 2 shows near equivalent deviance for one chunk. Tuccinardi then decides to follow the models that show the most deviance, sweeping under the rug the models that didn’t. This looks like “venue shopping.” And yet, even on his preferred model, “Model 3,” he is impressed that it shows “the possibility that a generic Plinian fragment of Book X” would have comparable metrics to the disputed letter (10.96) is “0.5%.” But assuming all ten volumes are the same length, and thus a complete base text (rather than Tuccinardi’s unjustifiably narrowed base text) would have 159 “chunks” to compare (16 per volume minus the suspect “chunk”), then there is a 55% chance of finding an authentic text with the same metrics as 10.96 somewhere else in Pliny’s authentic corpus.
That’s right. Because if each authentic chunk has a 0.005 chance of exhibiting such deviant metrics, then it has a 0.995 chance of not doing so, so the probability that at least one of them will nevertheless exhibit them is the converse of the probability that none of them do, which is 1 – (0.995)^159 = 1 – 0.45~ = 0.55~ or 55%. That pretty much kills Tuccinardi’s claim that the authenticity of letter 10.96 is “unlikely.” It’s actually better than even odds—and that’s even by his most biased result. If we take his aggregate result (“for all six models,” he says, such an outcome has a probability “lower than 5.0%”), then the probability that letter 10.96 could be authentic is 1 – (0.95)^159 = 1 – 0.00029~ = 0.99971~ = 99.97%, that being the probability that at least one of the other 159 samples will exhibit those metrics despite being authentic—which could as well be this one, the 160th sample. That alone seems to destroy his conclusion.
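The arithmetic is easy to check:

```python
def p_at_least_one(p, n):
    """Chance that at least one of n authentic samples would, by
    chance alone, look as deviant as 10.96, given a per-sample
    probability p of an authentic text showing such metrics."""
    return 1 - (1 - p) ** n

n = 159  # chunks in a full ten-volume base text, minus the suspect one

print(round(p_at_least_one(0.005, n), 3))  # Model 3's 0.5%: ~0.549
print(round(p_at_least_one(0.05, n), 5))   # all-models 5%: ~0.99971
```

This is the standard multiple-comparisons correction: a per-sample false-positive rate that sounds tiny becomes near certainty once you consider how many samples were available to produce it.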
This all makes me wonder if we are being framed (a problem I explained last week). Did Tuccinardi only use Book 10 as a base text (contrary to all his stated reasons to) because he needed to in order to get his desired result? After all, when we re-include all the authentic letters of Pliny, his confidence level nullifies his thesis. Which seems a strange thing for him (and his reviewers) to overlook. So did he originally run it on the whole corpus and not even get a deviant result at all? Did he dump that result, not mention it in his submitted study, and jigger up an “only Book 10” profile because that got the result he wanted? This would be dishonest. But it is a question reviewers need to ask—and make sure to rule out by demanding authors run the full corpus profile so this kind of hack can’t be suspected—because a lot of scientists are pulling this shit, to epidemic proportions now. So we can’t just give someone the benefit of a doubt. We need to demand that they eliminate this possibility by including the analysis we need to check for it.
Tuccinardi Has Since Made Strange Claims about This:
Tuccinardi has since claimed he only counted “missing” strings, not added ones, so “unique” features in 10.96 cannot explain his results. But that’s not a correct description of what he did. As I already explained, he seems not to realize that a letter will lack commonplace features because they have been displaced by features unique to that letter. So 10.96 being on a weird subject will cause it to “lack” features more common to Book 10 (like, I noted, all the letters about building projects, or indeed, about money). He also claims the other letters are “unique” so this shouldn’t matter, but that’s twice erroneous. First, the other letters carry a frequency of common topics (like construction and money) despite their individual uniqueness. And since they lack the unique features of this letter, those features won’t show up in his Book 10 profile (making his entire argument circular). But more importantly, he seems to have forgotten that he didn’t test 10.96 against individual letters. So the effect he claims his study controlled for, it actually didn’t.
Hence I start to get suspicious when he now claims “many local topics” are in those other letters, so “they all are unique in the Pliny corpus,” but that’s not true. He did not run any other letters. He mashed bunches of letters together into blocks (he smashed 120 letters into 15 blocks). So none of those blocks is on a unique subject. They are all variegated. This would create exactly the problem Tuccinardi is here falsely claiming he avoided. Moreover, all those other letters are actually on much more related subjects: the administration of the empire. Not religion. Not criminal court jurisprudence. Not personal social opinions about either. Much less the highly unusual context of Christianity specifically. And remember, among the examples of ways his method will fail him, I included “omission” conditions as well (e.g. his test letter has only one title, intro, greeting, and outro, while every one of his other fifteen “fabricated” letters will have several of each; and novel vocabulary by definition crowds out typical vocabulary, e.g. the unique Christian-related stuff displaces the more common administrative-related stuff, so even inclusions will entail omissions, eliminating the entire distinction he is trying to claim).
If Tuccinardi actually thinks he can give us a list of “things common to all the letters that are completely absent from this one,” he should just give us that list. Then we can check it for relevance. I suspect it will fail all relevance tests (as I already suggested before). But there is no way to do that check if he won’t give us the results. Thus excluding it from his study remains a fatal defect in it. This isn’t what character n-grams are or do anyway (which is why I suspect he is not being completely forthright about this). So perhaps I’m being overly charitable to him here. But this is all a bit shady. And regardless, you can’t rescue a failed study by trying to fix it after the fact with vague references to still-unprovided data. Tuccinardi responds that it would take up too many pages, but that is a disingenuous excuse to avoid doing it, as the internet existed in 2016, and scientists often publish paper-related data online for just this reason. And Tuccinardi still could. So why isn’t he?
There Is an External Evidence Problem Here:
Finally, Hurtado questions Tuccinardi’s claim that our only external ancient reference to 10.96 is Tertullian, who was prone to citing forged Roman documents. Hurtado points out that we also have Emperor Trajan’s reply, which corroborates at least some aspects of 10.96. Of course that could also be forged or meddled with. A more important point is what Hurtado could have said instead: that Tertullian corroborates essentially the letter we have, which requires it not only to have been forged within seventy years of the original, but forged within the manuscripts of Pliny, which would be an incredible accomplishment for 2nd century Christians, who in no way had that kind of control over pagan literature (Christian libraries only came into existence in the 3rd century and their global document control only became pervasive in the 4th century). Moreover, the market would then be awash with independent copies of Pliny’s ten volumes, so for Tertullian to have based an argument on a forged copy would have been eristically suicidal (his lie would be exposed almost immediately, or he’d be outed as an easily duped joke).
So Tuccinardi’s theory rests on a lot of external epicycles: that the fake version we have was invented in the 2nd century and within an entire authentic volume of Pliny when Christians did not control the literature; that Tertullian would be so foolish (or gullible) as to attempt a then-easily-refuted claim; that no one noticed or mentioned this; and that Christians, when they did control the literature, successfully destroyed all but the faked edition without notice (or that someone centuries later “swapped” the authentic 10.96 in one of those editions out for the forged copy from some other source, and we just happen to only have a copy of that one).
These are not impossible things (we have examples of each one happening here or there in other cases), but you need evidence for them to render them likely, and there are an inordinate number of them, so their compound improbability is steep. Contrast this with the case we can make for the Antiquities of Josephus (ample evidence establishes our manuscripts all derive from the one controlled by Christians at their own library in Caesarea: see my chapter on Origen in HHBC) or against claims similar to Tuccinardi’s regarding the manuscripts of Eusebius (see The End of the Arabic Testimonium). Tuccinardi’s theory is in a much worse epistemic position than these.
So in the end, there are too many fatal flaws in Tuccinardi’s study to credit its findings. It needs a do-over.
Did Galen Not Write Half His Works?
My second example is way weirder. Like, strange. This came out just last year (2024), “The ‘Galenic Question’: A Solution Based on Historical Sources and a Mathematical Analysis of Texts” in the journal Histories, by Fernando La Greca, Liberato De Caro, and Emilio Matricciani (what’s with all the weird stylo pouring out of Italy?). The authors are all cranks—indeed, weirdly, shrouders. You can find examples of their crazy from De Caro, a “crystallographer”; Matricciani, a “telecommunications” professor; and from all three as a team, indeed more than once, so they are up to something. Which makes me wonder if this is a trojan horse meant to bolster some yet-to-come study about something else, or to pad CVs to construct an expert status for its authors. La Greca is the only historian among them, though his credentials and specializations are obscure.
In general the question they are asking isn’t crazy. We expect some Galenic books to be fake (and as they note, some have already been identified as fake), not least because Galen himself reported fake books circulating in his name. In fact he published more than one list of his own books specifically to assist readers in spotting the fake ones. And medieval forgers kept faking more, for the same reason Christians constantly faked writings under authoritative names. So a stylometric study of Galen’s corpus would be a valuable enterprise. There are also texts of Galen that survive only in Latin or Arabic, which won’t be susceptible to this test (least of all as those are the most prone to later editing and so could be “mostly but not entirely” Galen), but it looks like those were correctly excluded from this study. However, they also excluded several Greek texts attributed to Galen, indeed some of the most important ones, and it is unclear why. Still (for whatever reason) they test 57 texts. But what they do is bonkers.
It all starts with the same major problem we saw last time: they never explain which Greek text they used, or how they digitized it. They imply they used an online edition of Kühn (still the most complete source for Galenic Greek; and though scattered editions exist that are better, that usually shouldn’t matter to this kind of study, though it might to this one). But what they link to has not been reliably digitized to a readable text. The only complete digitization of the works of Galen I know is in the Thesaurus Linguae Graecae; but if they used the TLG, they are supposed to credit it (that’s legally required by the TLG use contract), and they don’t. Which is a problem, because their method depends heavily on this. A bad digitization would tank their entire results. So their paper should not have passed peer review until they gave an adequate explanation of how they dealt with this problem.
But a bigger problem is that all their statistics rely on features that, in one way or another, don’t apply to ancient Greek. They seem to be making the same mistake I documented above and last time: thinking ancient Greek was structured like a modern language. It isn’t. They also don’t run an n-gram technique at all (on which, again, see my discussion last week), but run some convoluted statistics on just four metrics, all of them problematic: “words per sentence,” “words per interpunction,” “interpunctions per sentence,” and word length.
- I am not aware of word length being a valid metric in any contemporary practice (and in support of it they only cite themselves). It is self-evidently a bad idea, even for English, but especially for ancient Greek, where words vary in length by grammatical context (indeed often by quite a lot), which is out of an author’s hands (the rules of the language are the rules of the language); but even in any language, word “length” rarely matches “style” anyway, since an author does not decide what words to use by how long they are. At most this metric might proxy to something else called “richness of vocabulary” (something Laken discussed; see my discussion last week), but when the average difference text-by-text is less than half a character (as they find), you should be extremely suspicious of this as a measure. If one author uses a lot of big words and another doesn’t, you aren’t going to see an average deviation of “half of a character.” To the contrary, that tiny variance will be confounded by the happenstance of subject (some subjects require bigger words to cover than others). Moreover, even if that could work (somehow?), this metric requires very reliable digitization (so they are sure not to be counting broken or merged words), bringing us back to their complete lack of discussion of that.
- All three of their other metrics depend on punctuation. And Galen never wrote any punctuation. Punctuation did exist then, but was almost never used in standard texts. Most manuscript punctuation came from medieval scribes, and was highly inconsistent and varied by individual scribe. Meanwhile, modern edition punctuation (presumably what these guys are using) is a product of modern editors and thus will reflect as much the stylistic preferences of those modern editors as Galen’s. This is especially the case for “interpunction” (by which I assume they mean commas and colons and the like), which is highly variable editor to editor, and even from the same editor across their career, even across a single text (e.g. editorial fatigue can produce inconsistencies of practice). Sentence length (presumably meaning periods) will vary less, but is still a product of modern editorial decisions. If you want to compare sentence length, you need a more reliable and consistent way of assigning that, one which anchors to decisions the ancient author actually made, not the whims and preferences of modern editors.
- Those three metrics still could (in theory) get to something useful—once you have a consistent and reliable sentence tagging (one based on some rule derived from the original author’s decisions, rather than modern editorial whims). For example, sentence length is a well-known stylistic feature. As is sentence complexity—how often an author relies on relative clauses and other digressive structures—which could pair to “interpunctions” (measuring the length and number of dependent or digressive clauses). But again, that requires the same consistent, rule-based tagging. These authors do not appear to have done any tagging at all, much less employed any consistent rule-based method of it. So this study really should have failed peer review until they came back with the work properly done.
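The subject confound on word length is easy to illustrate (with invented English sentences standing in for Galen’s Greek): the “same author” shifts average word length by several characters simply by changing topic, dwarfing the half-character differences these authors treat as significant.

```python
def avg_word_length(text):
    """Mean word length in characters."""
    words = text.split()
    return sum(len(w) for w in words) / len(words)

# One hypothetical author, two subjects:
admin = "send the funds for the baths and the roads at once"
anatomy = "the musculature surrounding the diaphragm facilitates respiration"

print(round(avg_word_length(admin), 2))    # ~3.64 characters per word
print(round(avg_word_length(anatomy), 2))  # ~8.43 characters per word
```

A difference of roughly five characters here, driven entirely by subject matter, makes an observed corpus-wide deviation of under half a character look like noise, not signal.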
There are also problems with their nonstylometric arguments. I found so many problems with these that a fisk is futile; you should just not trust any of their historical sections to be reliable. But just to illustrate with an example, they argue that a 1st century papyrus of a “commentary” on a work of Hippocrates contains “passages, parallels and similarities” with Galen’s De Usu Partium, and “therefore” De Usu Partium must predate it. But that doesn’t follow. You can’t tell direction of influence from mere correlations. Nor is even the correlation defended. They cite no study showing this, and give no evidence that what they claim is true, or even clear about what they are claiming. What do they mean by “passages,” “parallels,” or “similarities”? And why aren’t we given any examples to judge by? There is a lot of crap like this and it’s tedious.
A more damning example is when they claim that one of Galen’s bibliographies, De Ordine Librorum Suorum (a book attributed to Galen that, for reasons not explained, is not included in their tested list of books), “inexplicably does not mention” De Usu Partium and “therefore” he didn’t write it. But…um…we don’t have a complete copy of De Ordine Librorum Suorum. So, I dunno, maybe the reason books are missing from it is because they were in the missing parts of that book? You know. Just sayin. They also don’t seem to be aware that they are referring to Galen’s book on the order of his books, not his book on the list of his books, which is a very different text, De Libris Propriis—which they call De Libris Suis, and do include in their analysis and credit as authentic. And that text does refer to De Usu Partium. More than once, as it describes his writing of its several volumes over time. And there Galen says that masterwork was begun around the time of the Antonine Plague. And he definitely means that plague, because he mentions Antoninus, and that the plague was so bad that Galen had to flee Rome, and that just before that, he says, Titus Flavius Boethus left to govern “Syria-Palestine,” a province that didn’t exist until the late second century. So De Usu Partium is definitely by Galen of Pergamum and definitely late second century!
The TLG text of this, by the way, illustrates my point earlier about punctuation: it follows the Teubner edition rather than Kühn (cf. Kühn 19.16; and 19.20 describes Galen completing De Usu Partium around the time of the great fire that destroyed The Temple of Peace as well as Galen’s laboratory and many of his books, and goes on to describe how De Usu Partium had become a bestseller by then). The first of these sentences is rendered, “of De Usu Partium I dedicated to Boethus the first book, which Boethus took with him when he left the city before me to govern Syria-Palestine” (προτρεψαμένου με τοῦ Βοηθοῦ, περὶ δὲ μορίων χρείας ἓν τὸ πρῶτον, ἃ λαβὼν ὁ Βοηθὸς ἐξῆλθε τῆς πόλεως ἐμοῦ πρότερος, ἄρξων τῆς Παλαιστίνης Συρίας).
Here the TLG Greek follows my translation, which places a comma (an interpunction, not a sentence break) between “first book” and “which Boethus took” (πρῶτον, ἃ λαβὼν ὁ Βοηθὸς). But Kühn, instead, breaks the sentence here (πρῶτον. ἃ λαβὼν ὁ Βοηθὸς), because a lot of grammar nazis today don’t know that you totally can start a sentence with “which.” So whether you break a sentence there or not is “up to you.” This is an example of what I mean. Galen made no decision here. Rather, Kühn did; and Teubner made a different one. So the sentence lengths (and number of interpunctions, and thus words per interpunction and per sentence) in either version of this same text might reflect the different styles of Kühn and Teubner, not Galen—and Kühn and Teubner may even have used a different approach text by text or year by year, or even across a single text (since they aren’t following any scientific rule but just punctuating as they feel like in the moment, and may even have changed their minds about that over time).
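To make the point concrete, here is a minimal sketch (my own toy example in Python, with an invented English sentence, not any actual edition of Galen) of how two editors’ punctuation choices over the identical word stream produce different “stylometric” numbers:

```python
# Toy illustration: the same word stream, punctuated two ways by two
# hypothetical editors, yields different sentence-length statistics
# even though the author's words are identical.

import re

def sentence_lengths(text):
    """Split on sentence-final punctuation and count words per sentence."""
    sentences = [s for s in re.split(r'[.!?]+', text) if s.strip()]
    return [len(s.split()) for s in sentences]

def mean(xs):
    return sum(xs) / len(xs)

# Same nine words; "editor A" prints one sentence, "editor B" prints two.
editor_a = "He took the first book and left the city."
editor_b = "He took the first book. And left the city."

len_a = sentence_lengths(editor_a)
len_b = sentence_lengths(editor_b)

print(mean(len_a))  # 9.0 words per "sentence" under editor A
print(mean(len_b))  # 4.5 words per "sentence" under editor B
```

Any stylometric feature built on sentence boundaries (mean sentence length, words per interpunction, and so on) is thus measuring the editor as much as the author, unless the editions used are controlled for.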
And Then, Face Palm
And besides all the things I have already pointed out, there remains the mother of all problems: they make so fundamental a mistake as to seriously challenge my belief that their paper was peer reviewed at all. Maybe someone skimmed and rubber-stamped it. But I guarantee you: no reviewer actually paid attention to it. Let me walk you through this…
Their overall approach is as follows (pay close attention to their numbering):
We first conjecture a Galen philosopher and physician who lived before Galen of Pergamum, between the late I century BC (Before Christ) and the epochs of Nero (54–68 AD) and Vespasianus (69–79 AD): we refer to him as Galen-1. Secondly, we refer to the philosopher and physician living from 129 to 216 AD, i.e., the historical Galen of Pergamum, as Galen-2. Thirdly, we refer to the authors of texts written after Galen of Pergamum’s death, authored to exploit his fame, as Galen-3 (Pseudo-Galen).
Note, first, that Galen himself said the third category included people before his death—there were forgeries “authored to exploit his fame” while Galen was still alive. That was even the reason Galen wrote so many books about his own books. But that’s a minor gaffe here. We’ll charitably accept “Galen-3” as just “all” those forgers, whether before or after Galen died (though that will span many different styles, as some of those forgeries may even be medieval). That’s not the problem. The problem is that they then confuse themselves when they interpret their stylometric results in light of these three numbered groups.
Here is their computerized plot (p. 320):

The numbers here refer to specific texts attributed to Galen, which they tabulated earlier (pp. 316–17). And those do indeed correlate with the key on the graph: the red dots are all Galen of Pergamum (Galen-2, the real Galen), by their own theory and designation; the green dots are all the texts they credit to their hypothesized “first century other guy” named Galen (Galen-1); and the blue dots are all the rest (Galen-3). Now, you might be scratching your head here. If you think what you see is that this plot shows literally all 57 texts within the “Galen actually wrote it” group, you are not seeing things. That is literally what their results show. If you colored them all red you’d pretty much just say, “Yep, these are all authentic texts.” Perhaps you might say, “Hey, we should doubt those outliers, like 51 (Institutio Logica) and 6 (De Libris Suis),” but, well, that destroys their entire thesis. They need those to be authentic, and the green stuff to be by “their hypothetical other guy.” But the green stuff sits solidly in the center mass of the real Galen. And the blue stuff—the stuff even mainstream scholars have voiced doubts about—is even more Galenic! Ooops.
I won’t query these results further. Their method clearly sucks and cannot distinguish authentic Galen from inauthentic. De Libris Suis could not be more certainly from Galen (it is one of the most decisively authentic books we have from him), and of course they need it to be authentic for their theory, so they are screwed if it’s not. The Institutio Logica has had its authenticity questioned, but classicists have pretty decisively disproved that notion (see the introduction in Kieffer’s 1964 edition), and since it is on a wildly different topic than most of his opus, and in a different genre (it is a logic textbook, not a medical theory discourse), we should expect it to be in any outer orbit of Galen’s works stylometrically.
And so on. Basically, they found that all the “alternative” texts they tested fall dead center mass of “authentic Galen,” and only authentic stuff by Galen ever orbits outside that core.
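The failure mode here is easy to state computationally. A toy sketch in Python (with invented numbers, not the paper’s actual data): if every disputed text falls inside the range that the securely authentic texts already span on some stylometric feature, that feature has no power to discriminate authentic from inauthentic.

```python
# Toy illustration (invented values, not the paper's data): a feature
# only discriminates if disputed texts fall OUTSIDE the range spanned
# by securely authentic texts.

authentic = [14.2, 15.8, 13.9, 16.4, 15.1]   # e.g., mean words per sentence
disputed  = [14.8, 15.3, 15.9]               # hypothetical disputed texts

lo, hi = min(authentic), max(authentic)
separable = any(x < lo or x > hi for x in disputed)

print(separable)  # False: no grounds here to reject authenticity
```

In their plot the situation is even worse than this sketch: the supposedly inauthentic texts sit closer to the center of the authentic cluster than some securely authentic texts do.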
Okay. So. Put down your soda.
In the text of their study they describe their results like this:
We notice the following facts:
(a) The texts allegedly attributed to Galen-1 fall into the region delimited by the dashed green line.
(b) The texts attributed to Galen-2 fall into the region delimited by the dashed blue line.
(c) The texts allegedly attributed to Galen-3 (Pseudo-Galen) fall in the large region delimited by the red dashed line which includes all texts.
Um. What? I do have color vision. I can even use my computer to confirm colors on my screen just to be sure. I can also see lines on a page. And I can understand written English. And I can read numbers and check them on a table. They got the colors wrong. And I don’t mean on their computerized graph. That’s all colored and numbered correctly. I mean when they decided to “interpret” their results and write a conclusion up for the paper, they read the colors wrong. They forgot to check the numbers on the dots and incorrectly thought “the texts attributed to Galen-2 fall into the region delimited by the dashed blue line.” In fact, Galen-2 is “the large region delimited by the red dashed line which includes all texts.” Galen-3 falls “into the region delimited by the dashed blue line.” So their entire paper is bollocksed. And even their peer reviewers missed this.
There are three takeaways here.
- Histories is a garbage journal no one should ever trust anything from again.
- La Greca, De Caro, and Matricciani are idiots. They are not just really, really bad at this (they hosed the history section and they hosed the stylometry from top to bottom), but they are bumbling doofs. All three of them!? Yes. Somehow. All three dudes missed this mindbogglingly catastrophic mistake (as well as their reviewers, but that can be blamed on their not existing or just phoning this in; although I would be remiss if I did not include the possibility that they are also idiots).
- This paper is 100% useless garbage. It’s not even useful in its historical section (all the history there is hosed and unreliable), though at least there you get some scattered bibliography. And it’s completely useless in its stylometry, because their conclusion is directly refuted by their own results (plain as day on the page). And even if we hashed out all that crap, their actual results are useless: they found no distinctive markers for authentic or inauthentic texts of Galen. And if you want to know why, see my previous discussion. Their method could not have worked.
Now, maybe this is all a scam. Maybe they know this is bullshit and knew their reviewers would be unprofessional losers who’d be asleep at the wheel. Maybe they just wanted to pad their CVs with a bogus paper. Maybe they wanted to fake up some results they could “cite” later for some other nefarious plan. I don’t know. But even if I am charitable, and credit them with total sincerity, this is a total steamer. It’s worth less than half a square of used toilet paper. And yet, it does describe its method and results well enough to figure that out. Which means it’s still better than Christ Before Jesus (my “horrid” selection from last time). Which should be impossible. But lo.
Dr. Carrier, did you see that one of the guys over at Christ Before Jesus made a video response about you? Just curious what you thought about it and if you’d ever consider doing a debate with them?
After decades of experience I have concluded live debates are so routinely bullshit I don’t do them anymore unless well paid for the suffering (see my booking page). I do written debates for a lot less though. Because those force everyone to be serious. So if they want to debate some narrow (and I mean specific and narrow) claim in writing using the method of my past written debates (example, example, example), I’m game. Let them know. But don’t hold your breath.
As for their video, I’m only interested if they said anything relevant. So, did they correct any fact or identify any error or fallacy in my critique? If no, their video is a waste of everyone’s time. If yes, please do give me a list of just those (timestamped even, if you have the time) and I’ll reward the effort with a reply here. Otherwise I generally don’t watch videos from cranks.
On a bit of a tangent, and presuming the letters from Pliny Jr. to Trajan are mostly authentic…
Are interpolations about torturing Christians common; or, conversely, does Pliny elsewhere torture people with no apparent need? The ex-Christians he questioned seem quite forthcoming, making the torture seem out of place. Does Tertullian seem to know of it? If Pliny did much torture, and not just because he liked to, it seems like he would have noticed how unreliable it is.
It’s hard to define what one would mean by “interpolations about torturing Christians.” I don’t know of any examples off hand. But interpolations are common. And fake and exaggerated martyrdom tales are common. So the conjunction of the two would not be surprising.
Pliny rarely describes court procedures, so we could not have expected anything to compare, but by chance he does describe a similar case in Letter 7.6. Which confirms what we know from formal legal sources (like the Digest of Justinian): that under Roman law the testimony of a slave could only be given under torture (as it was otherwise disbelieved). So Pliny mentions the torture to Trajan as a formal confirmation that he had followed correct court procedure.
It is also probable that this was trusted more than free testimony from non-slaves, which is why this kind of testimony was so often sought in cases (in exactly the same way modern states think tortured testimony is more reliable, actual evidence of that be damned; though their jurists were well aware of the defects of torture you allude to, and their solution was to prescribe the manner of questioning, and allow discretion, in order to limit false testimony). In other words, Pliny might not have believed the accused free persons’ testimony, so he resorted (pro forma) to torturing slaves to fact-check it by. That is entirely in agreement with all our other evidence of how Roman courts operated.
What the tormenta actually involved, however, was up to the magistrate’s discretion. It could be something so bad as to risk killing them. Or it could be something rather more like strong-arming or blackjacking (like a typical NYC police interrogation circa 1930). And Pliny doesn’t tell us what he used. It could even have been something minimal, purely symbolic, to meet the letter of the law (Pliny didn’t seem to have much antipathy for these people; after all, this wasn’t an aggravated murder case like the one he discusses in the other letter, where he might have felt free, or the need, to be more brutal).