Did Daniel Wemp really say that? Using corpus linguistics to evaluate the likelihood that Jared Diamond’s reported quotes in The New Yorker were ever spoken

Daniel Wemp smiles at photographer and imediaethics researcher Michael Kigl in the Southern Highlands of Papua New Guinea, July 2008. Kigl quickly located Wemp, even though New Yorker fact checkers failed to do so. Wemp adamantly denies ever having said the words in Diamond's quotations, which convey most of the facts in the New Yorker tale. Dr. Douglas Biber's conclusion that the words Diamond quoted were in all likelihood academic writing and not speech supports Wemp's claim with science.

Space does not permit a full corpus analysis of the Dowd quote. However, a few illustrative characteristics can be noted, showing that it is very unlikely that the quote was originally produced spontaneously in speech. First of all, the sentence is very long (45 words). This is much longer than typical sentences in conversation (which average 5-10 words), although the notion of ‘sentence’ is problematic when applied to conversation.

A more serious problem is the grammatical structure of the sentence. For example, the sentence employs multiple dependent clauses that modify (or complement) nouns:

the question of why…
the period when…
information to justify…

Although conversation does use dependent clauses, they are usually embedded in other clauses. Here, however, all three dependent clauses are embedded in noun phrases, a structure that is much more typical of writing than of speech.

A final characteristic to note is the medial if-clause. Corpus analysis shows that if-clauses are common in conversation, but they almost always occur at the beginning or the end of an utterance. In contrast, the Dowd quote uses a medial if-clause, embedded inside a WH-clause (the clause beginning with why), which is in turn embedded in an of-phrase, which is embedded in a noun phrase:

[the question
[why it seemed to happen…]
[if the torture was to prevent terrorist attacks]
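The initial/medial/final distinction can be made concrete with a small sketch. This is only an illustration, not the corpus methodology behind the analysis: it assumes the if-clause has already been identified by hand and simply checks where it sits in the utterance. The function name `clause_position`, the list of discourse markers, and the example strings are all hypothetical; the "medial" example is assembled from the fragments quoted above, not the full Dowd sentence.

```python
import re

# Hypothetical set of discourse markers that may precede an "initial" clause.
DISCOURSE_MARKERS = {"well", "so", "and", "but", "oh"}


def tokens(text):
    """Lowercase the text and keep only alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def clause_position(utterance, clause):
    """Classify a hand-identified clause as 'initial', 'medial', or 'final'."""
    utt, cl = tokens(utterance), tokens(clause)
    if not cl:
        raise ValueError("clause is empty")
    n = len(cl)
    starts = [i for i in range(len(utt) - n + 1) if utt[i:i + n] == cl]
    if not starts:
        raise ValueError("clause not found in utterance")
    start = starts[0]
    before, after = utt[:start], utt[start + n:]
    # Only discourse markers (or nothing) before the clause: initial.
    if all(t in DISCOURSE_MARKERS for t in before):
        return "initial"
    # Nothing after the clause: final.
    if not after:
        return "final"
    return "medial"


# Constructed from the fragments quoted above (illustrative only).
written = ("the question of why, if the torture was to prevent "
           "terrorist attacks, it seemed to happen")
spoken = "Well, if they were using torture, why did it happen then?"

print(clause_position(written, "if the torture was to prevent terrorist attacks"))
print(clause_position(spoken, "if they were using torture"))
```

A real corpus study would rely on parsed or annotated data rather than string matching; the sketch only shows what "medial" means operationally.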

If Dowd’s friend had expressed similar meanings using the structures that are common in conversation, we would have found something like the following:

Well, if they were using torture to prevent terrorist attacks, why did it happen then? I mean I want to know why it happened right when they were wanting to justify the invasion of Iraq.

Instead, what we find is a single extremely long sentence with multiple complex noun phrase structures and a medial if-clause. Corpus research shows that each of these characteristics is unusual in speech, and that this constellation of structures in a single sentence would be extremely unlikely.
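The length contrast is easy to check mechanically. The following is a minimal sketch, assuming a naive split on sentence-final punctuation and whitespace tokenization (neither is the procedure used in the corpus analysis). It is applied to the conversational paraphrase given above; `sentence_lengths` and `REPORTED_QUOTE_LENGTH` are hypothetical names, and the figure of 45 is the word count reported in the text.

```python
import re

# Word count of the Dowd quote as reported in the text above.
REPORTED_QUOTE_LENGTH = 45


def sentence_lengths(text):
    """Split on sentence-final punctuation and count whitespace-separated words."""
    sentences = [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


# The conversational paraphrase from the discussion above.
paraphrase = (
    "Well, if they were using torture to prevent terrorist attacks, "
    "why did it happen then? I mean I want to know why it happened "
    "right when they were wanting to justify the invasion of Iraq."
)

lengths = sentence_lengths(paraphrase)
print(lengths)  # [15, 20]
print(all(n < REPORTED_QUOTE_LENGTH for n in lengths))  # True
```

Even the paraphrase runs longer than the 5-10 word conversational average cited earlier, but each of its sentences is less than half the length of the reported quote.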

This is not a very controversial case: our everyday experience with language tells us that it is highly unlikely that Dowd’s friend could have spontaneously uttered a 45-word sentence that just happened to be nearly identical to a written sentence from a blog, or that Dowd could have remembered that sentence verbatim and then later written it down (see Jonathan Bailey’s discussion). Dowd’s new explanation (as told by The Times public editor) is that she took the sentence from an email message rather than a spoken conversation, which is much more plausible in all respects.

Jared Diamond’s reported quotes from Daniel Wemp

In other cases, the application of corpus analysis can be more insightful, because we have less external information that could be used to evaluate the source of the quotes. For example, corpus analysis has been applied to an evaluation of direct quotes that were supposedly produced in speech by Daniel Wemp, cited in the article ‘Vengeance is ours’ (by Jared Diamond), published in the New Yorker (4/21/08). For various reasons, scholars have questioned the appropriateness and accuracy of the information in this article. The question investigated with corpus analysis was narrower: how likely, or unlikely, is it that the quotes cited in this article were actually produced in speech? That is, how likely is it that Daniel Wemp said these exact words?

