Did Daniel Wemp really say that? Using corpus linguistics to evaluate the likelihood that Jared Diamond’s reported quotes in The New Yorker were ever spoken

iMediaEthics publishes international media ethics news stories and investigations into journalism ethics lapses.



Daniel Wemp smiles at photographer and imediaethics researcher Michael Kigl, in the Southern Highlands of Papua New Guinea, July 2008. Kigl quickly located Wemp, even though New Yorker fact checkers failed to do so. Wemp adamantly denies that he ever said the quotations Diamond attributes to him, which convey most of the facts in the New Yorker tale. Dr. Douglas Biber's conclusion that the words Diamond quoted were in all likelihood academic writing, not speech, supports Wemp's claim with science.

The Pig in a Garden: Jared Diamond and The New Yorker Series

Art Science Research Laboratory’s imediaethics.org is publishing a series of essays on the controversy surrounding Jared Diamond’s New Yorker article, “Annals of Anthropology: Vengeance is Ours.” The essay series, titled The Pig in a Garden: Jared Diamond and The New Yorker, is written by ethics scholars in the fields of anthropology and communications, as well as journalists, environmental scientists, archaeologists, anthropologists and linguists, among others, and edited by Rhonda Roland Shearer, Alan Bisbort and Sam Eifling. *Note: Savage Mind people are leaving for the summer for research and find themselves unable to keep up with posting essays from what has turned out to be more contributors than anticipated. imediaethics.org will continue the series with the same editors. Alex Golub will comment from Papua New Guinea as time allows. Douglas Edward Biber’s essay is seventh in the series. Biber’s full analysis and report for iMediaEthics is also released here.

*        *         *        *         *           *          *

imediaethics Release:

Dr. Biber’s report and full analysis of quotations Jared Diamond attributes to Daniel Wemp in his New Yorker article, “Annals Of Anthropology,” April 21, 2009. (Biber’s CV).

Introduction to Dr. Biber’s “The Pig in a Garden” essay:

The language of conversation is dramatically different from the language of academic writing.  Some of the differences between the two are obvious to all of us, such as contractions and incomplete sentences.  However, many other grammatical differences are much more difficult to detect.

Over the last 25 years, a research approach has been developed for the empirical analysis of such grammatical characteristics. Referred to as ‘corpus linguistics’, the approach is based on the analysis of very large collections of natural texts from thousands of individual speakers and writers. Computer programs aid the analyses, which result in descriptions of grammatical features that occur frequently, features that are typical, and features that rarely occur. In addition, by comparing corpora with different kinds of texts, it is possible to contrast the grammatical characteristics that are usually found in conversation to those usually found in academic writing (or any other spoken or written varieties).

One major strength of corpus analysis is that it allows researchers to identify patterns of language use that might otherwise go unnoticed. Some language features are quite salient and even stereotyped. For example, we are probably all aware that conversation contains numerous contractions (e.g., I’m, can’t) and reduced forms (e.g., gonna). But it is more difficult to notice the core grammatical structures used in conversation, and how those are different from the typical grammatical structures of written texts.

A simple case study: Did Dowd appropriate language from a spoken conversation?

One application of corpus research is to analyze the language of quotes, to determine the likelihood that those quotes were actually produced in speech. For example, Maureen Dowd has recently been accused of plagiarizing a passage from a blog post by Josh Micah Marshall. What makes this case interesting is that Maureen Dowd denies reading Marshall’s blog, and wrote to Huffington Post that she heard the passage when it was produced by a friend in a conversation. The passage in question consists of a single, complex sentence:

“More and more the timeline is raising the question of why, if the torture was to prevent terrorist attacks, it seemed to happen mainly during the period when the Bush crowd was looking for what was essentially political information to justify the invasion of Iraq.”

Corpus-linguistic analysis can be applied to cases like this. The first step is to document the grammatical characteristics of the text passage in question. Then, those grammatical characteristics are analyzed in a large corpus of texts, to determine how common or rare they are. Crucially, the corpus needs to represent the relevant varieties of language. Thus, in the present case, we would consider whether these grammatical structures occur in a corpus of conversation.

Space does not permit a full corpus analysis of the Dowd quote. However, a few illustrative characteristics can be noted, showing that it is very unlikely that the quote was originally produced spontaneously in speech. First of all, the sentence is very long (45 words). This is much longer than typical sentences in conversation (which average 5-10 words), although the notion of ‘sentence’ is problematic when applied to conversation.
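The word count cited above can be checked mechanically. The following is a minimal sketch in Python, assuming that a "word" is any whitespace-separated token; other tokenization conventions are possible and could give slightly different totals. The names DOWD_QUOTE and word_count are illustrative only.

```python
# Count whitespace-separated tokens in the quoted sentence.
DOWD_QUOTE = (
    "More and more the timeline is raising the question of why, "
    "if the torture was to prevent terrorist attacks, it seemed to happen "
    "mainly during the period when the Bush crowd was looking for what was "
    "essentially political information to justify the invasion of Iraq."
)


def word_count(text: str) -> int:
    """Return the number of whitespace-separated tokens in text."""
    return len(text.split())


n_words = word_count(DOWD_QUOTE)
print(n_words)  # 45

# Compare with the upper end of the 5-10 word conversational average cited above.
print(n_words / 10)  # 4.5
```

Under this assumption the sentence is four and a half times longer than the upper end of the conversational range mentioned in the text.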

A more serious problem is the grammatical structure of the sentence. For example, the sentence employs multiple dependent clauses that modify (or complement) nouns:

the question of why…
the period when…
information to justify…

Although conversation does use dependent clauses, they are usually embedded in other clauses. Here, by contrast, all three dependent clauses are embedded in noun phrases, a structure that is much more typical of writing than of speech.

A final characteristic to note is the medial if-clause. Corpus analysis shows that if-clauses are common in conversation, but they almost always occur at the beginning or the end of an utterance. In contrast, the Dowd quote uses a medial if-clause, embedded inside a WH-clause (the clause beginning with why), which is in turn embedded in an of-phrase, which is embedded in a noun phrase:

[the question
    [why it seemed to happen…
        [if the torture was to prevent terrorist attacks] ] ]

If Dowd’s friend expressed similar meanings using the structures that are common in conversation, we would have found something like the following:

Well, if they were using torture to prevent terrorist attacks, why did it happen then? I mean I want to know why it happened right when they were wanting to justify the invasion of Iraq.

Instead what we find is a single extremely long sentence with multiple complex noun phrase structures and a medial if-clause. Corpus research shows that all of these characteristics are unusual in speech, and that this constellation of structures in a single sentence would be extremely unlikely.
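The position of an if-clause can be approximated with a very rough heuristic. The sketch below is an illustration only, not the method used in the corpus research described here: it assumes that an if-clause starts at the word "if" and ends at the next comma (or at the end of the sentence), and that one preceding word, such as a discourse marker, still counts as utterance-initial. The function name and the slack parameter are hypothetical.

```python
def if_clause_position(sentence: str, slack: int = 1) -> str:
    """Classify the first if-clause as 'initial', 'medial', 'final', or 'none'.

    Assumption: the if-clause runs from the word 'if' to the next comma,
    or to the end of the sentence when no comma follows.
    """
    tokens = sentence.split()
    cleaned = [t.strip(".,;:?!\"'").lower() for t in tokens]
    if "if" not in cleaned:
        return "none"
    start = cleaned.index("if")
    # Up to `slack` words (e.g. a discourse marker such as "Well,") may
    # precede an if-clause that still counts as utterance-initial.
    if start <= slack:
        return "initial"
    # Look for the comma that closes the if-clause, if there is one.
    end = next(
        (i for i in range(start, len(tokens)) if tokens[i].endswith(",")),
        None,
    )
    if end is None or end == len(tokens) - 1:
        return "final"
    return "medial"


DOWD_QUOTE = (
    "More and more the timeline is raising the question of why, "
    "if the torture was to prevent terrorist attacks, it seemed to happen "
    "mainly during the period when the Bush crowd was looking for what was "
    "essentially political information to justify the invasion of Iraq."
)
CONVERSATIONAL = (
    "Well, if they were using torture to prevent terrorist attacks, "
    "why did it happen then?"
)

print(if_clause_position(DOWD_QUOTE))      # medial
print(if_clause_position(CONVERSATIONAL))  # initial
```

A heuristic like this depends entirely on punctuation, so it would not transfer to unpunctuated transcripts; a real study would rely on a grammatically annotated corpus.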

This is not a very controversial case: our everyday experience with language tells us that it is highly unlikely that Dowd’s friend could have spontaneously uttered a 45-word sentence that just happened to be nearly identical to a written sentence from a blog, or that Dowd could have remembered that sentence verbatim and then later written it down (see Jonathan Bailey’s discussion). Dowd’s new explanation (as told by The Times public editor) is that she took the sentence from an email message rather than a spoken conversation, which is much more plausible in all respects.

Jared Diamond’s reported quotes from Daniel Wemp

In other cases, the application of corpus analysis can be more insightful, because we have less external information that could be used to evaluate the source of the quotes. For example, corpus analysis has been applied to an evaluation of direct quotes that were supposedly produced in speech by Daniel Wemp, cited in the article ‘Vengeance is ours’ (by Jared Diamond), published in the New Yorker (4/21/08). For various reasons, scholars have questioned the appropriateness and accuracy of the information in this article. The question investigated with corpus analysis was narrower: how likely, or unlikely, is it that the quotes cited in this article were actually produced in speech? That is, how likely is it that Daniel Wemp said these exact words?

In this case, the analysis began with a linguistic description of Diamond’s quotes (i.e., the quotations attributed to Daniel Wemp as his spoken words in the 4/21/08 New Yorker article). Those quotes were compared to an independent record of Daniel Wemp’s actual speech, based on verbatim transcripts of spoken interviews collected by Rhonda Roland Shearer. Quantitative analyses of the New Yorker quotes and the Daniel Wemp (DW) transcripts were carried out, to identify grammatical features that were frequent or rare.

Then, those quantitative findings were compared to previous large-scale corpus analyses of conversation and academic writing, to determine whether the New Yorker quotes were typical of the language normally used in conversation. The results were surprising: in many respects, the New Yorker quotes are much more similar to the language typically used in academic writing than to normal conversation.

Most of the corpus analyses used for this comparison were taken from the 1,200-page Longman Grammar of Spoken and Written English (LGSWE; Biber et al., 1999). The research for the LGSWE was based on analysis of a very large corpus that represents four major varieties: conversation, fiction writing, newspaper writing, and academic writing. For example, the sub-corpus for conversation includes approximately 6.4 million words, produced by thousands of speakers. The sub-corpus for academic writing includes 5.3 million words from 408 different texts. Computational / quantitative analyses of these corpora allow us to make strong generalizations about the grammatical characteristics that are frequent or rare in conversation, contrasted with the features that are frequent/rare in academic writing. The detailed research findings in the LGSWE can be applied to characterize the linguistic style of individual texts, to describe the extent to which the language of that text is typical of conversation or academic writing.

This linguistic analysis shows that the Diamond quotes (language attributed to Daniel Wemp in the 4/21/08 New Yorker article) are atypical of speech. Rather, these claimed quotes contain numerous grammatical constructions that are common in formal academic writing but very rarely used in normal speech. Further, those same grammatical constructions are not used in the verbatim transcripts of actual speech produced by Daniel Wemp (referred to as DW below).

Taken together, the linguistic analyses indicate that it is extremely unlikely that the New Yorker quotations are accurate verbatim representations of language that originated in speech. To put it simply, normal people do not talk using the grammatical structures represented in these quotations. However, these quotations do include several grammatical structures found commonly in academic writing, suggesting that the quotations were produced in writing rather than being transcribed from speech.

Grammatical characteristics of the quotations in the New Yorker article

Certain characteristics of conversation are easy to notice, and so almost any portrayal of speech will include these features. For example, even unskilled novelists are certain to include these stereotypical features in their fictional dialogue. Contractions are probably the most noticeable feature of speech (e.g., it’s, he’s, I’m), and the quotations in the New Yorker article are typical of speech (and fictional dialogue) in that they incorporate numerous contractions.

The use of simple coordinators (especially and) is also a salient characteristic of conversation, and the Diamond quotes incorporate frequent use of that feature. One major function of and is to connect clauses, and this use occurs frequently in both the Diamond quotes and in the actual transcribed speech of DW. For example:

If you die in a fight, you will be considered a hero, and people will remember you for a long time.
I have given all these story and those stories are very true and those names are not fake.

However, many other grammatical characteristics are less apparent to the casual observer. This is where corpus analysis can be useful: to identify the grammatical features that are actually common or rare in conversation, especially features that would go unnoticed otherwise.

For example, Jared Diamond’s quotes frequently employ the coordinator and (as well as but) in two different ways: 1) to connect clauses, and 2) to connect two adjectives. As noted above, corpus research shows that the first use is in fact very common in conversation. However, the second grammatical pattern is rare in actual conversation, although it is common in formal writing. Examples from the Diamond quotes are:

my father was felt to be too old and weak
quick but correct decisions
my tall and handsome uncle

This grammatical pattern is rare in both the actual transcribed speech of DW and in the corpus of conversation generally. Thus, even in the use of the coordinator and (and but), the Diamond quotes are more similar to written language than to actual speech.

The noun phrase structures found in the Diamond quotes are especially atypical of normal speech. Corpus research shows that many of these structures are extremely rare in normal conversation, while they are quite common in academic writing.

In normal conversation, a majority of noun phrases are realized as pronouns. The Diamond quotes (and the transcribed speech of DW) are typical of conversation in that they use numerous pronouns.

However, the Diamond quotes are atypical of conversation in that they also include numerous noun phrase structures that are extremely rare in conversation. Most of these structures are also rare (or unattested) in the actual transcribed speech of DW. One structure of this type is noun phrases that have adjectives as modifiers, referred to as ‘attributive adjectives’. These adjectives are very common in the New Yorker quotes; for example:

biological father
lower left leg
hot pieces of wood
public battle
unexpected words
experienced fighters
bare hands
constant suffering

Corpus research shows that attributive adjectives are 3 to 4 times more common in formal writing than in conversation. Yet these adjectives are common in the Diamond quotes, occurring about 30 times per 1,000 words, which is about 2 to 3 times the rate found in normal conversation. (That rate of occurrence is similar to the normal rate in written fiction.) The density of adjectives is also considerably higher in the Diamond quotes than in the transcribed speech of DW.

Relative clauses with a fronted preposition are extremely rare in everyday conversation, although they are relatively common in formal writing. Surprisingly, the Diamond quotes include three examples of this structure:

a stone quarry from which the Ombal enemy was throwing stones
a night raid in which we sneak into an enemy village
each battle in which we succeeded in killing an Ombal

There are no examples in the formal statement of DW. (There might be one example of this type in the verbatim interview transcripts, but the actual structure is difficult to interpret.)
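Because this structure is defined by a fixed word sequence (a preposition immediately followed by which or whom), it is one of the few features that can be approximated with a plain pattern search. The sketch below is illustrative only: the preposition list is a small hypothetical sample, and a real corpus study would use grammatically tagged text. The figure of 1,500 words is the approximate size of the quoted material reported for the New Yorker article.

```python
import re

# Hypothetical, deliberately small preposition list for illustration.
PREPOSITIONS = ("in", "from", "on", "to", "for", "with", "by", "at", "of", "about")

# A preposition immediately followed by a relative pronoun.
FRONTED = re.compile(
    r"\b(?:" + "|".join(PREPOSITIONS) + r")\s+(?:which|whom)\b",
    re.IGNORECASE,
)

EXAMPLES = [
    "a stone quarry from which the Ombal enemy was throwing stones",
    "a night raid in which we sneak into an enemy village",
    "each battle in which we succeeded in killing an Ombal",
]


def count_fronted_prepositions(text: str) -> int:
    """Count preposition + relative pronoun sequences in text."""
    return len(FRONTED.findall(text))


hits = sum(count_fronted_prepositions(example) for example in EXAMPLES)
print(hits)  # 3

# Normalize to a rate per million words, assuming about 1,500 words of quotes.
QUOTE_WORDS = 1_500
rate_per_million = hits * 1_000_000 / QUOTE_WORDS
print(rate_per_million)  # 2000.0
```

Three occurrences in roughly 1,500 words is a small absolute number, which is why the analysis normalizes counts to a common rate before comparing them with a large reference corpus.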

A third unusual noun phrase structure in the New Yorker quotes is the dense use of prepositional phrases as noun modifiers. Similar to the previous two structures, prepositional phrases as noun modifiers are extremely common in formal writing but rare in normal conversation. There are numerous examples of this structure in the New Yorker quotes; for example:

a strong young man in his prime
The original cause of the wars between the Handa and Ombal clans
real enemies of your target
grievances of their own
the mistake of hiring a man who actually does not consider your target to be his own enemy
feeling of anger
Both men and women on the other side
our endless cycles of revenge killings

Many of the noun phrases in the New Yorker quotes are especially surprising because they contain multiple modifiers (adjectives or nouns as pre-modifiers, and relative clauses or prepositional phrases as post-modifiers). It is rare in conversation for a noun phrase to have two or more modifiers, while this pattern is relatively common in formal academic writing (with over 20% of noun phrases having multiple modifiers). It thus is highly noteworthy that such structures commonly occur in the New Yorker quotes. Many of the examples listed above are of this type:

a stone quarry from which the Ombal enemy was throwing stones
a night raid in which we sneak into an enemy village
a strong young man in his prime
a spear wound on the back of your leg
a single outnumbered enemy
tall and handsome man
quick but correct decisions
endless cycles of revenge killings
The original cause of the wars between the Handa and Ombal clans

Several of these noun phrases have a very complex structure, with multiple levels of embedding; for example:

The original cause [of the wars [between the Handa and Ombal clans] ]
The way [that we come to understand things [in life] ]
all the stories [that grandfathers tell their grandchildren] [about their relatives [who must be avenged] ]

In general, such structures are extremely unusual in speech but common in writing. However, the examples given above are even more unlikely to occur in speech because they often occur in the subject position of a clause. There is a very strong tendency in English conversation (and speech generally) for the subject noun phrase to be short and simple, usually a simple pronoun, as in

Yeah, he went.

Even when an utterance is longer, the subject noun phrase is almost always short, as in:

I don’t think I can go next week.

The LGSWE reports that more than 70-80% of the subject noun phrases in normal conversation are simple pronouns, and more than 90% of the subjects are a simple noun phrase with no modifiers.

In contrast, what we find in the New Yorker quotes is long, complex noun phrases as the grammatical subject, as in:

[The original cause of the wars between the Handa and Ombal clans] was a pig that ruined a garden.
[The way that we come to understand things in life] is by telling stories, like the stories I am telling you now, and like all the stories that grandfathers tell their grandchildren about their relatives who must be avenged.

Structures like these are found in academic writing, although they are not especially common; rather, even in writing, it is more common to use a relatively simple noun phrase as the grammatical subject. However, such structures are virtually unattested in normal speech, and so the Diamond quotes are highly unusual as representations of speech in this regard.

There are other features in the New Yorker quotes that are unusual, being much more typical of writing than speech. One of the obvious features is the repeated use of passive voice verbs in the quotes (e.g., was felt, be considered, be remembered, be forgotten, be avenged, etc.). Passive voice verbs are generally rare in conversation, but approximately one-third of the verbs in academic writing are passives. The transcribed statement of DW also includes some passive verbs (several about the article ‘being published’), but not with the same density as the Diamond quotes.

A final noteworthy characteristic is the use of to-clauses. Apart from the semi-fixed expression want to (and to a lesser extent would like to), to-clauses are much more common in writing than in speech. However, the Diamond quotes have a very high density of these constructions. What makes this pattern especially noteworthy is the specialized types of to-clauses found in the New Yorker quotes. In particular, two of the constructions that occur repeatedly in the New Yorker quotes are structures that rarely occur in normal speech:

1) Noun + to-clause
the opportunity [to see who really are the best marksmen]
the necessary experience [to make quick but correct decisions]

2) ‘extraposed’ to-clauses controlled by an adjective
it’s not acceptable [to set fire to the hut]
it’s already extremely dangerous [for us to penetrate enemy territory]
it will be easy [for the enemy to kill you]

Here again, these structures are relatively common in formal writing, but it is highly unusual to find such structures in normal speech.

In sum, the grammatical characteristics of the New Yorker quotes are much more typical of formal writing than of actual conversational speech. In fact, many of these grammatical characteristics are extremely rare in speech. This fact is all the more striking in that there is a whole suite of ‘literate’ features which appear commonly and pervasively throughout the New Yorker quotes.

It would be less noteworthy to find just one or two examples of ‘literate’ grammatical constructions in speech. Corpus research does not show that such features are impossible in conversation. However, corpus research does show that such features are rare and exceptional in normal conversation.

To indicate just how different the Diamond quotes are from the language of normal conversation, we can compare the rates of occurrence for these specialized grammatical features. All rates are computed on the same basis: a rate per 1 million words of text. For actual conversation, the rates are computed from analysis of a 5-million-word corpus (with most specific findings taken from the Longman Grammar of Spoken and Written English; LGSWE). The rates for the Diamond quotes are based on the approximately 1,500 words of quotes in the New Yorker article. The following table compares the rates of occurrence for several of the features discussed above.

| Grammatical feature | Rate in actual conversation (per million words) | Rate in Diamond quotes (per million words) | Comparison of the two |
| --- | --- | --- | --- |
| Attributive adjectives (e.g., original, biological) | c. 15,000 | c. 30,000 | 2 times more frequent in Diamond quotes |
| Preposition of | c. 12,000 | c. 34,000 | 3 times more frequent in Diamond quotes |
| Noun post-modifier complexes (e.g., The original cause [of the wars [between the Handa and Ombal clans] ]) | c. 500 | c. 3,000 | 6 times more frequent in Diamond quotes |
| Noun phrases with both pre-modifiers and post-modifiers (e.g., a stone quarry from which the Ombal enemy was throwing stones) | c. 500 | c. 4,000 | 8 times more frequent in Diamond quotes |
| ‘extraposed’ to-clauses controlled by an adjective (e.g., it’s not acceptable [to set fire to the hut]) | c. 100 | c. 2,000 | 20 times more frequent in Diamond quotes |
| Noun + to-clause (e.g., the opportunity [to see who really are the best marksmen]) | c. 50 | c. 1,200 | 25 times more frequent in Diamond quotes |
| Adjective and/but adjective (e.g., tall and handsome) | c. 20 | c. 2,000 | 100 times more frequent in Diamond quotes |
| Preposition + Relative pronoun (e.g., each battle in which we succeeded in killing an Ombal) | c. 20 | c. 2,000 | 100 times more frequent in Diamond quotes |

These comparisons show the magnitude of the discrepancies between the grammatical style of normal conversation contrasted with the grammatical style of the Diamond quotes. To find one of these grammatical features in a normal conversation is noteworthy. To find repeated use of this large constellation of features in actual spoken discourse, some of them occurring c. 100 times more often than in normal conversation, is extremely unlikely. In contrast, these are all features that are typical of academic writing, suggesting that they have their origin in writing rather than actual speech.
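The arithmetic behind these comparisons can be reproduced directly from the figures in the table. The sketch below, in Python, copies the approximate ("c.") rates as given and recomputes the ratios; it also converts each rate back into the number of occurrences it implies in about 1,500 words of quotes. The short feature labels and function names are illustrative, and because the published rates are rounded, the implied counts are approximations rather than the original tallies.

```python
# (conversation rate, Diamond-quote rate), both per million words,
# copied from the comparison table above.
RATES = {
    "Attributive adjectives": (15_000, 30_000),
    "Preposition of": (12_000, 34_000),
    "Noun post-modifier complexes": (500, 3_000),
    "Pre- and post-modified noun phrases": (500, 4_000),
    "Extraposed to-clauses (adjective)": (100, 2_000),
    "Noun + to-clause": (50, 1_200),
    "Adjective and/but adjective": (20, 2_000),
    "Preposition + relative pronoun": (20, 2_000),
}

QUOTE_WORDS = 1_500  # approximate size of the quoted material, per the text


def per_million(count: float, n_words: int) -> float:
    """Normalize a raw count to a rate per million words."""
    return count * 1_000_000 / n_words


def implied_count(rate: float, n_words: int) -> float:
    """Convert a per-million rate back to an approximate raw count."""
    return rate * n_words / 1_000_000


# Ratio of the Diamond-quote rate to the conversation rate for each feature.
ratios = {name: quotes / conv for name, (conv, quotes) in RATES.items()}

for name, (conv, quotes) in RATES.items():
    print(
        f"{name}: {ratios[name]:.1f}x, "
        f"about {implied_count(quotes, QUOTE_WORDS):.1f} occurrences in the quotes"
    )
```

Recomputed this way, two of the ratios come out slightly below the rounded figures in the table (about 2.8 rather than 3 for the preposition of, and 24 rather than 25 for noun + to-clause), which is consistent with the rates being approximate.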

Other corpus studies (e.g., the book University Language; Biber, 2006) have shown that these same features are rare and exceptional in even academic speech, including university lectures. In contrast, what we find in the Diamond quotes is the pervasive use of a suite of grammatical constructions, which are all rare in conversation but common in formal writing. This constellation of grammatical characteristics is also strikingly different from the grammatical style of the verbatim transcripts of speech produced by DW. In sum, the analysis strongly indicates that the Diamond quotes are much more like discourse that was produced in writing, reflecting the typical grammatical features of formal academic prose, rather than verbatim representations of language that was produced in speech.


Douglas Edward Biber, Regents’ Professor, Applied Linguistics, Northern Arizona University. Research interests: English grammar, sociolinguistics, computational and statistical tools for linguistics, corpus linguistics, and register variation (synchronic, diachronic, cross-linguistic).
