The Shakespeare algorithm

The study focussed in part on function words, the heavy-lifting but unglamorous class that includes pronouns, articles, and prepositions: “I,” “you,” “the,” “a,” “an,” “on,” “in,” “under.” As Pennebaker has written, there are only about four hundred and fifty of them in English, but they account for fifty-five per cent of the words that we use, the linguistic glue that holds everything together but goes mostly unnoticed. “We can’t hear them,” Pennebaker told me recently. “You and I have now been talking for ten minutes, and you have no idea if I’ve used articles at a high rate or a low rate. I have no idea.” Everyone has a pattern, though, and this is what he and Boyd sought in an array of works by Shakespeare, Theobald, and Fletcher. They also took other habits into account, such as three-word phrases typical of each author; for Shakespeare, these included “my lord your,” “what says thou,” and “as it were.” (“Quality work there, Shakey,” Boyd said.)
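To make the idea concrete, here is a minimal Python sketch of the two measurements described above: how often a writer uses each function word, and which three-word phrases recur. It is not Boyd and Pennebaker's actual method, which the article does not spell out. The eight-word list is only the sample quoted above (a real list would run to roughly four hundred and fifty entries), and the function names and the simple distance measure are assumptions made for illustration.

```python
import re
from collections import Counter

# Illustrative subset only: just the examples quoted in the article,
# not the full list of roughly 450 English function words.
FUNCTION_WORDS = ("i", "you", "the", "a", "an", "on", "in", "under")


def tokenize(text):
    """Lowercase the text and split it into word tokens (ASCII apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())


def function_word_rates(text, vocabulary=FUNCTION_WORDS):
    """Return each function word's share of all tokens in the text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return {word: counts[word] / total for word in vocabulary}


def trigram_counts(text):
    """Count every run of three consecutive words in the text."""
    tokens = tokenize(text)
    return Counter(zip(tokens, tokens[1:], tokens[2:]))


def profile_distance(rates_a, rates_b):
    """Hypothetical similarity measure: the sum of absolute rate differences.

    Smaller values mean the two function-word profiles are more alike.
    """
    return sum(abs(rates_a[word] - rates_b[word]) for word in rates_a)
```

In use, one would build a profile from each candidate author's known works, build another from the disputed text, and see which candidate lies closest. The most frequent entries from `trigram_counts` would play the role of phrases like “as it were.”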

Generally speaking, the results of the “Double Falsehood” analysis indicate that the voices of Shakespeare and Fletcher predominate, and that Theobald’s is minimally present. It might be objected that, if Theobald had set out to imitate Shakespeare, he would surely have aped his language. But function-word usage is very hard to mimic, Boyd and Pennebaker told me. As with other linguistic tics, a writer’s own propensities will more than likely bleed through. “The Cuckoo’s Calling,” a detective novel, could only have been written by the fantasy author J. K. Rowling; Federalist No. 49, although published under a pseudonym, could only have been written by James Madison. Indeed, as Maria Konnikova reported in March, function-word patterns and other metrics may be able to establish not only an author’s voice but also her disposition and mood. Pennebaker has already produced rough tools that scan people’s tweets for signs of depression and anxiety. The “Double Falsehood” study purported to shed similar light on the Bard’s psychology, noting, for example, that his “relatively dynamic writing style and relatively high use of social content words” suggested someone who was “socially focused and interested in climbing higher on the social ladder.”