Who said that?

A Carnegie Vacation Scholarship Funded Research Project

Hypothesis: each of Shakespeare’s characters will have their own unique vocabulary and patterns of speech that will make them identifiable as the speaker of any given piece of dialogue.

The Project

Over six weeks, I analysed and compared the dialogue of a selection of Shakespearean characters using digital tools.

Firstly, I selected 18 Shakespearean characters, isolated their dialogue from their respective plays, and converted it into plain text format.

The majority of the research centred on using computer-based textual analysis programs to analyse the characters’ speech. For the most part, I used AntConc to compare these dialogue samples and identify words that set characters apart from one another, either because only one character uses a word or because of how often a character uses more common words.
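AntConc’s Word List tool produces frequency counts of this kind. As a rough illustration of the underlying idea (not a reproduction of AntConc), the following Python sketch counts word frequencies for each character and finds words that only one character in the sample uses. The tokeniser and the function names `tokenize`, `word_counts`, and `unique_words` are assumptions made for this example; AntConc’s own token definition is configurable and may give slightly different counts.

```python
import re
from collections import Counter

# Assumed tokeniser: lowercase runs of letters, keeping internal apostrophes
# ("i’ll", "o’er"). A trailing possessive apostrophe is dropped, so
# "pilgrims’" is counted as "pilgrims".
WORD = re.compile(r"[a-z]+(?:['’][a-z]+)*")


def tokenize(text: str) -> list[str]:
    """Split a character's dialogue into lowercase word tokens."""
    return WORD.findall(text.lower())


def word_counts(dialogue: dict[str, str]) -> dict[str, Counter]:
    """Map each character's name to a Counter of their word frequencies."""
    return {name: Counter(tokenize(text)) for name, text in dialogue.items()}


def unique_words(counts: dict[str, Counter]) -> dict[str, set[str]]:
    """For each character, the words that no other character in the sample uses."""
    result = {}
    for name, counter in counts.items():
        used_by_others = set()
        for other, other_counter in counts.items():
            if other != name:
                used_by_others.update(other_counter)  # adds the other character's words
        result[name] = set(counter) - used_by_others
    return result


# Toy input: two short lines, not the project's data.
dialogue = {
    "Juliet": "Good pilgrim, you do wrong your hand too much",
    "Romeo": "If I profane with my unworthiest hand",
}
counts = word_counts(dialogue)
print(counts["Juliet"]["hand"])  # 1
print(sorted(unique_words(counts)["Romeo"]))
# ['i', 'if', 'my', 'profane', 'unworthiest', 'with']
```

With only two speakers almost every word looks “unique”; the comparison becomes meaningful as more characters are added to the sample.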

I used the results of this research to build a database of around 280 words so far, recording the number of times each character uses each word.

To test my hypothesis, I developed several activities in which participants use one of the textual analysis tools to analyse an anonymous piece of dialogue and try to identify the speaker from a selection of potential “suspects”.

Research Limitations

I initially spent a great deal of time extracting each character’s dialogue from their respective plays and removing the stage directions, so that only their speech was included in the research. I then converted the dialogue into plain text files to make them compatible with AntConc, my primary research tool. However, I became aware of inconsistencies and errors within these texts. For example, some texts used -’d on certain past participles, while others used -ed (murder’d and murdered), which AntConc would treat as two different words.

Therefore, to make the research as accurate as possible, I started again, this time using character dialogue that had been carefully edited and standardised by the Folger Shakespeare Library. Although this source did not include the dialogue of several characters I had hoped to research, it ensured that the information entered into my database was consistent and reliable.
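Where only unstandardised texts are available, spellings such as murder’d and murdered could instead be reconciled automatically before counting. The Python sketch below is a hypothetical heuristic, not the method used in this project: it rewrites a word ending in ’d as the same stem plus -ed, while leaving pronoun contractions such as I’d and he’d alone. The exclusion list and function names are assumptions, and some outputs would still need checking by hand.

```python
import re

# Hypothetical heuristic, not the project's method. Pronoun contractions
# ("I’d", "he’d" = I would / he had) also end in ’d, so these stems are
# left alone rather than being turned into "Ied" or "heed".
CONTRACTION_STEMS = {"i", "you", "he", "she", "we", "they", "who", "it", "that", "there"}

# A run of letters followed by an apostrophe (straight or curly) and "d".
ELIDED_PAST = re.compile(r"\b([A-Za-z]+)['’]d\b")


def _expand(match: re.Match) -> str:
    stem = match.group(1)
    if stem.lower() in CONTRACTION_STEMS:
        return match.group(0)  # keep contractions unchanged
    return stem + "ed"


def normalise_past_tense(text: str) -> str:
    """Rewrite elided past forms such as murder’d as murdered."""
    return ELIDED_PAST.sub(_expand, text)


print(normalise_past_tense("Murder’d, I’d say, and call'd"))
# Murdered, I’d say, and called
```

A rule like this is cheap but blunt, which is one reason an edited, standardised text is the safer basis for word counts.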

Given the time constraints and the large number of words I hoped to include in the database, I initially restricted my analysis to 10 to 12 characters; however, as my research continued, this number increased to 18. My findings so far therefore represent only these 18 characters and are not representative of all of Shakespeare’s plays.

Perhaps the most time-consuming part of this project was converting the raw figures for my database. AntConc only reports the number of occurrences of a word: for example, it shows that Hamlet uses the word ‘death’ ten times, while Lady Macbeth uses it only three times. What it does not take into account is that Hamlet has over six times as much dialogue as Lady Macbeth (Hamlet: 11,784 characters; Lady Macbeth: 1,925 characters). So while the raw figures suggest that Hamlet uses ‘death’ far more often than Lady Macbeth, the results are quite different when we consider how often the word occurs per ten thousand characters (the number of occurrences divided by the total length of the character’s dialogue, multiplied by 10,000). On this measure, Hamlet uses ‘death’ 8.5 times per ten thousand characters (10 ÷ 11,784 × 10,000), while Lady Macbeth uses it 15.6 times (3 ÷ 1,925 × 10,000). I therefore created two databases: one containing the raw figures, showing the actual number of times each character says a particular word, and another showing the incidence of each word per ten thousand characters, which standardises the results and allows fairer, more accurate comparisons.
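The standardisation step is simple arithmetic and can be automated once the raw counts and each character’s total dialogue length are known. The Python sketch below reproduces the Hamlet and Lady Macbeth figures above and shows one way to derive the second (standardised) database from the first. The function names and the nested-dictionary layout are assumptions for illustration; `total_length` is whatever length measure the raw counts are compared against.

```python
def per_ten_thousand(occurrences: int, total_length: int) -> float:
    """Occurrences of a word per 10,000 units of a character's dialogue."""
    if total_length <= 0:
        raise ValueError("total_length must be positive")
    return occurrences / total_length * 10_000


def normalise_table(
    raw_counts: dict[str, dict[str, int]], lengths: dict[str, int]
) -> dict[str, dict[str, float]]:
    """Turn {character: {word: raw count}} into {character: {word: rate}}."""
    return {
        name: {word: per_ten_thousand(n, lengths[name]) for word, n in words.items()}
        for name, words in raw_counts.items()
    }


raw = {"Hamlet": {"death": 10}, "Lady Macbeth": {"death": 3}}
lengths = {"Hamlet": 11784, "Lady Macbeth": 1925}
rates = normalise_table(raw, lengths)
print(f"Hamlet: {rates['Hamlet']['death']:.1f}, "
      f"Lady Macbeth: {rates['Lady Macbeth']['death']:.1f}")
# Hamlet: 8.5, Lady Macbeth: 15.6
```

Keeping the raw table and computing the standardised one from it means that a correction to a single raw count automatically flows through to the rates.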


My hypothesis was that each of Shakespeare’s characters would speak in a way that would make them identifiable from an anonymous piece of text. A simple analysis of the following piece of dialogue illustrates this.

How now, who calls? Madam, I am here. What is your will? And stint thou, too, I pray thee, nurse, say I. It is an honor that I dream not of. I’ll look to like, if looking liking move. But no more deep will I endart mine eye Than your consent gives strength to make it fly. Good pilgrim, you do wrong your hand too much, Which mannerly devotion shows in this; For saints have hands that pilgrims’ hands do touch, And palm to palm is holy palmers’ kiss. Ay, pilgrim, lips that they must use in prayer. Saints do not move, though grant for prayers’ sake.

Several aspects of this text make it clear that it was spoken by Juliet. The words in bold are those for which Juliet has the highest rate per ten thousand characters, while the words ‘stint’, ‘endart’, and ‘pilgrims’’ are used by Juliet alone among the 18 characters studied.
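The reasoning above (which suspect uses each word at the highest rate?) can also be expressed as a simple scoring rule. The Python sketch below is a hypothetical illustration, not the procedure used in the project’s activities, where participants made the judgement themselves: each word in the anonymous text gives one vote to the suspect with the highest per-ten-thousand rate for that word, and suspects are ranked by votes. A word used by only one suspect, such as ‘endart’, automatically votes for that suspect. The function name and data layout are assumptions; in the example, the ‘death’ rates are the ones quoted in this report and the other rates are placeholders.

```python
from collections import Counter


def guess_speaker(
    tokens: list[str], rates: dict[str, dict[str, float]], suspects: list[str]
) -> list[tuple[str, int]]:
    """Rank suspects by how many words in the text they use at the highest rate.

    `rates` maps character -> {word: uses per ten thousand}. Words that no
    suspect uses are skipped; on an exact tie the earlier suspect wins.
    """
    votes = Counter({s: 0 for s in suspects})  # start every suspect at zero
    for word in tokens:
        best, best_rate = None, 0.0
        for s in suspects:
            r = rates.get(s, {}).get(word, 0.0)
            if r > best_rate:
                best, best_rate = s, r
        if best is not None:
            votes[best] += 1
    return votes.most_common()


# 'death' rates are from this report; 'nurse' and 'pilgrim' are placeholders.
rates = {
    "Juliet": {"nurse": 5.0, "pilgrim": 3.0},
    "Hamlet": {"death": 8.5},
    "Lady Macbeth": {"death": 15.6},
}
print(guess_speaker(["nurse", "pilgrim", "death", "thee"], rates,
                    ["Juliet", "Hamlet", "Lady Macbeth"]))
# [('Juliet', 2), ('Lady Macbeth', 1), ('Hamlet', 0)]
```

One vote per word is deliberately crude: it ignores how large the gap between rates is, which is the kind of judgement a human reader still adds.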

The phrase ‘I pray thee’ also sets Juliet apart from the other characters. Among these 18 characters, ‘I pray thee’ is a less common alternative to ‘I prithee’: it is used 7 times in total, while ‘I prithee’ is used 20 times. Furthermore, Juliet uses this less common phrase more than any of the other 17 characters. Both individual words and phrases can therefore help to differentiate between Shakespeare’s characters.
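Phrase counts like these can be obtained from AntConc’s Clusters/N-Grams view, or with a short script. The Python sketch below (a hypothetical helper, not part of the project’s toolset) counts a multi-word phrase by matching word tokens rather than raw text, so that a comma or a verse line break between the words does not hide a match. The sample string is toy text, not a quotation from the database.

```python
import re

WORD = re.compile(r"[a-z]+(?:['’][a-z]+)*")


def count_phrase(text: str, phrase: str) -> int:
    """Count non-overlapping occurrences of a word sequence in `text`.

    Matching is done on word tokens, so case, punctuation, and line breaks
    between the words do not prevent a match.
    """
    words = WORD.findall(text.lower())
    target = WORD.findall(phrase.lower())
    n = len(target)
    if n == 0:
        return 0
    count = i = 0
    while i <= len(words) - n:
        if words[i:i + n] == target:
            count += 1
            i += n  # skip past the match so it is not counted twice
        else:
            i += 1
    return count


sample = "I pray thee, nurse.\nI prithee, peace. I pray\nthee, go."
print(count_phrase(sample, "I pray thee"))  # 2
print(count_phrase(sample, "I prithee"))    # 1
```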

One obvious flaw in this type of analysis is that people with a strong knowledge of Shakespeare may be able to identify the speaker of a piece of dialogue without any help from the tools. For example, the word ‘nurse’ is a fairly clear indicator that this dialogue may belong to Juliet. For this reason, I wrote my own pieces of text in the style of “confession notes”, gave volunteers a selection of potential suspects, and asked them to determine the author of each note. This proved very successful: every volunteer was able to identify the author of the fictional note by picking up on particular words. While fairly simple in design, these tasks support the hypothesis that each of Shakespeare’s characters has their own distinctive vocabulary that sets them apart from other speakers, even within their own play.

These activities can also be used as an educational tool and an entertaining introduction to Shakespeare’s characters and language.

Further Research

At the moment, I have the dialogue of 60 Shakespearean characters available for research. In time, I would like to add more of these characters to the database to form a more comprehensive view of the similarities and differences between Shakespeare’s characters. The database itself should prove a very useful tool going forward, and I hope to continue expanding it in future.

While I primarily used AntConc in my research over the past six weeks, other programs such as Ubiqu+Ity could provide a more nuanced and comprehensive analysis of what makes each piece of dialogue identifiable. Ubiqu+Ity allows you to search texts for the types of language, or LATs (Language Action Types), that a character uses. For example, I could search each character’s dialogue for language that indicates negativity, confidence, or immediacy, and use this to tell them apart from other characters.

Furthermore, the results of my research have raised a number of interesting points for further inquiry. Because I had previously researched the role of women in several plays (Julius Caesar, Macbeth, and Timon of Athens), I found the language used by Cleopatra in Antony and Cleopatra particularly interesting. She has the highest incidence of words such as ‘betray’, ‘betrayed’, ‘chastised’, and ‘deceived’. While this might be expected given that Antony marries another woman, it would be worth investigating whether Cleopatra’s vocabulary sets her apart from Shakespeare’s other female characters.

I would also like to expand my database to include function words and investigate differences in their usage, particularly of determiners and gendered pronouns.
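As a starting point for that extension, the Python sketch below computes per-ten-thousand rates for a small set of determiners and for masculine and feminine pronouns. The word lists are illustrative assumptions, not a standard inventory, and the rate here is per ten thousand word tokens rather than the length measure used in the main database. Because ‘his’ and ‘her’ are both determiners and gendered pronouns, each category is counted independently; the sketch also counts every ‘that’, even as a conjunction, so a fuller study would need part-of-speech tagging.

```python
import re
from collections import Counter

# Illustrative word lists (assumptions, not a standard inventory).
DETERMINERS = {"the", "a", "an", "this", "that", "these", "those",
               "my", "thy", "your", "his", "her", "our", "their"}
GENDERED_PRONOUNS = {
    "masculine": {"he", "him", "his", "himself"},
    "feminine": {"she", "her", "hers", "herself"},
}
WORD = re.compile(r"[a-z]+(?:['’][a-z]+)*")


def function_word_profile(text: str) -> dict[str, float]:
    """Rates per 10,000 word tokens for determiners and gendered pronouns."""
    tokens = WORD.findall(text.lower())
    if not tokens:
        return {}
    counts = Counter(tokens)

    def rate(words: set[str]) -> float:
        # Sum the counts of every listed form, then scale to 10,000 tokens.
        return sum(counts[w] for w in words) / len(tokens) * 10_000

    profile = {"determiners": rate(DETERMINERS)}
    for label, words in GENDERED_PRONOUNS.items():
        profile[label] = rate(words)
    return profile


print(function_word_profile("She loves him, and he loves her."))
```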