SLTinfo

Lexical Density

Written language and spoken language

Consider Text 1 below, an example of Margaret Williamson’s written language taken from her book ‘Life at the ICI’ (2008).

Text 1: Written Language

In 1918, when the chemical industry was first established in the area, Billingham was a village inhabited by a few hundred people but grew rapidly as ICI’s operations expanded, helped by the company’s reputation for providing secure employment. The wages, conditions and benefits offered by ICI were attractive and the company quickly gained a reputation as a good employer. Many of those we interviewed claimed this was their main reason for applying for a job. Our interviews also highlight the influence of family when making decisions about employment and ICI was certainly happy to recruit the sons and daughters of existing workers.

Now take a look at Text 2, which is an example of Margaret’s speech, i.e. spoken language. [Text 2 is actually a compilation of all of her spoken utterances as they appear in lines 1-36 of the example broad transcription in the article on this website entitled Transcribing Conversation, where Margaret is denoted as ‘M’.]

Text 2: Spoken Language

okay what do you want to talk about

what do you fancy doing on Saturday

yeah well we talked about Saturday or Sunday but Bede and Sinners are playing on Sunday so wouldn’t give us much time to get back for four o clock especially if we wanted to go to Browton

so probably Saturday

but we’ve got the…Paul Norton and his wife coming round on the evening time for a meal

mhm

on Sunday…you, you’re home all week…all…

from Monday

have you got any plans

m

so presumably though you’ll be going shopping

how many presents have you got to buy yet

yeah

There are some obvious differences between these texts. For example, the spoken language contains incomplete clauses (e.g. “on Sunday…you, you’re home all week…all…”). In fact, incomplete clauses such as these are a frequent feature of spoken language. The structure of the spoken clauses also appears to be simpler than that of the written clauses.

Lexical words and function words

A useful measure of the difference between texts (for example, between a person’s written language and a transcription of their conversation) is lexical density. To calculate lexical density, we need to divide the word classes of English into two types: (1) lexical words (the so-called content or information-carrying words) and (2) function words (the words which bind a text together).

Lexical words include:

  • lexical verbs (e.g. run, walk, sit)
  • nouns (e.g. dog, Susan, oil)
  • adjectives (e.g. red, happy, cold)
  • adverbs (e.g. very, carefully, yesterday)

Function words, therefore, include the remaining word classes:

  • auxiliary verbs (e.g. can, will, have)
  • numerals (e.g. two, three, first)
  • determiners (e.g. the, those, my)
  • pronouns (e.g. she, yourself, who)
  • prepositions (e.g. in, to, after)
  • conjunctions (e.g. and, but, if)

A third type of word grouping is also typically recognized: inserts. These are words which are used to gain attention, express emotion, protest, and so on (e.g. okay, right, oh). These are not considered to be lexical words.
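To make the three groupings concrete, here is a minimal Python sketch that maps word-class labels to "lexical", "function" or "insert". The class labels (such as "lexical_verb"), the function name word_type and the tags given to the fragment of Text 1 are all assumptions made for this illustration; they do not reproduce the classification used for the counts in this article, and some words could reasonably be tagged differently.

```python
# Word-class groupings as listed above. The label strings are
# hypothetical names chosen for this sketch.
LEXICAL_CLASSES = {"lexical_verb", "noun", "adjective", "adverb"}
FUNCTION_CLASSES = {
    "auxiliary_verb", "numeral", "determiner",
    "pronoun", "preposition", "conjunction",
}
INSERT_CLASSES = {"insert"}  # e.g. okay, right, oh: not lexical words


def word_type(word_class: str) -> str:
    """Return 'lexical', 'function' or 'insert' for a word-class label."""
    if word_class in LEXICAL_CLASSES:
        return "lexical"
    if word_class in FUNCTION_CLASSES:
        return "function"
    if word_class in INSERT_CLASSES:
        return "insert"
    raise ValueError(f"unknown word class: {word_class!r}")


# Hand-tagged opening clause of Text 1; these tags are illustrative
# assumptions, not the author's own analysis.
fragment = [
    ("In", "preposition"), ("1918", "numeral"), ("when", "conjunction"),
    ("the", "determiner"), ("chemical", "adjective"), ("industry", "noun"),
    ("was", "auxiliary_verb"), ("first", "numeral"),
    ("established", "lexical_verb"), ("in", "preposition"),
    ("the", "determiner"), ("area", "noun"),
]

lexical = [word for word, cls in fragment if word_type(cls) == "lexical"]
print(lexical)  # ['chemical', 'industry', 'established', 'area']
```

In practice the tagging is the hard part; once every word has a class, separating lexical from function words is a simple lookup.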

Lexical density of written language

Text 1 is reproduced below, but this time the lexical words are underlined. (You may not necessarily agree with the way I’ve classified the words; there are typically instances of ambiguity when analyzing texts.)

In 1918, when the chemical industry was first established in the area, Billingham was a village inhabited by a few hundred people but grew rapidly as ICI’s operations expanded, helped by the company’s reputation for providing secure employment. The wages, conditions and benefits offered by ICI were attractive and the company quickly gained a reputation as a good employer. Many of those we interviewed claimed this was their main reason for applying for a job. Our interviews also highlight the influence of family when making decisions about employment and ICI was certainly happy to recruit the sons and daughters of existing workers.

There are, therefore, 60 lexical words and 42 function words out of a total of 102. Lexical density is calculated as follows:

Lexical density = (number of lexical words/total number of words) * 100

= (60/102) * 100 = 58.8%
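The calculation above can be expressed as a small function. This is a minimal Python sketch; the function name lexical_density and its input checks are choices made for this illustration, while the counts 60 and 102 are the ones given above for Text 1.

```python
def lexical_density(lexical_words: int, total_words: int) -> float:
    """Return the percentage of words in a text that are lexical words."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    if not 0 <= lexical_words <= total_words:
        raise ValueError("lexical_words must be between 0 and total_words")
    return lexical_words / total_words * 100


# Counts reported above for Text 1 (written language).
lexical, total = 60, 102
function_words = total - lexical  # 42, matching the count in the text
print(f"{lexical_density(lexical, total):.1f}%")  # 58.8%
```

The same function works for any text once its lexical words and total words have been counted; the counting, not the arithmetic, is where the judgment calls lie.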

Lexical density of speech

Text 2 is reproduced below, but this time the lexical words are underlined.

okay what do you want to talk about

what do you fancy doing on Saturday

yeah well we talked about Saturday or Sunday but Bede and Sinners are playing on Sunday so wouldn’t give us much time to get back for four o clock especially if we wanted to go to Browton

so probably Saturday

but we’ve got the…Paul Norton and his wife coming round on the evening time for a meal

mhm

on Sunday…you, you’re home all week…all…

from Monday

have you got any plans

m

so presumably though you’ll be going shopping

how many presents have you got to buy yet

yeah

There are, therefore, 48 lexical words and 56 function words out of a total of 104. The lexical density is:

Lexical density = (number of lexical words/total number of words) * 100

= (48/104) * 100 = 46.2%
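The total number of words in a transcript depends on counting decisions, for example whether "o clock" is one word or two, or whether a contraction such as "we’ve" counts as one word. As a rough illustration, the Python sketch below counts the words in Text 2 under one assumed rule: split on whitespace, treat the ellipses and commas as separators, and optionally treat listed phrases as single words. The function name naive_word_count and its multiword parameter are hypothetical.

```python
# Text 2 utterances, copied from the transcript above.
UTTERANCES = [
    "okay what do you want to talk about",
    "what do you fancy doing on Saturday",
    "yeah well we talked about Saturday or Sunday but Bede and Sinners are "
    "playing on Sunday so wouldn’t give us much time to get back for four "
    "o clock especially if we wanted to go to Browton",
    "so probably Saturday",
    "but we’ve got the…Paul Norton and his wife coming round on the evening "
    "time for a meal",
    "mhm",
    "on Sunday…you, you’re home all week…all…",
    "from Monday",
    "have you got any plans",
    "m",
    "so presumably though you’ll be going shopping",
    "how many presents have you got to buy yet",
    "yeah",
]


def naive_word_count(lines, multiword=()):
    """Count whitespace-separated tokens, treating '…' and ',' as separators.

    Any phrase listed in `multiword` (e.g. "o clock") counts as one word.
    """
    total = 0
    for line in lines:
        cleaned = line.replace("…", " ").replace(",", " ")
        for phrase in multiword:
            # Join the phrase with underscores so split() keeps it whole.
            cleaned = cleaned.replace(phrase, phrase.replace(" ", "_"))
        total += len(cleaned.split())
    return total


print(naive_word_count(UTTERANCES))                          # 107
print(naive_word_count(UTTERANCES, multiword=("o clock",)))  # 106
```

Under this simple rule the total comes to 107, or 106 with "o clock" as one word, rather than the 104 reported above. A different count does not by itself mean either figure is wrong; it shows why it is worth stating the counting rule you adopt, especially when comparing lexical density across texts.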

Interpretation

A high lexical density indicates that a large proportion of the words in a text are information-carrying words; a low lexical density indicates that relatively few of them are.

The finding that the lexical density of speech (in this case, 46.2%) is lower than that of written language (in this case, 58.8%) is typical. We have already noted that speech is characterized by incomplete clauses. Incomplete clauses are a product of the speaker having to construct his or her utterances in real time: there is limited time to think about and plan what one wishes to say, and speakers often set off along one trajectory only to pause and continue in another direction. Incomplete clauses are not, however, a common feature of written texts, where the author has much longer to plan and shape the units of meaning that he or she wishes to use. There is time to select the most appropriate lexical word, review the text and replace words before making the text available. The time pressures of speaking, by contrast, typically lead to a lexically simpler text. Lexical density, then, can serve as a useful measure of how much information is packed into a particular text.

It can also be used to monitor improvements in the use of lexical items (information-carrying words) by children with under-developed vocabulary and/or word-finding difficulties.

Reference

A few people have contacted me to ask for a reference for lexical density so that they can cite it in a report, a written assignment, or similar. Unfortunately, there is no reference for lexical density as such. It is a well-known measure that is used in many linguistic analyses; if you search the internet for ‘lexical density’ you will find several of these. I do not know who first used a measure of lexical density in a study, but it is now well known and, as it is in the public domain, few people reference its use in articles, reports, and so on.

However, the book that I often refer to for definitions is:

  • Biber, D., Conrad, S. and Leech, G. (2002) The Longman Student Grammar of Spoken and Written English. Harlow: Longman. [ISBN: 0-582-23726-2]. I have found this to be a useful reference text, as it is a corpus-based reference work, i.e. its findings are based on an analysis of real-world written and spoken texts.

There is also some information about lexical density to be found in:

  • Hewings, A., Painter, C., Polias, J., Dare, B. and Rhys, M. (2005) Getting Started: Describing the Grammar of Speech and Writing. Book 1, E303 English Grammar in Context. Milton Keynes: Open University Press. pp. 50-52.