


At a fundamental level, voice is simply the noise created by the vibration of the vocal folds (vocal cords) within the larynx. In English all speech sounds are produced on an outgoing (egressive) air stream from the lungs, i.e. air from the lungs is directed up through the trachea, through the space between the vocal folds in the larynx (i.e. the glottis) and eventually out through either the mouth or nose. This air stream may either be vocalized by the vibration of the vocal folds or non-vocalized. The vocalized air stream is used to produce voiced speech sounds, e.g. all vowels, and voiced consonants such as ‘b’, ‘d’ and ‘g’. A non-vocalized air stream is used to produce many voiceless consonants, e.g. ‘p’, ‘t’ and ‘s’.

Paralinguistic features

The production of the human voice occurs at the physiological level (see ‘The Communication Chain’ in Communication Theory). More specifically, it occurs at the laryngeal level. Consequently, with the exception of contrasting voiced and voiceless speech sounds, voice is not a linguistic phenomenon. Rather, various features of the voice work together with verbal language to express meaning. Accordingly these features are referred to as paralinguistic features.

The paralinguistic features of the voice convey a lot of information about the speaker. For example, even when we cannot see the person who is speaking (over the telephone for instance) we usually know if it is a male or a female and, very often, how old the person is: usually we can tell if it is a child or an adult. We may even be able to tell which part of the country they come from by their accent, and so on. We will discuss the most salient paralinguistic features below.


The pitch of the voice refers to how high or low the note produced by the vibrating vocal folds appears to be. The faster the vocal folds vibrate the higher the pitch. Conversely, slowly vibrating folds will produce a lower pitch. Pitch is quantified by the frequency of the vibration in hertz (Hz), i.e. the number of vibrations per second. The note A above middle C on a modern piano has a frequency of 440 Hz. The optimum or average pitch for the speaking voice varies from person to person but, typically, men (around 128 Hz) will speak with a lower pitch than women (around 225 Hz).
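To make the link between frequency and pitch concrete, the short Python sketch below converts the average speaking frequencies quoted above into the duration of a single vibration cycle and into a musical distance from the 440 Hz A. The function names are illustrative only, and the semitone calculation assumes equal-tempered tuning, which the text itself does not specify.

```python
import math

A4_HZ = 440.0  # A above middle C, as quoted in the text


def period_ms(frequency_hz: float) -> float:
    """Duration of one vibration cycle in milliseconds."""
    return 1000.0 / frequency_hz


def semitones_from_a4(frequency_hz: float) -> float:
    """Signed distance from A4 in semitones (assumes equal temperament)."""
    return 12.0 * math.log2(frequency_hz / A4_HZ)


# Average speaking frequencies taken from the paragraph above.
for label, f0 in [("typical male voice", 128.0), ("typical female voice", 225.0)]:
    print(f"{label}: {f0:.0f} Hz, one cycle every {period_ms(f0):.2f} ms, "
          f"{semitones_from_a4(f0):+.1f} semitones relative to A4")
```

A negative semitone value simply means that the voice lies below the 440 Hz reference note; the faster the vibration, the shorter each cycle and the higher the perceived pitch.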

Loudness (volume)

Whereas pitch is determined by the speed of the vibrations of the vocal folds, loudness is determined by the strength of their vibration. This is controlled mainly by the force with which the air from the lungs is allowed to pass through the larynx. It is important to understand that the pitch of the voice can remain constant whilst the loudness of that particular pitch can be varied. In other words, it is possible to keep the vibration frequency the same but to increase the strength of the vibration by forcing through more air.
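The independence of pitch and loudness can be illustrated with a small Python sketch. It uses a pure sine tone as a very crude stand-in for the note of the vocal folds (a real voice is far more complex), and the sample rate, amplitudes and function names are all assumptions made for the illustration. Two tones are generated at the same frequency but with different amplitudes: the number of cycles is identical, while the strength of the signal (its root-mean-square value) differs.

```python
import math

SAMPLE_RATE = 8000  # samples per second; an arbitrary choice for this sketch


def sine_wave(frequency_hz, amplitude, duration_s=1.0):
    """Samples of a pure tone at the given frequency and amplitude."""
    n_samples = int(SAMPLE_RATE * duration_s)
    return [amplitude * math.sin(2.0 * math.pi * frequency_hz * i / SAMPLE_RATE)
            for i in range(n_samples)]


def rms(samples):
    """Root-mean-square value: a simple measure of signal strength."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def count_cycles(samples):
    """Count upward zero crossings, i.e. completed vibration cycles."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev < 0.0 <= cur)


# Same vibration frequency, different strength of vibration.
quiet = sine_wave(128.0, amplitude=0.2)
loud = sine_wave(128.0, amplitude=0.8)

print("cycles counted:", count_cycles(quiet), "vs", count_cycles(loud))
print("signal strength (RMS):", round(rms(quiet), 3), "vs", round(rms(loud), 3))
```

The cycle counts match because the frequency is unchanged; only the strength of the vibration, and therefore the loudness, is different.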

Loudness is measured in decibels (dB). The following figures are approximate, as the measured level depends on the distance from the source. A whispering person is typically speaking at a loudness of around 20-30 dB. In comparison, someone shouting may be around 70-80 dB. A jet engine will create a loudness of about 110 dB and anything much above this usually creates a sensation of pain, i.e. the pain threshold for hearing.
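The decibel scale is logarithmic, so equal steps in dB correspond to equal ratios of sound pressure rather than equal differences. The Python sketch below assumes that the levels quoted above are sound pressure levels (dB SPL) measured against the standard reference pressure of 20 micropascals in air; the text does not state this, and the function names are illustrative only.

```python
import math

P_REF_PA = 20e-6  # standard reference pressure for dB SPL in air (20 micropascals)


def spl_db(pressure_pa: float) -> float:
    """Sound pressure level in dB for a given sound pressure in pascals."""
    return 20.0 * math.log10(pressure_pa / P_REF_PA)


def pressure_pa(level_db: float) -> float:
    """Sound pressure in pascals for a given level in dB SPL."""
    return P_REF_PA * 10.0 ** (level_db / 20.0)


def pressure_ratio(level_a_db: float, level_b_db: float) -> float:
    """How many times greater the sound pressure is at level A than at level B."""
    return 10.0 ** ((level_a_db - level_b_db) / 20.0)


# Levels taken from the text: a shout (about 70 dB) and a jet engine (about 110 dB).
print("jet engine vs shout, pressure ratio:", pressure_ratio(110.0, 70.0))
print("sound pressure at 70 dB (Pa):", pressure_pa(70.0))
```

Under these assumptions a difference of 40 dB corresponds to a hundredfold difference in sound pressure, which is why a seemingly modest change in decibels can represent a very large change in the underlying sound.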


When the egressive air stream from the lungs is vocalized by the vibrating vocal folds it is amplified by resonating in the chest, throat, mouth and the sinuses of the face and forehead. This resonance gives the voice a characteristic musical quality, or timbre, which is determined by such things as one’s size, the shape of the chest cavity, the mass of the vocal folds, and so on.

The term ‘resonance’ may also refer to the relative balance of sound being produced through the mouth or through the nose. Certain English speech sounds are produced by allowing the escaping air to pass through the nose. These are the nasal consonants: ‘m’ as in mum, ‘n’ as in nut and ‘ng’ as in wing (see Consonants for an explanation of nasals). When produced accurately there should be no escape of air through the mouth. If too little air passes through the nose during these sounds, the speaker is said to be hyponasal. This typically occurs when a person has a cold and the nose becomes blocked: the air can then only escape through the mouth, which gives rise to a characteristic lack of nasal resonance. Conversely, all other English speech sounds are produced by the air escaping through the mouth, i.e. no air should escape through the nose. If the speaker either allows too much air to escape through the nose, or cannot prevent air escaping in this manner (as a result of a cleft palate, for example), the voice will sound hypernasal, commonly called nasal speech.


The quality of the voice refers to the complexity of the note produced by the vibration of the vocal folds. The term is frequently used to describe how aesthetically pleasing the voice is. It is extremely difficult, however, to define a typical voice. Nevertheless, the human voice should be pleasant with an engaging musical character and the absence of any interfering noise. Three broad quality types are recognized:

breathy voice

A person speaking with an excessively breathy voice produces an audible escape of air through the glottis. It is often produced as a result of insufficient closure of the vocal folds, which leaves a small chink through which air from the lungs spills. This could be due to an organic impairment such as vocal nodules, or to a lack of breath control. Breathy voices typically sound weak, as they are often produced with reduced loudness.

hoarse voice

In simple terms, hoarseness is a disruption of the usually stable note produced by the vibration of the vocal folds, as a result of airflow turbulence. This turbulence can be created because the weight or tension of one vocal fold relative to the other is altered. Consequently, the folds no longer vibrate in sync and this creates the noise we perceive as hoarseness. The weight or tension of the vocal folds can be affected by such things as the build-up of mucus (e.g. when one has a cold), growths (e.g. vocal nodules, polyps), and muscle tension.

Typically, hoarseness is associated with weak vocalization and lowered pitch: the voice sounds rough. It may be accompanied by occasional bouts of breathiness, in which case it is sometimes referred to as husky voice.

harsh voice

A harsh voice is associated with tension in the muscles of the larynx, in the muscles involved with breathing and, often, in the vocal folds themselves. There is typically a hard glottal attack, i.e. the speaker brings the vocal folds together abruptly and with greater force than is necessary. This obtrusive glottal attack creates an unpleasant sound. In contrast to a hoarse voice, a harsh voice is characteristically associated with a raised pitch.


There is a great deal of emotional overlay in the production of voice. For example, consider someone who is clinically depressed. Frequently, this condition is accompanied by changes in vocal characteristics: reduced loudness, a monotonous pitch and a lack of energy. In contrast, an enthusiastic person frequently has a faster rate of speech, increased loudness and possibly exaggerated pitch variations.

Often, the most effective communicators are those who can effortlessly vary paralinguistic features to create an interesting and colorful voice which is capable of expressing a range of intellectual and emotional meanings.