Instrumental Measurement of Voice
Voice assessment procedures
Voice assessment for the diagnosis of potential voice disorders typically involves three main procedures:
1. laryngoscopic examination
2. perceptual assessment
3. instrumental measurement
This article provides an outline of procedure 3 – instrumental measurement.
Instrumental measurement
There are many high-technology instruments available to assist in the assessment of voice. Many, perhaps the majority, are used not by voice therapists per se but by suitably qualified physiologists, radiographers, and Ear, Nose and Throat surgeons. There are too many to be considered in detail here, so I will briefly outline three instrumental techniques with which most people are likely to have some familiarity. I will then focus on acoustic analysis.
Vocal tract imagers
These instruments allow us to visualize the internal structure of the vocal tract. Radiography, which uses X-rays, is probably the best-known technique. A related technique is videofluoroscopy, which is widely used to image swallowing and is now regularly used in the assessment of dysphagia. Magnetic Resonance Imaging (MRI) uses magnetism rather than X-rays to visualize internal structures that may not be readily visualized with X-rays. There are several other instruments and techniques, each with its own advantages and disadvantages, but their common feature is the ability to visualize internal body structures. Owing to the highly specialized nature of these techniques, they are typically carried out by suitably qualified radiographers.
Electromyography (EMG)
When a muscle contracts, a small electrical current is produced which is typically proportional to the strength of the muscle activity. EMG measures this electrical activity. There are two types. Surface EMG involves placing two electrodes on the skin overlying the muscles to be investigated. Intramuscular EMG involves inserting a small needle electrode into the muscle itself. In both instances, the electrical activity is typically displayed on an oscilloscope. This technique has limited application but does assist in detecting levels of laryngeal muscle tension and may be used in cases of identified vocal fold paralysis. EMG is typically carried out by trained physiologists or physicians.
Electroglottograph (EGG)
This is a non-invasive device that measures the contact between the vocal folds. Two electrodes are placed either side of the larynx and a small electrical current is passed between them. As the vocal folds open (abduct) and close (adduct), the resistance to the flow of the current alters. The variations in resistance are displayed as an image on a computer screen which represents the movement/contacts of the vocal folds. This technique is also useful in gathering information about the fundamental frequency of the voice, and the voice quality. Unlike vocal tract imaging and EMG techniques, electroglottography is commonly carried out by voice therapists (speech therapists).
Acoustic analysis
Acoustic analysis is, in some ways, the objective counterpart of perceptual assessment of voice, in that it measures several of the same vocal characteristics that are explored using auditory perception alone, e.g. pitch, pitch range, loudness, degree of hoarseness. Measurements can be made by dedicated instruments designed for a particular task (e.g. a sound level meter, which displays the intensity level of speech sounds; a sound spectrograph, which displays the frequency and intensity of single speech sounds, syllables, words or connected speech as a spectrogram) or by integrated software programs capable of measuring and displaying several parameters at once (e.g. KayPENTAX’s Computerized Speech Lab (CSL)).
Praat
One integrated package that I have used in my clinics for several years is Praat. Written by Paul Boersma and David Weenink at the University of Amsterdam, Praat is a computer program with which you can analyze, synthesize, and manipulate speech. It also has an in-built Voice Report tool. It is available for many different computer operating systems and can be downloaded for free from http://www.praat.org/.
protocol
When conducting any formal assessment/measurement it is important to follow a protocol, i.e. a standard procedure for systematically gathering the relevant data which will subsequently be used to describe and quantify the voice characteristics. There is insufficient space here to describe the protocol I use in detail. However, I always record the client phonating long sustained vowels in isolation and also performing so-called solo connected speech, as follows:
1. Ask the client to take a normal breath and then to sustain the vowel sound ‘ah’ (as in the words art and heart) for about five seconds at a comfortable pitch and loudness on one exhalation, without straining. [Praat has its own built-in digital sound recorder.]
2. Repeat Step 1 above but, this time, sustaining the vowel ‘ee’.
3. Record the client speaking a short stretch of talk, e.g. “My name is Graham Williamson and I live in Billingham.”
Praat is a sophisticated program which can perform complex analyses (including spectrograms) or be used to measure relatively straightforward characteristics such as pitch. When investigating the vowels in isolation, I routinely measure:
jitter
This is also known as pitch perturbation and refers to the minute involuntary variations in the frequency of adjacent vibratory cycles of the vocal folds. In essence, it is a measure of frequency variability in comparison to the client’s fundamental frequency. Pathological voices often exhibit a higher percentage of jitter.
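As a rough illustration of the idea, the sketch below computes a local jitter percentage from a list of cycle durations: the mean absolute difference between adjacent cycles, divided by the mean cycle duration. The function name and the sample values are hypothetical, and the code assumes the individual cycle durations have already been extracted from the recording. It is not a description of how Praat itself arrives at its figures.

```python
def jitter_local_percent(periods):
    """Mean absolute difference between adjacent cycle durations,
    relative to the mean cycle duration, as a percentage."""
    if len(periods) < 2:
        raise ValueError("at least two cycle durations are needed")
    # Cycle-to-cycle differences in duration
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return 100.0 * mean_diff / mean_period


# Hypothetical cycle durations in seconds (roughly a 100 Hz voice)
sample_periods = [0.0100, 0.0102, 0.0099, 0.0101, 0.0100]
print(f"jitter: {jitter_local_percent(sample_periods):.2f} %")
```

A perfectly regular voice, in which every cycle has the same duration, would give a jitter of 0 %.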
shimmer
Whereas jitter is a measure of the percentage irregularity in the pitch of the vocal note (pitch perturbation), shimmer is a measure of the percentage irregularity in the amplitude of the vocal note. It is often referred to as amplitude perturbation. Shimmer, therefore, measures the variability in the intensity of adjacent vibratory cycles of the vocal folds. As with jitter, pathological voices will typically exhibit a higher percentage of shimmer.
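The same cycle-to-cycle comparison can be sketched for amplitude. The example below assumes that the peak amplitude of each vibratory cycle has already been measured. The function name and sample values are hypothetical and are given for illustration only.

```python
def shimmer_local_percent(amplitudes):
    """Mean absolute difference between the amplitudes of adjacent cycles,
    relative to the mean amplitude, as a percentage."""
    if len(amplitudes) < 2:
        raise ValueError("at least two cycle amplitudes are needed")
    # Cycle-to-cycle differences in amplitude
    diffs = [abs(b - a) for a, b in zip(amplitudes, amplitudes[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_amplitude = sum(amplitudes) / len(amplitudes)
    return 100.0 * mean_diff / mean_amplitude


# Hypothetical peak amplitudes of five adjacent cycles (arbitrary units)
sample_amplitudes = [0.50, 0.52, 0.49, 0.51, 0.50]
print(f"shimmer: {shimmer_local_percent(sample_amplitudes):.2f} %")
```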
Harmonics-to-Noise Ratio (HNR)
The vocal note produced by the vibrations of the vocal folds is complex and made up of periodic (regular and repetitive) and aperiodic (irregular and non-repetitive) sound waves. The aperiodic waves are random noise introduced into the vocal signal owing to irregular or asymmetric adduction (closing) of the vocal folds. Noise impairs the clarity of the vocal note and too much noise is perceived as hoarseness.
Praat is capable of measuring the proportions of periodic and aperiodic waves (noise) in the vocal note and displaying this as a Harmonics-to-Noise Ratio (HNR). Laryngeal pathology may lead to poor adduction of the vocal folds and, therefore, increase the amount of random noise in the vocal note. The greater the proportion of noise, the greater the perceived hoarseness, and the lower the HNR figure will be, i.e. a low HNR indicates a high level of hoarseness, and a high HNR indicates a low level of hoarseness. Figure 1 represents jitter, shimmer and HNR diagrammatically.

Figure 1. Jitter, Shimmer and HNR
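HNR is conventionally expressed in decibels. The sketch below shows the arithmetic under the simplifying assumption that the fraction of signal energy in the periodic part is already known; estimating that fraction from a recording is the difficult step and is not shown. The function name is hypothetical.

```python
import math


def hnr_db(periodic_fraction):
    """Convert the fraction of signal energy that is periodic (between 0 and 1,
    exclusive) into a harmonics-to-noise ratio in decibels."""
    if not 0.0 < periodic_fraction < 1.0:
        raise ValueError("periodic_fraction must lie strictly between 0 and 1")
    noise_fraction = 1.0 - periodic_fraction
    return 10.0 * math.log10(periodic_fraction / noise_fraction)


# More noise in the vocal note gives a lower HNR
for fraction in (0.99, 0.90, 0.50):
    print(f"{fraction:.0%} periodic energy: HNR = {hnr_db(fraction):.1f} dB")
```

On this scale a voice with 99% of its energy in the periodic part has an HNR of about 20 dB, while equal amounts of periodic energy and noise give 0 dB.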
When analysing the solo connected speech, I routinely measure the following:
mean pitch
This is the Speaking Fundamental Frequency (SF0), i.e. the average speaking pitch. For adult males this is around 128 Hz (cycles per second), for adult females it is about 225 Hz, and for children under the age of 10 years it can average 260 Hz.
standard deviation
This is a statistical measure of how much the pitch varies from the mean pitch.
minimum pitch
This is simply the lowest pitch recorded in the spoken sample.
maximum pitch
The highest pitch recorded in the spoken sample.
pitch range
Subtracting the minimum pitch from the maximum pitch gives the pitch range. This is an indicator of flexibility, representing a measure of how much the client varies their pitch during speech. Typical pitch ranges run from 85 to 196 Hz for adult males (a span of 111 Hz) and from 155 to 334 Hz for adult females (a span of 179 Hz).
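The five connected-speech measures above can be summarized from a series of fundamental frequency estimates. The sketch below is a minimal illustration and makes several assumptions of its own: the pitch track is a plain list of values in Hz, unvoiced frames are marked as 0, and the sample standard deviation is used. The function name and the sample data are hypothetical.

```python
import statistics


def pitch_summary(f0_hz):
    """Summarize a pitch track given in Hz.
    Assumption: frames with a value of 0 are unvoiced and are ignored."""
    voiced = [f for f in f0_hz if f > 0]
    if len(voiced) < 2:
        raise ValueError("at least two voiced frames are needed")
    lowest, highest = min(voiced), max(voiced)
    return {
        "mean": statistics.mean(voiced),
        "sd": statistics.stdev(voiced),  # sample standard deviation
        "min": lowest,
        "max": highest,
        "range": highest - lowest,  # maximum pitch minus minimum pitch
    }


# Hypothetical pitch track with two unvoiced frames
summary = pitch_summary([0, 110, 120, 130, 0, 140])
for name, value in summary.items():
    print(f"{name}: {value:.1f} Hz")
```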