Acoustic Measures (Norms)
Question:
I am really impressed with your site, but was wondering if you had any good references to obtain normative comparison for things such as jitter, shimmer, noise to harmonic ratio, and fundamental frequency.
My Reply:
It is difficult to be precise about norms for acoustic measures such as jitter, shimmer, noise-to-harmonics ratio and fundamental frequency. There are many factors which militate against declaring all-encompassing norms. Some of these are person-specific (e.g. gender and age differences), some are cultural (e.g. what North Americans may consider to be within normal limits may be different from what North Koreans consider to be typical), and some are related to the testing environment (e.g. variation in the equipment used and, importantly, the use of different algorithms in the software programs which are used to make the measurements).
Measures of such things as jitter and shimmer using one software program cannot always be compared directly with measures made by another software program. This may be, for example, because different programs use different methods for deciding whether an irregular part of the vocal signal is voiced or not. In addition to this so-called voicing decision strategy, there will also be variation in the accuracy with which different programs can determine such things as the period and amplitude of a vocal signal. [NB: The time taken by one vibratory cycle of the vocal folds is called its period and the height of a sound wave is known as its amplitude. Amplitude is a measure of the amount of energy in the wave: the greater the amplitude of a sound, the greater the intensity.]
Thresholds of pathology
Despite these complications, some authorities do declare so-called thresholds of pathology. For example, the Multi-Dimensional Voice Program (MDVP) (Kay Elemetrics, 2008) indicates a threshold of pathology of 1.040% for jitter and 3.810% for shimmer (their parameters Jitt and Shim respectively). Scores at or below these figures are treated as within normal limits, and any percentage score above them is considered to be a sign of potential pathology. [I am using the term pathology here in its weak sense, to mean simply a departure or deviation from expected typical functioning, and not to indicate any particular organic pathology.] Now, as indicated above, there are many ways of calculating jitter and shimmer, each using a different formula. In fact, the Praat software (Boersma and Weenink, 2009) can calculate five different measures of jitter and six different measures of shimmer.
I have already indicated that instrumental assessment of voice is influenced by several factors, especially the so-called extraction algorithm and the type of recording equipment used. With respect to this, Maryn et al. (2009:217) compared frequency perturbation (jitter) and amplitude perturbation (shimmer) measures using both the MDVP and Praat programs, and both a purpose-built recording system and a personal computer-based system for acoustic voice assessment. They note that MDVP consistently yielded higher measures than Praat and conclude that “…one can hardly compare frequency perturbation outcomes across systems and programs and amplitude perturbation outcomes across systems.”
Standardized procedure
It is for reasons such as these that it is extremely difficult to declare norms. Consequently, I standardize my equipment (microphone, pre-amplifier, sound card, computer, recording booth, and so on), choose one of the calculation methods, analyze the client’s voice using the chosen method and use the declared thresholds of pathology only as a guide. When I re-assess the client’s voice, I repeat the same procedure. In this way, I can at least attempt to minimize the effects of any extraneous variables.
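The idea of only comparing like with like can be sketched in a few lines of Python. This is a minimal illustration, not a clinical tool: the Session record, its field names, the jitter_change helper and all of the values are hypothetical. It simply refuses to compare two jitter scores unless the program, calculation method and recording setup were the same on both occasions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """One assessment session (hypothetical record, for illustration only)."""
    label: str             # e.g. "time A"
    program: str           # analysis software used
    method: str            # calculation method chosen in that software
    setup: str             # short description of the recording chain
    jitter_percent: float  # measured jitter, in percent


def jitter_change(before: Session, after: Session) -> float:
    """Return the change in jitter, but only if the conditions match."""
    conditions_before = (before.program, before.method, before.setup)
    conditions_after = (after.program, after.method, after.setup)
    if conditions_before != conditions_after:
        raise ValueError("sessions were not measured under the same conditions")
    return after.jitter_percent - before.jitter_percent


# Made-up example values for two sessions with the same client.
time_a = Session("time A", "Praat", "Jitter (local)", "booth, microphone 1", 1.52)
time_b = Session("time B", "Praat", "Jitter (local)", "booth, microphone 1", 0.98)

change = jitter_change(time_a, time_b)
print(f"change in jitter: {change:+.2f} percentage points")
```

A negative change here means that jitter fell between the two sessions. Passing a session analyzed with a different program or method raises an error instead of returning a misleading difference.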
Measures of jitter and shimmer
Measures of jitter are typically carried out on long sustained vowels and, when using Praat, I tend to use the Jitter (local) method. This is the average absolute difference between consecutive periods, divided by the average period. [It is this measure that MDVP calls Jitt, with a threshold of 1.040%.]
Like jitter measures, measures of shimmer are usually only performed on long sustained vowels. Again, when using Praat software, I usually use the Shimmer (local) calculation. This is the average absolute difference between the amplitudes of consecutive periods, divided by the average amplitude. [It is this measure that MDVP calls Shim, with a threshold of 3.810%.]
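The two definitions above share the same arithmetic, which can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the period and amplitude values are invented, the function name local_perturbation is my own, and the sketch takes the per-cycle periods and amplitudes as already extracted. Real programs first have to find the cycles, and Praat additionally applies settings such as a period floor, a period ceiling and a maximum period factor that are ignored here.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference between consecutive values,
    divided by the mean value, expressed as a percentage."""
    if len(values) < 2:
        raise ValueError("at least two cycles are needed")
    differences = [abs(b - a) for a, b in zip(values, values[1:])]
    return 100.0 * mean(differences) / mean(values)


# Invented per-cycle measurements from a sustained vowel.
periods = [0.0080, 0.0081, 0.0079, 0.0080, 0.0082]  # seconds
amplitudes = [0.50, 0.52, 0.49, 0.51, 0.50]         # arbitrary units

jitter_local = local_perturbation(periods)       # jitter uses the periods
shimmer_local = local_perturbation(amplitudes)   # shimmer uses the amplitudes

print(f"jitter (local):  {jitter_local:.3f} %")
print(f"shimmer (local): {shimmer_local:.3f} %")
```

Because both measures are ratios, they are expressed as percentages and do not depend on the absolute pitch or loudness of the recording.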
Harmonicity
Noise-to-harmonics ratio (NHR) is another useful measure, in this case of hoarseness. This can be routinely measured using MDVP. For a signal that can be assumed to be periodic (e.g. a sustained vowel), the signal-to-noise ratio will be equal to the harmonics-to-noise ratio (HNR), and it is this that I prefer to calculate when using Praat. Praat declares that a healthy voice phonating /a/ or /i/ should have an HNR of around 20 dB, and around 40 dB for the phonation of the vowel /u/. Consequently, an HNR below 20 dB is considered to be a measure of noticeable hoarseness.
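To give a feel for what an HNR of 20 means, the Praat manual expresses harmonicity in decibels as ten times the base-10 logarithm of the ratio of periodic energy to noise energy. The Python sketch below applies that conversion. The function name hnr_db and the input fractions are illustrative assumptions, and the sketch starts from a known energy split rather than estimating it from a recording, which is the hard part that the software does.

```python
import math


def hnr_db(periodic_fraction):
    """Harmonics-to-noise ratio in dB, given the fraction of signal
    energy that is periodic (the remainder is treated as noise)."""
    if not 0.0 < periodic_fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    noise_fraction = 1.0 - periodic_fraction
    return 10.0 * math.log10(periodic_fraction / noise_fraction)


# 99% periodic energy and 1% noise gives roughly 20 dB;
# equal energy in harmonics and noise gives 0 dB.
for fraction in (0.99, 0.90, 0.50):
    print(f"{fraction:.0%} periodic -> {hnr_db(fraction):.1f} dB")
```

On this scale, a voice at the 20 dB mark has about 1% of its energy in noise, whereas a voice with 10% noise energy sits at roughly 9.5 dB.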
Fundamental frequency
With respect to pitch, a person’s habitual pitch – their speaking fundamental frequency (SF0) – largely depends on their sex and age. It will, however, also be affected by such things as the type of communication being undertaken, the speaker’s emotional state, background noise, reading aloud, talking on the telephone, the degree of intoxication if the speaker has been drinking alcohol, and so on (Mathieson, 2001, p. 76). Typically, men will have a lower SF0 than women, who will have a lower SF0 than children. Table 1 sets out some typical speaking fundamental frequencies for adults and children.
 | children | women | men |
Mean SF0 (Hz) | 265 | 225 | 128 |
Frequency range (Hz) | 208-440 | 155-334 | 85-196 |
Table 1. Average Speaking Fundamental Frequencies [Source: (Williamson, 2006, p. 177)]
Conclusion
In setting out my thoughts here, I have assumed that the clients are adults. Now, whilst acoustic measures can clearly be made for children, the legitimacy of using these to monitor changes in children over time is questionable because, as the child grows, the spatial relationships between the laryngeal structures (and remaining vocal tract) change. For anyone conducting instrumental measurement of voice in adults, however, acoustic measures such as jitter, shimmer, harmonics-to-noise ratio and fundamental frequency are routinely undertaken.
It is evident that the figures published as normative data for adults differ depending on the software/equipment being used. I think the main message, therefore, is to select your software program, choose an appropriate method of calculation, use the same good quality recording equipment under standard conditions every time and use the published norms with caution. I have found that the main benefit of using these measures is to track a client’s progress over time, as one can legitimately compare one set of measures at time B with the initial set of measures made at time A (the so-called before-and-after method). The figures presented in Table 2 are suggested as possible normative data to be used with the Praat software (with caution).
jitter: | <= 1.040 % | |
shimmer: | <= 3.810 % | |
HNR: | >= 20 dB | |
 | adult males | adult females |
mean pitch: | 128 Hz | 225 Hz |
minimum pitch: | 85 Hz | 155 Hz |
maximum pitch: | 196 Hz | 334 Hz |
Table 2. Suggested Normative Data for Praat Measurements
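As a rough illustration of how these figures might be used as a guide, the Python sketch below checks one set of measurements against them. The function name screen, the dictionary layout and the sample values are all hypothetical. Following the discussion of harmonicity above, an HNR below 20 is treated as outside the suggested norm, and mean pitch is checked against the minimum and maximum pitch for the speaker's sex. A False result is only a prompt for further investigation, not a diagnosis.

```python
# Suggested figures for Praat measurements, taken from the text above.
JITTER_MAX_PERCENT = 1.040
SHIMMER_MAX_PERCENT = 3.810
HNR_MIN = 20.0  # an HNR below this is taken as noticeable hoarseness
PITCH_RANGE_HZ = {"male": (85.0, 196.0), "female": (155.0, 334.0)}


def screen(jitter, shimmer, hnr, mean_pitch, sex):
    """Return True for each measure that falls within the suggested norms."""
    lowest, highest = PITCH_RANGE_HZ[sex]
    return {
        "jitter": jitter <= JITTER_MAX_PERCENT,
        "shimmer": shimmer <= SHIMMER_MAX_PERCENT,
        "HNR": hnr >= HNR_MIN,
        "mean pitch": lowest <= mean_pitch <= highest,
    }


# Invented measurements for an adult male client.
result = screen(jitter=0.62, shimmer=4.10, hnr=22.5, mean_pitch=118.0, sex="male")
for measure, within_norms in result.items():
    print(f"{measure}: {'within' if within_norms else 'outside'} suggested norms")
```

The same function can be run on the measurements from each re-assessment, provided that the recording and analysis procedure has been kept constant.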
Bibliography
Boersma, P. and Weenink, D. (2009) Praat: doing phonetics by computer (Version 5.1.17) [Computer program] Retrieved October 5, 2009, from http://www.praat.org/.
Kay Elemetrics (2008) Multi-Dimensional Voice Program, Model 5105. Lincoln Park, NJ: Kay Elemetrics Corporation.
Maryn, Y., Corthals, P., De Bodt, M., Van Cauwenberge, P. and Deliyski, D. (2009) ‘Perturbation measures of voice: a comparative study between Multi-Dimensional Voice Program and Praat’ Folia Phoniatrica et Logopaedica 61, 4, 217-26.
Mathieson, L. (2001) Greene and Mathieson’s the voice and its disorders (6th ed.) London: Whurr Publishers Ltd.
Williamson, G. (2006) Human communication: a linguistic introduction (2nd ed.) Billingham: Speech-Language Services.