Perceptual Assessment of Voice

Voice assessment procedures

Voice assessment for the diagnosis of potential voice disorders typically involves three main procedures:

  1. laryngoscopic examination
  2. perceptual assessment
  3. instrumental measurement

This article provides an outline of procedure 2 – perceptual assessment of voice.

Perceptual assessment of voice

This involves describing the voice solely by listening to it, i.e. using auditory perception. Speech therapists who specialize in working with people with voice disorders have received training in describing the relevant characteristics (see ‘Paralinguistic Features’ in Voice) of a disordered voice. Perceptual evaluation can be either informal or formal.


Informal perceptual evaluation takes place throughout the whole meeting between a therapist and a client. It is usual for the therapist to engage the client in spontaneous conversation and to take a case history designed to gather information about the onset and history of the voice difficulty, relevant medical information, the domestic situation, client lifestyle and any traumas that may affect the voice. Whilst it may appear to the client that the therapist is merely ‘chatting’ on occasion, in reality the therapist is evaluating the voice on a principled basis. That is to say, the therapist will have some descriptive scheme in mind that allows them to appraise performance under various categories of concern (e.g. voice quality, pitch, pitch range, loudness, nasal resonance, flexibility, stamina, breathing method, and so on).


Formal perceptual evaluation typically involves the use of a protocol: a standard procedure for systematically describing and quantifying a voice difficulty. Often, specific time is set aside during an assessment appointment with the therapist for this aspect of evaluation, i.e. it is often presented as a separate activity from any informal conversation and case history taking.

There is no universally agreed upon method of conducting a formal perceptual evaluation. In fact, there are many schemes/protocols to choose from, each with its own strengths and weaknesses. A popular scheme is the Buffalo III Voice Profile developed in 1987. This scheme rates: laryngeal tone, pitch, loudness, nasal resonance, oral resonance, breath supply, muscles, voice abuse, rate, speech anxiety, speech intelligibility and an overall voice rating. Each parameter is quantified using a 5-point scale, where 1 = normal, 2 = mild, 3 = moderate, 4 = severe, and 5 = very severe.

Another popular scheme, which has been adopted by the UK Royal College of Speech and Language Therapists as the minimum knowledge and skills set for therapists working with voice difficulties, is the GRBAS Scale. Developed in 1981, this scheme is not a complete perceptual evaluation protocol but specifically evaluates voice quality. It assesses: Grade (the overall degree of voice abnormality), Roughness, Breathiness, Asthenia (voice weakness), and Strain. Under this scheme, each parameter is quantified on a 4-point scale, where 0 = normal, 1 = mild, 2 = moderate, and 3 = severe.

There are several other similar schemes, but I will illustrate the approach using a scheme known as CVE2, which I originally developed in 2003.

Clinical Voice Evaluation 2 (CVE2)

CVE2 was originally written as a Windows® based software program to assist in the perceptual assessment of voice difficulties in both adults and children, guiding the clinician through a systematic voice evaluation. Two types of assessment can be conducted. The first is a detailed assessment that gathers data in relation to: client perception, contextual speech, voice quality, S/Z ratio, MPT (maximum phonation time), endurance, breathing, pitch, loudness, prosody, cough, coup, glottal attack, resonance, motor speech, and musculoskeletal tension. The second type of assessment is a screening assessment that takes approximately 10 minutes to administer: the data gathered is a subset of that collected for the detailed assessment. Screening assessments are used to determine whether or not the voice is sufficiently different from the norm so as to warrant a complete, in-depth assessment. I will now describe the screening assessment protocol.

Other than the aerodynamic measures and related observations, all screening parameters described below are rated on a 4-point scale: 0 = normal, 1 = mild, 2 = moderate, and 3 = severe.

Download CVE2 Screening Assessment

Client perception

Ask the client to rate their voice today, right now, as they are speaking to you.


Voice quality

During informal conversation, first determine if the voice quality is (1) normal, (2) breathy, (3) hoarse, (4) husky, or (5) whispered. Then, if the quality is other than normal, rate the particular quality as either mild, moderate or severe.


Pitch

Again during informal conversation, determine if the pitch is (1) normal, (2) too high, or (3) too low. If the pitch is other than normal, rate the too high or too low pitch as either mild, moderate or severe.

Pitch range

Ask the client to take a normal breath and, starting from a relatively low note, sing up a musical scale using the sound ‘lah’ as high as they can comfortably reach without straining. Now ask them to reverse this procedure, singing down the scale, starting from a relatively high note. Judge whether or not the pitch range is restricted and rate it on the 4-point scale.


Loudness

Ask the client to take a normal breath and to count from 1 to 10, starting fairly softly and increasing the loudness with each number. Then ask them to reverse this procedure, counting backwards from 10 to 1, starting loudly and reducing the loudness with each descending number. From the client’s performance on this task and their performance during informal conversation, judge whether the loudness of their voice is (1) normal, (2) too loud, or (3) too quiet. As usual, if it is other than normal, rate the too loud or too quiet voice as mild, moderate or severe.

Nasal resonance

Have the client read aloud the Standard Text.

Download a copy of the Standard Text (US Version)
Download a copy of the Standard Text (UK Version)

The first paragraph contains no nasal consonants and helps in detecting hypernasality. This is because, as all the consonants are oral consonants, the soft palate should normally be raised preventing air escaping through the nose. If there is genuine hypernasality this will be detectable because, as the soft palate does not fully seal off the nasal cavity, the oral consonants will be nasalized.

The second paragraph does contain nasal consonants (‘m’, ‘n’ and ‘ng’) and so it helps in detecting hyponasality. This is the reverse of the above argument, i.e. we would expect the nasal consonants to be nasalized but if the soft palate is not lowered sufficiently (or there is a blockage in the nasal cavity) then insufficient air escapes through the nose and this is particularly noticeable on the nasal consonants.

Having determined if the nasal resonance is (1) normal, (2) hypernasal, or (3) hyponasal, once again rate it as mild, moderate or severe if it is other than normal.

Oral resonance

This is sometimes known as throatiness. The voice seems to be focused too deep in the throat – what some people call guttural speech. Typically the tongue is held quite flat in the mouth and it is retracted towards the back of the throat. Again, if this is not normal, rate it as a mild, moderate or severe deviation.


Rate

An average rate of speech is around 120-150 syllables per minute. Each of the paragraphs in the Standard Text contains exactly 150 syllables, so the text can be used to judge the rate of speech: a person speaking at 150 syllables per minute would take one minute to read each paragraph. Be careful here, however, as people will often read at a different rate from their everyday spontaneous speech. Your informal conversation(s) with the client will, once more, help you judge this parameter. Determine if the rate is (1) normal, (2) too fast, or (3) too slow, and quantify this as appropriate.
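The arithmetic behind timing a read paragraph can be sketched as follows. This is only an illustration, not part of CVE2: the function names are hypothetical, and treating 120 and 150 syllables per minute as hard cut-offs is an assumption made for the example, since the judgment described above is ultimately a perceptual one.

```python
def syllables_per_minute(syllable_count: int, seconds: float) -> float:
    """Convert a timed reading into a rate in syllables per minute."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return syllable_count * 60.0 / seconds


def classify_rate(rate: float, low: float = 120.0, high: float = 150.0) -> str:
    """Label a rate against the 120-150 range quoted in the text.

    Using the range limits as strict cut-offs is an assumption for
    illustration only; it does not replace clinical judgment.
    """
    if rate < low:
        return "too slow"
    if rate > high:
        return "too fast"
    return "normal"


# A 150-syllable paragraph read aloud in 75 seconds.
rate = syllables_per_minute(150, 75)
print(rate, classify_rate(rate))  # 120.0 normal
```

A reading that takes noticeably less than 60 seconds for one paragraph would come out above 150 syllables per minute, and one that takes more than 75 seconds would come out below 120.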


Prosody

Rhythm and intonation give spontaneous speech a characteristic rhythmical beat and an attendant musical quality. A flexible voice should possess good ‘coloring’ that makes it easy to listen to. Lack of ‘coloring’ leads to a monotonous voice. From your informal conversation(s) and the various assessment tasks performed so far, judge whether or not the voice is (1) normal, (2) has inadequate variability, or (3) has excessive variability, and rate as appropriate.

Remember that parameters such as prosody are culturally bound. That is to say, what is considered to be suitable prosody for a typist (who does not have to answer phones) may not be considered suitable for a professional voice user, such as a teacher, a minister of religion or an actor. Thus, one should judge this (and arguably all other parameters) in relation to the person’s vocal demands, occupation, any hobbies that rely on use of the voice, and so on.

Aerodynamic measures

  • S/Z ratio: This ratio is explained in detail elsewhere (see S/Z Ratio). In summary, it is an indicator of laryngeal dysfunction. It is obtained by dividing the longest time in seconds that the client can sustain the sound ‘s’ by the longest time in seconds that they can sustain the sound ‘z’. Clients who have difficulty phonating will likely have an S/Z ratio greater than 1.4. The higher the ratio, the more difficulty the client is experiencing when phonating.
  • MPT: This measure is explained in detail elsewhere (see MPT). In summary, it is simply the longest time that a client can sustain a vowel sound at a comfortable pitch and loudness on a deep breath. Adult females should achieve between 15 and 25 seconds, whereas adult males exceed this at between 25 and 35 seconds.
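As a rough illustration of the two measures above, the calculation might look like this. The function and constant names are hypothetical and not taken from the CVE2 software; the 1.4 threshold and the adult MPT ranges are the figures quoted in the text, and flagging only values below the lower end of the range is an assumption made for the sketch.

```python
SZ_THRESHOLD = 1.4  # ratios above this suggest difficulty phonating (figure from the text)

# Expected adult MPT ranges in seconds, as quoted in the text.
MPT_RANGES = {"female": (15.0, 25.0), "male": (25.0, 35.0)}


def sz_ratio(s_seconds: float, z_seconds: float) -> float:
    """Longest sustained 's' divided by longest sustained 'z'."""
    if s_seconds <= 0 or z_seconds <= 0:
        raise ValueError("both durations must be positive")
    return s_seconds / z_seconds


def mpt_below_expected(mpt_seconds: float, sex: str) -> bool:
    """True if the MPT falls below the lower end of the quoted adult range.

    Comparing only against the lower bound is an assumption for illustration.
    """
    low, _high = MPT_RANGES[sex]
    return mpt_seconds < low


# Example: 's' sustained for 21 s, 'z' for 12 s; vowel sustained for 12 s.
ratio = sz_ratio(21.0, 12.0)
print(round(ratio, 2), ratio > SZ_THRESHOLD)  # 1.75 True
print(mpt_below_expected(12.0, "female"))     # True
```

Both values are recorded alongside the screening ratings rather than being scored on the 4-point scale.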

Related observations

The following observations are not rated on the 4-point scale but simply recorded as either (1) present, or (2) absent.

  • glottal fry: this is characterized by a series of rapid low-pitched ‘pops’ and a creaky quality.
  • diplophonia: this is characterized by the perception of two simultaneous pitches in the voice – it may result from involvement of the false vocal folds during normal phonation.
  • phonation breaks: these are characterized by uncontrolled, short-duration cessations of vocal fold vibrations during speech, heard as short periods of no voice.
  • fluctuations in quality: the quality of the voice (normal, breathy, hoarse, husky, whispered) may not be stable – there may be wide fluctuations from one quality to another and back again.

The screening assessment data can be used to create a Voice Profile with a Severity Rating which is obtained by adding the scores for all the parameters, with the exception of the aerodynamic measures, related observations and client perception.
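The summation described above can be sketched as below. The parameter names are my reading of the rated screening parameters in this article (client perception is rated but left out of the total), and the function name is hypothetical; the downloadable Vocal Profile remains the authoritative layout.

```python
# Parameters rated 0-3 that count towards the Severity Rating.
# This list is an assumption drawn from the screening parameters described above.
SCORED_PARAMETERS = (
    "voice_quality", "pitch", "pitch_range", "loudness",
    "nasal_resonance", "oral_resonance", "rate", "prosody",
)


def severity_rating(ratings: dict) -> int:
    """Sum the 0-3 ratings of the scored parameters.

    Any other keys (e.g. client perception, aerodynamic measures,
    related observations) are ignored, as the text specifies.
    """
    total = 0
    for name in SCORED_PARAMETERS:
        score = ratings[name]
        if score not in (0, 1, 2, 3):
            raise ValueError(f"{name} must be rated 0-3, got {score!r}")
        total += score
    return total


example = {
    "client_perception": 2,  # recorded, but excluded from the total
    "voice_quality": 2, "pitch": 1, "pitch_range": 1, "loudness": 0,
    "nasal_resonance": 0, "oral_resonance": 1, "rate": 0, "prosody": 2,
}
print(severity_rating(example))  # 7
```

With eight scored parameters, the total under this sketch ranges from 0 (all normal) to 24 (all severe).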

Download CVE2 Vocal Profile (Screen)

Vocal Profiles such as this are useful as a baseline measure and for monitoring changes over time. Remember that this profile is just a screening assessment: if the client receives a high severity rating, this suggests that a more complete and detailed assessment should be carried out.