
Speech Perception

How the ear functions


Figure 1. A cutaway diagram of the human ear

The outer ear

Vibrations in the air are channeled by the structure of the external ear into the ear canal (Figure 1).

The middle ear

At the boundary of the middle ear, the vibrating air encounters a taut membrane, the eardrum, stretched across the end of the ear canal. The vibrations in the air set up sympathetic vibrations in the eardrum.

Inside the middle ear is a set of three tiny bones, the auditory ossicles. The vibrations of the eardrum in turn set the ossicles vibrating. The inner end of the chain of ossicles abuts a fluid-filled coiled structure called the cochlea.

The inner ear

Vibration of the ossicles is transferred into the fluid of the cochlea and, in particular, to a thin membrane running along its length, the basilar membrane.

Adjacent to the basilar membrane is a layer of small receptor cells, each bearing tiny cilia, or hairs. For this reason, these receptors are known as hair cells. Movement of the basilar membrane causes movement of the hairs, which is converted into changes in electrical activity within the cell. Hair cells form connections with adjacent neurons, and electrical changes within the hair cells therefore trigger neuronal action potentials (i.e. brief reversals of the electrical potential across the neuron's membrane). The fibres of these neurons join to form the auditory nerve, which relays the action potentials from the ear to the brain.

The cochlea

The pattern of vibration of the eardrum and ossicles simply reflects the pattern of vibration in the air. Transformation of the signal begins in the cochlea. Because of the structure of the cochlea, high-frequency vibrations cause displacement of the basilar membrane at its basal end (the end nearest the ossicles), and low-frequency vibrations cause displacement further along, towards the apex of the coil. Thus hair cells at different positions respond to different frequencies of sound, by virtue of being adjacent to different sections of the basilar membrane.
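The place-to-frequency relationship described above can be illustrated numerically. The sketch below uses a Greenwood-style exponential fit, which is not part of the original article. The constants are the values commonly quoted for the human cochlea and should be treated as assumptions, as should the function names. Position is expressed as a fraction of the basilar membrane's length, measured from the far (apical) end of the coil, so that 1.0 is the end nearest the ossicles.

```python
import math

# Assumed constants of a Greenwood-style fit for the human cochlea.
# They are illustrative, not taken from the article.
A = 165.4     # scaling constant in Hz
ALPHA = 2.1   # slope, for position given as a fraction of membrane length
K = 0.88      # correction term that shapes the low-frequency end


def place_to_frequency(x_from_apex: float) -> float:
    """Frequency (Hz) that best excites the membrane at fractional
    distance x_from_apex (0.0 = apex, 1.0 = base, next to the ossicles)."""
    return A * (10 ** (ALPHA * x_from_apex) - K)


def frequency_to_place(freq_hz: float) -> float:
    """Inverse mapping: fractional distance from the apex for a frequency."""
    return math.log10(freq_hz / A + K) / ALPHA


# Higher frequencies map closer to the base, lower ones further along.
for freq in (100, 1000, 10000):
    x = frequency_to_place(freq)
    print(f"{freq:>6} Hz -> {1 - x:.2f} of the way along from the base")
```

Under these assumed constants, the mapping is roughly logarithmic over most of the membrane: equal steps along its length correspond to roughly equal ratios of frequency rather than equal differences.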

Thus, two crucial things have happened at the cochlear stage. First, a mapping of sounds of different frequencies onto different places on the basilar membrane has been set up. This is called tonotopic organisation. Second, any sound which consists of patterns of acoustic energy at several different frequencies will have been broken down into its component frequencies. This is because each formant (region of energy concentration) within the complex sound will cause vibration at a different position along the basilar membrane and hence cause different subsets of hair cells to respond. Action potentials generated in the neurons that connect to the hair cells are transmitted to the brain via the auditory nerve. What will be transmitted to the brain, then, already contains information about pitch (coded by which cells are firing), and a preliminary breakdown into formants.
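The breakdown of a complex sound into its component frequencies has a loose mathematical analogue in Fourier analysis. The following is a minimal sketch, not a model of cochlear mechanics. It builds a sound from three sinusoids and recovers their frequencies with a naive discrete Fourier transform. The sample rate, the three component frequencies and amplitudes, and all function names are hypothetical values chosen for illustration.

```python
import cmath
import math

SAMPLE_RATE = 8000   # samples per second (assumed for illustration)
N_SAMPLES = 400      # gives a frequency resolution of 8000 / 400 = 20 Hz


def synthesize(components, n=N_SAMPLES, rate=SAMPLE_RATE):
    """Build a complex sound as a sum of sinusoids.
    components is a list of (frequency_hz, amplitude) pairs."""
    return [
        sum(amp * math.sin(2 * math.pi * freq * t / rate)
            for freq, amp in components)
        for t in range(n)
    ]


def magnitude_spectrum(signal, rate=SAMPLE_RATE):
    """Naive discrete Fourier transform.
    Returns (frequency_hz, amplitude) pairs for bins below the Nyquist limit."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2):
        coeff = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                    for t, s in enumerate(signal))
        spectrum.append((k * rate / n, abs(coeff) * 2 / n))
    return spectrum


def dominant_frequencies(signal, threshold=0.1):
    """Frequencies whose amplitude exceeds the threshold, loosely analogous
    to the places along the membrane that are set in motion."""
    return [freq for freq, amp in magnitude_spectrum(signal) if amp > threshold]


# Hypothetical complex sound with energy concentrated at three frequencies.
complex_sound = synthesize([(500, 1.0), (1500, 0.6), (2500, 0.3)])
print(dominant_frequencies(complex_sound))
```

Each recovered frequency plays the role of one region of energy concentration: in the ear, each such region would displace a different stretch of the basilar membrane and so drive a different subset of hair cells.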

The auditory cortex

The auditory nerve feeds into the brainstem, from where the auditory pathway ascends via a relay station in the middle of the forebrain (the thalamus) to the auditory cortex of the superior temporal lobe on both sides of the brain (Figure 2). Neurons in the auditory cortex generally respond to information from the ear on the opposite side of the body.


Figure 2. Diagram of the brain showing the primary auditory cortex

Representation of the signal in the primary auditory cortex is tonotopic. That is, cells at different locations respond to sounds at different frequencies, resulting in a mapping of the frequency spectrum of the sound across the surface of the brain (Figure 3). Recognition of sounds depends not on the absolute pitch of the formants but on their relationship to each other. We assume that this is processed in deeper layers of the auditory cortex, though exactly where or how is not yet fully understood.


Figure 3. Tonotopic organisation within the primary auditory cortex

Specialized for speech

There is some evidence that within the primary auditory cortex, there are populations of neurons specialized for speech. This was shown by brain imaging experiments that compared patterns of activation in response to speech, scrambled cocktails of speech sounds, and non-speech sounds which were matched to the speech sounds on basic acoustic features (Moore, 2000). Areas in the superior temporal sulcus, on both sides of the brain, responded preferentially to real and scrambled speech rather than to the other sounds (Figure 4).


Figure 4. Cortical areas that respond preferentially to speech or scrambled speech sounds, as opposed to non-speech sounds: (a) lateral view, (b) coronal section

Listening to speech produces activation on both sides of the brain in the auditory cortex. Damage to the superior temporal lobe on either side causes difficulties with speech recognition, though the pattern of the difficulties may be somewhat different on the two sides.


From the above discussion, it seems that the initial perception of speech is processed by some of the neurons in the auditory areas of the superior temporal lobe, on both sides of the brain.


Moore, D.R. (2000) ‘Auditory neuroscience: is speech special?’ Current Biology 10, 362-364.


[Information last accessed: 27 July 2017]

This article is adapted from ‘From sound to meaning: hearing, speech and language’, an OpenLearn chunk reworked by permission of The Open University, copyright © 2016, made available under the terms of the Creative Commons Licence v4.0. As such, it is also made available under the same licence agreement.