Reference 2

Sound localization

Sound localization refers to a listener’s ability to identify the location or origin of a detected sound in direction and distance. It may also refer to the methods in acoustical engineering to simulate the placement of an auditory cue in a virtual 3D space (see binaural recording).

The sound localization mechanisms of the human auditory system have been extensively studied. The human auditory system uses several cues for sound source localization, including time and level differences between the two ears, spectral information, timing analysis, correlation analysis, and pattern matching.

These cues are also used by animals, but there may be differences in usage, and there are also localization cues which are absent in the human auditory system, such as the effects of ear movements.

For determining the lateral input direction (left, front, right) the auditory system analyzes the following ear signal information:

  • Interaural time differences
    Sound from the right side reaches the right ear earlier than the left ear. The auditory system evaluates interaural time differences from phase delays at low frequencies and group delays at high frequencies.

  • Interaural level differences
    Sound from the right side has a higher level at the right ear than at the left ear, because the head shadows the left ear. These level differences are highly frequency dependent and increase with increasing frequency.

For frequencies below 800 Hz, mainly interaural time differences (phase delays) are evaluated; for frequencies above 1600 Hz, mainly interaural level differences are evaluated. Between 800 Hz and 1600 Hz there is a transition zone in which both mechanisms play a role.

Evaluation for low frequencies

For frequencies below 800 Hz the dimensions of the head (ear distance 21.5 cm, corresponding to an interaural time delay of 625 µs) are smaller than half the wavelength of the sound waves, so the auditory system can determine phase delays between the two ears without ambiguity. Interaural level differences are very small in this frequency range, so a precise evaluation of the input direction on the basis of level differences alone is nearly impossible. As the frequency drops below 80 Hz it becomes difficult or impossible to use either time differences or level differences to determine a sound’s lateral source, because the phase difference between the ears becomes too small for a directional evaluation.
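The figures quoted for the head dimensions and for the 800 Hz and 1600 Hz limits can be checked with simple arithmetic. The following is a minimal sketch, assuming a speed of sound of 343 m/s (a value for air at roughly room temperature that the text does not state); the function names are hypothetical and chosen only for illustration.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumption: air at about 20 degrees C, not given in the text
EAR_DISTANCE_M = 0.215      # ear distance quoted in the text (21.5 cm)


def max_interaural_delay_us(ear_distance_m=EAR_DISTANCE_M, c=SPEED_OF_SOUND_M_S):
    """Time sound needs to travel the ear distance, in microseconds."""
    return ear_distance_m / c * 1e6


def half_wavelength_limit_hz(ear_distance_m=EAR_DISTANCE_M, c=SPEED_OF_SOUND_M_S):
    """Frequency whose half wavelength equals the ear distance."""
    return c / (2.0 * ear_distance_m)


def full_wavelength_limit_hz(ear_distance_m=EAR_DISTANCE_M, c=SPEED_OF_SOUND_M_S):
    """Frequency whose full wavelength equals the ear distance."""
    return c / ear_distance_m


if __name__ == "__main__":
    print(round(max_interaural_delay_us()), "microseconds")
    print(round(half_wavelength_limit_hz()), "Hz")
    print(round(full_wavelength_limit_hz()), "Hz")
```

Under this assumption the delay comes out close to the quoted 625 µs, and the two frequencies land just below 800 Hz and 1600 Hz, which is consistent with the rounded limits used in the text.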

Evaluation for high frequencies

For frequencies above 1600 Hz the dimensions of the head are greater than the wavelength of the sound waves. An unambiguous determination of the input direction based on interaural phases is not possible at these frequencies. However, the interaural level differences become larger, and these level differences are evaluated by the auditory system. Group delays between the ears can also be evaluated; this is more pronounced at higher frequencies. This means that if there is a sound onset, the delay of this onset between the two ears can be used to determine the input direction of the corresponding sound source. This mechanism becomes especially important in reverberant environments. After a sound onset there is a short time frame in which the direct sound has reached the ears but the reflected sound has not. The auditory system uses this short time frame to evaluate the sound source direction, and keeps the detected direction for as long as reflections and reverberation prevent an unambiguous direction estimate.
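The delay of an onset between the two ear signals can be illustrated with a cross-correlation, which is one way to realize the "correlation analysis" mentioned earlier. The sketch below is a plain signal-processing illustration, not a model of the neural mechanism; the function name and the toy signals are hypothetical.

```python
def interaural_delay_samples(left, right, max_lag):
    """Return the lag (in samples) at which `right` best matches `left`.

    A positive lag means the right-ear signal lags behind the left-ear
    signal, i.e. the sound reached the left ear first.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, left_value in enumerate(left):
            m = n + lag
            if 0 <= m < len(right):
                score += left_value * right[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


if __name__ == "__main__":
    # Toy onset: the right-ear signal is the left-ear signal delayed by 2 samples.
    left_ear = [0.0, 0.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0]
    right_ear = [0.0, 0.0, 0.0, 0.0, 1.0, 0.5, 0.0, 0.0]
    print(interaural_delay_samples(left_ear, right_ear, max_lag=4))
```

Dividing the resulting lag by the sampling rate gives the delay in seconds; its sign indicates on which side the source lies.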

The mechanisms described above cannot be used to differentiate between a sound source ahead of the hearer or behind the hearer; therefore additional cues have to be evaluated.

Sound localization in the median plane (front, above, back, below)

The human outer ear, i.e. the structures of the pinna and the external ear canal, forms direction-selective filters. Depending on the sound input direction in the median plane, different filter resonances become active. These resonances impose direction-specific patterns on the frequency responses of the ears, which can be evaluated by the auditory system (directional bands). Together with other direction-selective reflections at the head, shoulders and torso, they form the outer ear transfer functions.

These patterns in the ear’s frequency responses are highly individual, depending on the shape and size of the outer ear. If sound is presented through headphones and has been recorded via another head with differently shaped outer ear surfaces, the directional patterns differ from the listener’s own, and problems appear when trying to evaluate directions in the median plane with these foreign ears. As a consequence, front–back confusions or inside-the-head localization can occur when listening to dummy head recordings, also referred to as binaural recordings.

Distance of the sound source

The human auditory system has only limited means of determining the distance of a sound source. In the close-up range there are some indications for distance determination, such as extreme level differences (e.g. when whispering into one ear) or specific pinna resonances.

The auditory system uses these cues to estimate the distance to a sound source:

  • Sound spectrum: High frequencies are damped more quickly by the air than low frequencies. A distant sound source therefore sounds more muffled than a close one, because the high frequencies are attenuated. For sound with a known spectrum (e.g. speech) the distance can be estimated roughly from the perceived spectrum.

  • Loudness: Distant sound sources have a lower loudness than close ones. This aspect can be evaluated especially for well-known sound sources (e.g. known speakers).

  • Movement: As in the visual system, there is a phenomenon of motion parallax in acoustical perception. For a moving listener, nearby sound sources pass by faster than distant ones.

  • Reflections: In enclosed rooms two types of sound arrive at a listener. The direct sound arrives at the listener’s ears without being reflected at a wall. Reflected sound has been reflected at a wall at least once before arriving at the listener. The ratio between direct sound and reflected sound can give an indication of the distance of the sound source.
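Two of these cues, loudness and the ratio of direct to reflected sound, can be expressed as simple level calculations. The sketch below is illustrative only: the inverse-square behaviour assumes an idealized point source in a free field, which the text does not state, and the function names and energy values are hypothetical.

```python
import math


def level_drop_db(near_m, far_m):
    """Level decrease when a source moves from near_m to far_m.

    Assumes an idealized point source in a free field (inverse-square law),
    which is an assumption of this sketch, not a statement from the text.
    """
    return 20.0 * math.log10(far_m / near_m)


def direct_to_reflected_db(direct_energy, reflected_energy):
    """Ratio of direct to reflected sound energy, expressed in decibels."""
    return 10.0 * math.log10(direct_energy / reflected_energy)


if __name__ == "__main__":
    # Doubling the distance lowers the level by about 6 dB under this assumption.
    print(round(level_drop_db(1.0, 2.0), 2))
    # A lower ratio hints at a more distant source in the same room.
    print(round(direct_to_reflected_db(10.0, 1.0), 1))
    print(round(direct_to_reflected_db(1.0, 1.0), 1))
```

Because the reflected sound level in a room varies much less with source position than the direct sound level, the ratio tends to fall as the source moves away, which is what makes it usable as a distance cue.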

Signal processing

Sound processing of the human auditory system is performed in so-called critical bands. The hearing range is segmented into 24 critical bands, each with a width of 1 Bark or 100 Mel. For a directional analysis the signals inside each critical band are analyzed together.
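To see how frequencies map onto critical bands, a frequency in hertz can be converted to a critical-band rate in Bark. The sketch below uses the widely cited approximation attributed to Zwicker and Terhardt; this formula is not part of the text above and is only one of several published approximations. The function name is hypothetical.

```python
import math


def hz_to_bark(frequency_hz):
    """Approximate critical-band rate in Bark for a frequency in Hz.

    Uses the Zwicker and Terhardt approximation (an assumption of this
    sketch; other approximations of the Bark scale exist).
    """
    return (13.0 * math.atan(0.00076 * frequency_hz)
            + 3.5 * math.atan((frequency_hz / 7500.0) ** 2))


if __name__ == "__main__":
    for f in (100.0, 1000.0, 4000.0, 15500.0):
        print(f, round(hz_to_bark(f), 2))
```

With this approximation 1000 Hz falls at roughly 8.5 Bark and 15.5 kHz at roughly 24 Bark, in line with the 24 critical bands mentioned above.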

The auditory system can extract the sound of a desired sound source from interfering noise, so it can concentrate on a single speaker while other speakers are talking (the cocktail party effect). With the help of the cocktail party effect, sound from interfering directions is perceived as attenuated compared to the sound from the desired direction. The auditory system can increase the signal-to-noise ratio by up to 15 dB, which means that interfering sound is perceived to be attenuated to half (or less) of its actual loudness.
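The relation between a level change in decibels and perceived loudness can be illustrated with the common rule of thumb that a change of 10 dB corresponds to roughly a halving or doubling of loudness. That rule is an assumption of this sketch, not a statement from the text, and the function name is hypothetical.

```python
def perceived_loudness_ratio(attenuation_db):
    """Approximate loudness ratio after attenuating a sound by attenuation_db.

    Assumes the rule of thumb that every 10 dB of attenuation halves the
    perceived loudness.
    """
    return 2.0 ** (-attenuation_db / 10.0)


if __name__ == "__main__":
    print(perceived_loudness_ratio(10.0))
    print(round(perceived_loudness_ratio(15.0), 3))
```

Under this rule an improvement of 15 dB corresponds to a perceived loudness of a little more than a third of the original, which is consistent with the "half (or less)" stated above.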

Localization in enclosed rooms

In enclosed rooms not only the direct sound from a sound source arrives at the listener’s ears, but also sound which has been reflected at the walls. For sound localization the auditory system analyzes only the direct sound, which arrives first, and not the reflected sound, which arrives later (law of the first wave front). Sound localization therefore remains possible even in an echoic environment. This echo cancellation occurs in the dorsal nucleus of the lateral lemniscus (DNLL).

In order to determine the time periods in which the direct sound prevails and which can be used for directional evaluation, the auditory system analyzes loudness changes in different critical bands and also the stability of the perceived direction. If there is a strong attack of the loudness in several critical bands and the perceived direction is stable, this attack is in all probability caused by the direct sound of a sound source that is newly entering or is changing its signal characteristics. This short time period is used by the auditory system for directional and loudness analysis of this sound. When reflections arrive slightly later, they do not raise the loudness inside the critical bands as strongly, and the directional cues become unstable because sound from several reflection directions is mixed. As a result, no new directional analysis is triggered by the auditory system.

This first detected direction from the direct sound is taken as the sound source direction until other strong loudness attacks, combined with stable directional information, indicate that a new directional analysis is possible (see Franssen effect).
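The gating behaviour described in this section can be sketched as a small procedure: a new direction is accepted only when a strong loudness attack coincides with stable directional cues, and the last accepted direction is held otherwise. This is a toy illustration under stated assumptions, not a model of the auditory system; the frame format, the threshold values and the function name are all hypothetical.

```python
def track_direction(frames, attack_db=6.0, max_spread_deg=10.0):
    """Hold the last reliably detected direction across time frames.

    frames: sequence of (loudness_db, direction_deg, spread_deg) tuples,
            where spread_deg is a hypothetical measure of how unstable the
            directional cues are within the frame.
    Returns one held direction (or None) per frame.
    """
    held_direction = None
    previous_loudness = None
    result = []
    for loudness_db, direction_deg, spread_deg in frames:
        # Loudness rise relative to the previous frame (0 for the first frame).
        rise = 0.0 if previous_loudness is None else loudness_db - previous_loudness
        # Accept a new direction only on a strong attack with stable cues.
        if rise >= attack_db and spread_deg <= max_spread_deg:
            held_direction = direction_deg
        result.append(held_direction)
        previous_loudness = loudness_db
    return result


if __name__ == "__main__":
    frames = [
        (30.0, 0.0, 40.0),    # background, no reliable direction
        (60.0, -45.0, 3.0),   # strong attack, stable cues: direct sound
        (62.0, 20.0, 35.0),   # reflections: small rise, unstable cues
        (61.0, -10.0, 50.0),  # reverberation
        (75.0, 30.0, 4.0),    # new source: strong attack, stable cues
        (74.0, -60.0, 45.0),  # its reflections
    ]
    print(track_direction(frames))
```

In this example the held direction switches only at the two onsets; the frames dominated by reflections leave it unchanged.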