In stereophonic sound systems, such as those found in home entertainment applications, the localization of sounds is typically controlled using balance potentiometers. In this process, the relative level between two loudspeakers determines where the phantom image will be perceived by a listener positioned equidistant from the two loudspeakers with respect to a single plane. The perceived origin of the sound, i.e., the phantom image, has also been observed to be a function of the delay between the two otherwise identical sources. For gradually increasing delays on the order of the Interaural Time Difference (ITD) between the ears, the phantom image will shift toward the real, undelayed source. As the amount of delay is increased toward 10 ms, the sound direction is "fused" to the speaker from which the sound first arrived. In fact, it has been observed that if two similar sounds, which originate from separate sources, are delayed with respect to each other by between 10 ms and 50 ms, a listener who is positioned equidistant from the two loudspeakers will perceive the sound to be coming from the direction of the speaker whose sound arrives first, to the exclusion of the second speaker. This has been referred to as the Law of the First Wavefront, the Precedence Effect or the Haas Effect.
For sound arriving from two different sources, be they reflections or delayed sources, the later sound can appear to an individual either as an echo or as a mere coloration of the direct sound. If two identical sounds are separated in time by around 10 ms, the later sound will be perceived as a coloration of the direct sound, whereas for delays greater than around 50 ms, it will be perceived as an echo. Therefore, if the delayed sound were directed toward the listener from a rearward position with a delay of between 10 ms and 50 ms relative to the direct sound, the listener would not perceive the location of the rearmost sound source but would, rather, experience a fuller and perhaps more intelligible sound at his location. Essentially, the human ear tends to lock onto the sound which arrives first.
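The delay ranges described above can be summarized in a short sketch. The function name `classify_delay` and the hard thresholds are illustrative assumptions; the text gives the boundaries only as approximate ("around 10 ms", "around 50 ms"), and real perception varies with the signal and the listener.

```python
def classify_delay(delay_ms: float) -> str:
    """Return the rough perceptual regime for a delayed copy of a sound.

    Thresholds are assumptions taken from the approximate figures in the
    text; they are not sharp perceptual boundaries.
    """
    if delay_ms < 0:
        raise ValueError("delay must be non-negative")
    if delay_ms < 10.0:
        # Delays on the order of the ITD: the phantom image shifts
        # toward the undelayed source.
        return "image shifts toward earlier source"
    if delay_ms <= 50.0:
        # Precedence (Haas) range: direction is fused to the first
        # arrival and the later sound is heard as coloration.
        return "fused to first arrival"
    # Beyond roughly 50 ms the later sound is heard separately.
    return "echo"


if __name__ == "__main__":
    for d in (0.5, 20.0, 80.0):
        print(d, "ms:", classify_delay(d))
```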
The above observations can generally be explained based on the theory that the position of a sound source is cued by interaural differences in intensity and in time of arrival (phase). This is the so-called duplex theory of localization, which states that phase is the main mechanism of localization below 1500 Hz, while for frequencies above around 4000 Hz, intensity is the main localization cue. For the intervening range of frequencies, localization is poor, and it may be that confusion arises from a conflict between the two mechanisms over this range. The duplex theory of localization breaks down when it comes to defining unique sound source positions. A sound source located directly in front of a listener and one located directly behind the listener provide identical signals to the ears according to the duplex theory. However, it is a common everyday experience to discriminate between sounds localized in front and those localized behind. There is much evidence to support the idea that a third mechanism contributes to the localization of sound, namely the pinna transformation of sound.
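A minimal sketch may help illustrate both the frequency split of the duplex theory and its front-back ambiguity. The function names, the 0.18 m ear spacing, the 343 m/s speed of sound, and the simple sine model for the ITD are all assumptions introduced for illustration; none of them comes from the text above.

```python
import math


def dominant_cue(freq_hz: float) -> str:
    """Main localization cue under the duplex theory as stated above."""
    if freq_hz < 1500.0:
        return "time (phase)"
    if freq_hz > 4000.0:
        return "intensity"
    return "ambiguous"  # intervening range, where localization is poor


def itd_seconds(azimuth_deg: float,
                ear_spacing_m: float = 0.18,
                speed_of_sound: float = 343.0) -> float:
    """Interaural time difference for a distant source (assumed sine model).

    0 degrees is straight ahead, 90 degrees is directly to one side,
    and 180 degrees is directly behind.
    """
    return ear_spacing_m / speed_of_sound * math.sin(math.radians(azimuth_deg))


if __name__ == "__main__":
    # A source 30 degrees off the front and its mirror image behind the
    # listener (150 degrees) yield the same ITD in this model.
    print(itd_seconds(30.0), itd_seconds(150.0))
```

Because the model depends only on the sine of the azimuth, any front position and its mirror image behind the listener produce the same time difference, which is the ambiguity that the pinna cues are said to resolve.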
Over the years, experiments have shown that the pinna performs a spectral modification which gives additional cues for the localization of sounds. This is particularly true with respect to elevation and front-back cues. The brain/nervous system appears to process angle-dependent spectral information in order to determine direction. This is due to the complex shape of the pinna which, when presented with a sound from in front of the listener, produces a significantly different response at the ear canal than it does for a sound originating from behind the listener. This spectral modification is also affected by the head and torso.
For multi-dimensional sound, typically referred to as 3-D sound, it is necessary to localize the sound, identify moving sound sources, enlarge the ideal listening area for the listener and move the actual sound away from a viewing area, such as a movie screen, toward the individual. When considering only a single individual in a room, multi-dimensional sound has been reproduced through either headphones or loudspeakers. With respect to loudspeakers, it is important that the listener not move, since the very complex systems that have been developed rely on cancellation of cross-talk between loudspeakers. Further, the rooms in which these experiments have been carried out are typically acoustically "dead" rooms.
One system that has been provided to reproduce binaural signals through loudspeakers is the Q-biphonic system. This system utilizes a binaural synthesizer that takes pre-recorded monaural sources and converts them into binaural signals, along with the loudspeaker cross-talk cancellation circuitry necessary for playback through loudspeakers. The system is claimed to achieve full azimuthal localization in a four-speaker system, in addition to elevation localization. However, it is very sensitive to head movement and is restricted to only one listening position. In the early days of this system, it was found that an anechoic space was needed.
Another solution proposed for a multi-dimensional system is one utilizing a multiple delay line system controlled by a personal computer. Provisions are made for six delay lines and an additional four non-delay lines. By utilizing a computer "mouse", which provides coordinate manipulation, sounds can be localized by controlling the signal arrival times between loudspeakers in a multiple speaker system. In addition to the adjustable delay, there is also an adjustable attenuation provided for each line. The individual delay times and attenuation calculations, which are accomplished on a computer, achieve the desired effect, i.e., phantom imaging. Delay times can be updated to account for moving sources through the use of the mouse, and preset configurations can be stored for future reference.
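The kind of per-line calculation such a system might perform can be sketched as follows. This is only an illustration under stated assumptions: the function `speaker_settings`, the 2-D geometry, the distance-based delay, the inverse-distance gain, and the 0.1 m clamp are all hypothetical and are not taken from the system described above, which does not disclose its formulas.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value
MIN_DISTANCE = 0.1      # m, assumed clamp to avoid division by zero


def speaker_settings(source_xy, speakers):
    """Map a virtual source position to (delay_ms, gain) per speaker.

    source_xy: (x, y) of the virtual source, e.g. from mouse coordinates.
    speakers:  dict of speaker name -> (x, y) position in metres.
    The nearest speaker gets zero delay and unity gain; farther speakers
    are delayed by their extra path length and attenuated as 1/distance.
    """
    dists = {name: max(math.dist(source_xy, pos), MIN_DISTANCE)
             for name, pos in speakers.items()}
    nearest = min(dists.values())
    return {name: ((d - nearest) / SPEED_OF_SOUND * 1000.0, nearest / d)
            for name, d in dists.items()}


if __name__ == "__main__":
    layout = {"left": (-1.0, 0.0), "right": (1.0, 0.0)}
    # Virtual source placed above the left speaker: the right line is
    # delayed and attenuated, pulling the phantom image to the left.
    for name, (delay_ms, gain) in speaker_settings((-1.0, 1.0), layout).items():
        print(f"{name}: delay={delay_ms:.2f} ms, gain={gain:.2f}")
```

Recomputing the settings each time the source position changes would correspond to updating the delay times for a moving source.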
Some current research in the multi-dimensional sound system field is directed toward developing a multisensory "virtual environment" work station (VIEW) for use in space station teleoperation, tele-presence and automation activities. The auditory requirements for this project led to the prototyping of a binaural signal processor for converting generated or recorded sounds into binaural signals. Researchers measured a subject's pinna responses as a function of azimuth and elevation and arrived at pure head related transfer functions (HRTFs) using Fast Fourier Transform techniques. These HRTFs were implemented in a Digital Signal Processing (DSP) device which allowed the user to apply direction-dependent equalization to an incoming signal. By establishing the proper relationship between the ITD, the Interaural Level Difference (ILD), and the HRTF, experimenters were able to synthesize free-field stimuli and present them over headphones. Motion trajectories and static locations requiring a finer resolution of HRTFs than was measured were arrived at through interpolation. However, this system had some problems with front-back reversals.
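The interpolation step can be illustrated with a small sketch. The text does not say which interpolation method was used; the linear blend of time-domain impulse responses below, the function name `interpolate_hrir`, and the toy two-tap responses are assumptions chosen only to show the idea of filling in directions between measured ones.

```python
def interpolate_hrir(measured, azimuth_deg):
    """Linearly interpolate an impulse response between measured azimuths.

    measured: dict mapping azimuth in degrees -> list of filter taps,
              all lists having the same length (hypothetical data layout).
    Azimuths outside the measured range are clamped to the nearest one.
    """
    angles = sorted(measured)
    if azimuth_deg <= angles[0]:
        return list(measured[angles[0]])
    if azimuth_deg >= angles[-1]:
        return list(measured[angles[-1]])
    for lo, hi in zip(angles, angles[1:]):
        if lo <= azimuth_deg <= hi:
            w = (azimuth_deg - lo) / (hi - lo)  # 0 at lo, 1 at hi
            return [(1.0 - w) * a + w * b
                    for a, b in zip(measured[lo], measured[hi])]


if __name__ == "__main__":
    # Toy two-tap responses standing in for measured data.
    table = {0.0: [1.0, 0.0], 30.0: [0.0, 1.0]}
    print(interpolate_hrir(table, 15.0))
```

A plain linear blend of impulse responses smears the arrival time between neighbouring directions, so a practical system would likely treat the ITD separately from the spectral shape; that refinement is left out here.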
To record binaural soundtracks, a recording system has been utilized that employs an artificial head, sometimes referred to as a "dummy" head, for making the recordings. The artificial head is fabricated from an anthropomorphic, mannequin-like device that has lifelike pinnae and microphones disposed in the ear canals. The microphones are disposed on either side of the artificial head and are utilized in conjunction with a binaural processor that converts standard signals into binaural signals. The artificial head is typically utilized as an area microphone, with additional circuitry provided for replicating the recordings of soloists, which are converted and blended with the area recording.
In the recording process utilizing the artificial head, the head is equalized for a flat free-field response at frontal incidence. This accomplishes two things. First, the experience of listening to binaural recordings through headphones typically produces interior or "in-the-head" sounds. This is due to the disturbance of the conch resonance in the pinna by the earphone cups, which causes a sense of nearness and "in-the-head" localization. The free-field equalization removes this resonance during recording, while for playback, the headphones are equalized to restore this resonance, since the headphones themselves destroy the natural conch resonance. The equalization of the response with the headphones results in better external localization, which is still imperfect because the transfer function of the pinna is unique to each individual.
Second, artificial head recordings made with the free-field equalization will reproduce with good results through regular stereo equipment. Furthermore, if these binaural recordings are reproduced through loudspeakers utilizing cross-talk cancellation (transaural listening), the conch resonance of the pinna is not presented twice, but is only restored by the natural action of the outer ear.
In U.S. Pat. No. 4,817,149, issued Mar. 28, 1989, a system is disclosed that enables sounds to be localized from all directions when played through headphones. Elevation and front/back cues are established utilizing direction-dependent filtering, while horizontal (azimuthal) localization is achieved by control of interaural time differences.
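Control of the interaural time difference for headphone playback can be sketched as a delay of a whole number of samples applied to one ear's channel. The function `apply_itd` and its sign convention are assumptions for illustration and are not drawn from the cited patent; the direction-dependent filtering used for elevation and front/back cues is not shown.

```python
def apply_itd(mono, itd_samples):
    """Turn a mono signal into a (left, right) pair with an ITD.

    mono:        sequence of samples.
    itd_samples: whole-sample delay; positive means the source is to the
                 right, so the left ear receives the sound later
                 (assumed sign convention).
    Both channels are zero-padded to the same length.
    """
    pad = [0.0] * abs(itd_samples)
    delayed = pad + list(mono)   # far ear: sound arrives later
    direct = list(mono) + pad    # near ear: sound arrives first
    if itd_samples >= 0:
        return delayed, direct
    return direct, delayed


if __name__ == "__main__":
    left, right = apply_itd([1.0, 2.0, 3.0], 2)
    print("left: ", left)
    print("right:", right)
```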