The processing of binaural (two channel or stereo) audio signals to produce highly realistic 3D sound images is well known, and is described, for example, in International Patent Application No. WO94/22278. Binaural technology is based on recordings made using a so-called “artificial head” microphone system, and the recordings are subsequently processed digitally. The use of the artificial head ensures that the natural 3D sound cues—which the brain uses to determine the position of sound sources in 3D space—are incorporated into the stereo recordings.
The 3D sound cues are introduced naturally by the head and ears when we listen to sounds in real life, and they include the following characteristics: inter-aural amplitude difference (IAD), inter-aural time difference (ITD) and spectral shaping by the outer ear. To set the position of a virtual sound source, separate audio filters for the left and right channels of the audio signal add these characteristics, depending on the desired position of the sound. The characteristics themselves are determined by measurement of the head-related transfer function (HRTF). The HRTF characterises the modifications which an audio signal undergoes on its path from a point in space, at a defined direction and distance from a listener, to the eardrums of the listener.
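As a rough illustration of how a left/right filter pair imposes these cues, the following Python sketch convolves a mono signal with two impulse responses. The impulse responses are hypothetical stand-ins: a pure delay and gain for the far ear, reproducing only an ITD and an IAD. A real system would use measured HRTF data, which also carries the spectral shaping of the outer ear. The function names, the 48 kHz sample rate, and the default delay and gain values are assumptions made for the example.

```python
def convolve(signal, impulse_response):
    """Direct-form FIR convolution of two sequences of floats."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h
    return out


def toy_filter_pair(itd_s, far_ear_gain, sample_rate_hz=48000):
    """Hypothetical filters for a source on the listener's left.

    Near (left) ear: unit impulse. Far (right) ear: the same impulse,
    delayed by the ITD and scaled by far_ear_gain (standing in for the IAD).
    """
    delay_samples = round(itd_s * sample_rate_hz)
    near = [1.0]
    far = [0.0] * delay_samples + [far_ear_gain]
    return near, far


def place_source(mono, itd_s=437e-6, far_ear_gain=0.5):
    """Return (left, right) channels for the toy left-side source."""
    near, far = toy_filter_pair(itd_s, far_ear_gain)
    return convolve(mono, near), convolve(mono, far)
```

Calling `place_source` on a single impulse returns a left channel with the impulse unchanged and a right channel in which it arrives later and quieter, which is the time and amplitude difference the filters are meant to introduce.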
When a pair of audio signals incorporating such 3D sound cues is introduced efficiently into the ears of the listener, by headphones say, then he or she perceives a virtual sound source to be located at the associated position in 3D space. However, if the processed signals are not conveyed directly and efficiently into the ears of the listener, then the full 3D effects will not be perceived. For example, when listening to sounds via conventional stereo loudspeakers, the left ear hears a little of the right loudspeaker signal, and vice versa; this is known as transaural crosstalk. By cancelling out transaural crosstalk, full 3D effects can be enjoyed via loudspeakers remote from the listener. Transaural crosstalk from each of the loudspeakers may be cancelled by creating appropriate crosstalk cancellation signals from the opposite loudspeaker. Crosstalk cancellation signals are equal in magnitude and inverted (opposite in polarity) with respect to the transaural crosstalk signals.
The acoustic effects of transaural crosstalk may be illustrated by the practical example shown in FIG. 1. Suppose that a sound recording is made using a pair of microphones spaced one head-width (approximately 15 cm) apart. A sound source 16 is now placed immediately to the left (azimuth −90°) of the microphone configuration. When the sound source 16 emits a sound impulse, the impulse arrives at the left-hand microphone first, and so it is recorded by the left-hand microphone before it is recorded by the right-hand microphone. The relative time-of-arrival delay for the sound impulse, tw, reaching the right-hand microphone is approximately 437 μs, and is equal to the separation distance (15 cm) divided by the speed of sound in air (approximately 343 m/s). In practice, although the ears are separated by one head-width, the sound waves have to diffract around the circumference of the head, and therefore the effective path length is greater; it can be approximated by the expression:
$$\left(\frac{\theta}{360}\right) 2\pi r + r \sin\theta,$$
where r is the radius of the head, and θ is the azimuth angle of the sound source.
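The two delay estimates above can be compared numerically. The following Python sketch evaluates the straight-line delay and the path-length approximation; the function names and the default head radius of 7.5 cm (half the 15 cm head-width used in the example) are assumptions for illustration, and the approximation is applied here only for azimuths between −90° and +90°.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # value quoted in the text


def straight_line_delay_s(separation_m):
    """Delay for a straight path between two points separation_m apart."""
    return separation_m / SPEED_OF_SOUND_M_S


def diffraction_path_m(azimuth_deg, head_radius_m):
    """Path-length difference: (theta / 360) * 2 * pi * r + r * sin(theta)."""
    theta = abs(azimuth_deg)  # left and right are treated symmetrically
    arc = (theta / 360.0) * 2.0 * math.pi * head_radius_m
    return arc + head_radius_m * math.sin(math.radians(theta))


def diffraction_delay_s(azimuth_deg, head_radius_m=0.075):
    """Inter-aural delay implied by the path-length approximation."""
    return diffraction_path_m(azimuth_deg, head_radius_m) / SPEED_OF_SOUND_M_S
```

With the assumed radius, `straight_line_delay_s(0.15)` reproduces the figure of about 437 μs, while `diffraction_delay_s(-90)` gives roughly 562 μs, reflecting the longer path around the head. The result is sensitive to the radius chosen.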
Suppose, now, that this recording is being replayed on a two-speaker audio system, and that a listener 10 is sitting in the position shown in FIG. 1. Under these circumstances, with the speakers 12 and 14 located at angles of about ±30° with respect to the listener, the inter-aural time difference between signals arriving at the left and right ears, te, will be approximately 250 μs. When the recording of the impulse is replayed, it is emitted first from the left loudspeaker 12, followed by the right-hand loudspeaker 14 after the recorded delay of 437 μs.
Referring to FIG. 1, first the left ear hears the primary sound W from the left-hand loudspeaker 12, but then the crosstalk X from the left speaker arrives at the right ear only 250 μs (te) afterwards. Because this crosstalk signal derives from the same, real sound source, the brain receives a pair of highly correlated left and right sound signals, which it immediately uses to determine where the recorded sound source is apparently located. The brain therefore receives an ITD of only 250 μs (instead of 437 μs), which corresponds to the actual position of the left-hand loudspeaker at −30° azimuth. Consequently, the brain incorrectly localizes the sound source at −30°, rather than its correct location of −90° azimuth. The transaural crosstalk has, in effect, disabled the time-domain information which was built into the recording.
If transaural crosstalk cancellation is carried out correctly, and high quality HRTF source data is used, then the effects on the listener can be quite remarkable. For example, it is possible to move a virtual sound source around the listener in a complete circle, beginning in front (0° azimuth), moving around the right-hand side of the listener (+90° azimuth), then behind the listener (±180° azimuth), and back around the left-hand side (−90° azimuth) to the front again. It is also possible to make the virtual sound source appear to move in a vertical circle around the listener, and indeed make the sound appear to come from any selected position in space.
However, some positions are more difficult to synthesise than others. For example, the effectiveness of moving a virtual sound source directly upwards or downwards is greater at the sides of the listener (±90° azimuth) than directly in front of the listener (0° azimuth). This is probably because there is more left-right difference information for the brain to work with. Similarly, it is difficult to differentiate between a sound source directly in front of the listener (0° azimuth), and a source directly behind the listener (±180° azimuth). This is because there is no time-domain information present for the brain to operate with (that is, the ITD=0), and the only other positional information available to the brain, spectral data, is similar in both of these positions.
In practice, there is more high frequency energy perceived when the sound source is in front of the listener. This is because the high frequencies from frontal sources are reflected into the auditory canal from the rear wall of the concha, whereas for a rearward source, high frequencies cannot diffract around the pinna sufficiently (FIG. 12).
One of the first practical crosstalk cancellation schemes was described in the US patent of Atal and Schroeder (U.S. Pat. No. 3,236,949), and more fully explained in Schroeder's 1975 publication “Models of Hearing” (Proc. IEEE, September 1975, 63 (9), pp. 1332–1350). A block diagram of this method is shown in FIG. 2.
Referring to FIG. 2, there are binaural sound sources 18 (left) and 20 (right), which are filtered by crossfeed filters 21 and 23 to generate loudspeaker driving signals 22 and 24 respectively. The filters 21 and 23 represent the combination of two basic functions: firstly, the transfer function, S, between a first loudspeaker of a pair of loudspeakers and the ear of a listener 10 which is closest to this loudspeaker; and secondly, a function, A, representing the transfer function from the same first loudspeaker to the far ear of the listener. If there were no transaural crosstalk present, the transfer function from the right sound source 20 to the right ear (and from the left source 18 to the left ear) would be simply S. The presence of transaural crosstalk, however, requires a cancellation signal to be provided by the other loudspeaker.
For example, consider the process of transferring the right channel signal 20 into the right ear only. The transfer from the right loudspeaker 14 to the right ear is via the “same-side” function S. The crosstalk from the right loudspeaker will arrive at the left ear with transfer function A. Consequently, we need to deliver a (−A) signal to the left ear from the left speaker 12 in order to cancel it. However, we know that the transfer function from the left speaker to the left ear is S, and so the overall crosstalk cancellation signal from the right to left channel must be (−A/S). This would deliver the correct crosstalk cancellation signal properly to the left ear. Thus, according to these observations, the crossfeed function, C, must be set equal to (−A/S). S and A can be established by direct measurement, ideally from an artificial head having physical features and dimensions of an average human head.
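The crossfeed relation C = −A/S can be checked numerically at a single frequency. The Python sketch below uses a hypothetical toy model in which S is unity and A is an attenuated, delayed copy of it; the gain of 0.7, the delay of 250 μs, and all function names are assumptions for illustration, and measured transfer functions would replace them in practice.

```python
import cmath
import math


def toy_transfer_functions(freq_hz, far_gain=0.7, far_delay_s=250e-6):
    """Hypothetical S (same-side) and A (far-ear) responses at one frequency."""
    s = 1.0 + 0.0j
    a = far_gain * cmath.exp(-2j * math.pi * freq_hz * far_delay_s)
    return s, a


def ear_signals(right_input, s, a):
    """Drive the right speaker with the right channel and the left speaker
    with the crossfeed signal C * right_input, where C = -A / S."""
    c = -a / s
    right_speaker = right_input
    left_speaker = c * right_input
    # Each ear hears its own speaker via S and the opposite speaker via A.
    left_ear = a * right_speaker + s * left_speaker
    right_ear = s * right_speaker + a * left_speaker
    return left_ear, right_ear
```

For any frequency, the left-ear signal returned by `ear_signals` is zero, confirming that the crossfeed cancels the crosstalk. The right ear receives S − A²/S rather than S, because the cancellation signal itself crosses over to the opposite ear.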
However, a perfect crosstalk cancellation system is only obtained when the head of a listener is totally immobile and fixed in the absolute centre of the preferred position (i.e., the “sweet spot”, where the ears are exactly coincident with the respective sound-wave cancellation nodes). The reason for this is that sound-wave cancellation effects are dependent on the precise coincidence of equal and opposite signals, and so when one wave is relatively displaced, then the wave cancellation is incomplete.
For example, if a listener's head were to move sideways such that the left ear was 5 cm closer to the left speaker (and 5 cm more distant from the right loudspeaker), then the unwanted crosstalk signal arriving at the left ear from the right speaker, which must be cancelled, would be shifted relatively by 10 cm with respect to its intended cancellation wave from the left speaker. Thus the transaural crosstalk cancellation would be imperfect. As the frequency of the audio signal increases, this effect occurs for smaller relative lateral movements, because the nodes and anti-nodes become closer together.
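The frequency dependence of this effect can be estimated with a simplified model. The Python sketch below assumes a steady sine wave and an inverted copy of equal amplitude whose only error is a relative path shift; it ignores head shadowing and level changes, and the function names are hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # value quoted in the text


def residual_gain(freq_hz, path_shift_m):
    """Magnitude left when a unit sine wave is summed with an inverted copy
    displaced by path_shift_m: |1 - exp(-j*2*pi*f*d/c)| = 2*|sin(pi*f*d/c)|.

    0 means perfect cancellation; 2 means the two waves reinforce.
    """
    phase = math.pi * freq_hz * path_shift_m / SPEED_OF_SOUND_M_S
    return 2.0 * abs(math.sin(phase))


def first_reinforcement_hz(path_shift_m):
    """Lowest frequency at which the shift equals half a wavelength."""
    return SPEED_OF_SOUND_M_S / (2.0 * path_shift_m)
```

Under these assumptions, the 10 cm shift in the example turns cancellation into full reinforcement at about 1715 Hz (`first_reinforcement_hz(0.10)`), and a smaller shift produces the same residual only at proportionally higher frequencies.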
U.S. Pat. No. 4,975,954 (Cooper and Bauck) discloses a particular transaural crosstalk cancellation scheme as shown in FIG. 3. The scheme features a pair of high frequency (HF) cut (>8 kHz) filters 26 and 28. In this method, the high frequency signals being fed to the crosstalk cancellation means are attenuated by low-pass filters 26 and 28 situated in the crossfeed filter path 8 from the left to the right channel (and vice versa). Consequently, it is claimed that imperfect crosstalk cancellation at high frequencies due to movement of the head out of the preferred position would be reduced, because such high frequencies are not subjected to transaural crosstalk cancellation.
However, this method is ineffective for rearward placement of virtual sound sources because the high frequency components in the source signals 18 and 20 are transmitted directly to the loudspeakers themselves, without crosstalk cancellation. Consequently, the perceived sources of the HF sounds are the loudspeakers themselves, rather than one or more virtual sound sources. As a result, the HF sounds appear to be detached from the virtual sound images, and create a frontal spatial distraction. When the virtual sound image is to be positioned in the front of the listener, the effect of this scheme is to smear out the spatial position of the sound image, but when the virtual sound image is to be positioned behind the listener, the effect inhibits and prevents the formation of a rearward image. Instead, the image becomes reflected in front of the listener.
In respect of other crosstalk cancellation schemes, such as that of Atal and Schroeder, in practical situations a listener's head cannot be guaranteed to remain in the preferred position, and if it moves from this preferred position, the transaural crosstalk cancellation will not be perfect. The effect of imperfect crosstalk cancellation at the higher frequencies is that they appear to originate from the loudspeakers themselves, and not from the required position in which the virtual sound source was placed using the HRTFs, as noted above. This makes locating a virtual sound image behind the listener much more difficult to achieve especially because, as stated previously, it is the higher frequency sound information which provides a frontal cue and enables a listener to distinguish between sounds placed in front and sounds placed behind.
It is worth noting at this stage that the creation of effective crosstalk cancellation is not as difficult as it might appear. This is because of the natural acoustic properties of the head and ears themselves. In essence, as the frequency of a signal increases, the head acts more and more effectively as a baffle, naturally suppressing crosstalk at high frequencies. Consequently, there is little crosstalk to cancel at high frequencies, and the method of Cooper and Bauck does not provide, in practice, a significant advantage over the Atal and Schroeder method.