The present invention is related to the field of audio signal processing, and more specifically to processing channels of audio through filters to provide a perception of spatial dimension, including correctly locating a panned signal while listening using a binaural or transaural playback system.
FIG. 1 shows a common binaural playback system that includes processing multiple channels of audio by a plurality of Head Related Transfer Function (HRTF) filters, e.g., FIR filters, so as to provide a listener 20 with the impression that each of the input audio channels is being presented from a particular direction. FIG. 1 shows the processing of a number, denoted N, of audio sources consisting of a first audio channel 11 (Channel 1), a second audio channel (Channel 2), . . . , and an N'th audio channel 12 (Channel N) of information. The binaural playback system is for playback using a pair of headphones 19 worn by the listener 20. Each channel is processed by a pair of HRTF filters, one filter intended for playback through the left ear 22 of the listener 20, the other for playback through the right ear 23 of the listener 20. So a first pair of HRTF filters 13, 14, up to an N'th pair of HRTF filters 15, 16 are shown. The outputs of each HRTF filter meant for the left ear 22 of the listener 20 are added by an adder 18, and the outputs of each HRTF filter meant for playback through the right ear 23 of the listener 20 are added by an adder 17. The direction of incidence of each channel perceived by the listener 20 is determined by the choice of HRTF filter pair that is applied to that channel. For example, in FIG. 1, Audio Channel 1 (11) is processed through a pair of filters 13, 14, so that the listener is presented with audio input via headphones 19 that will give the listener the impression that the sound of Audio Channel 1 (11) is incident to the listener from a particular arrival azimuth angle denoted θ1, e.g., from a location 21. Similarly, the HRTF filter pair for the second audio channel is designed such that the sound of Audio Channel 2 is incident to the listener from a particular arrival azimuth angle denoted θ2, . . . , and the HRTF filter pair for the N'th audio channel is designed such that the sound of Audio Channel N (12) is incident to the listener from a particular arrival azimuth angle denoted θN.
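The signal flow of FIG. 1 can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of any particular system: it assumes each HRTF filter is available as a short FIR impulse response, and the names `convolve`, `binauralize`, `channels`, and `hrir_pairs`, as well as the toy coefficient values, are hypothetical. Plain Python lists are used to keep the sketch dependency-free.

```python
def convolve(h, x):
    """Direct-form FIR filtering: full linear convolution of h with x."""
    y = [0.0] * (len(h) + len(x) - 1)
    for i, hi in enumerate(h):
        for j, xj in enumerate(x):
            y[i + j] += hi * xj
    return y


def binauralize(channels, hrir_pairs):
    """Filter each channel with its (left, right) filter pair and sum per ear.

    channels   -- list of N input signals (lists of samples)
    hrir_pairs -- list of N (h_left, h_right) impulse-response pairs
                  (hypothetical representation of the HRTF filter pairs)
    """
    n_out = max(len(x) + max(len(hl), len(hr)) - 1
                for x, (hl, hr) in zip(channels, hrir_pairs))
    left = [0.0] * n_out
    right = [0.0] * n_out
    for x, (h_left, h_right) in zip(channels, hrir_pairs):
        for k, v in enumerate(convolve(h_left, x)):
            left[k] += v    # plays the role of adder 18 (left ear)
        for k, v in enumerate(convolve(h_right, x)):
            right[k] += v   # plays the role of adder 17 (right ear)
    return left, right


# Toy data (made up for illustration): two channels, two filter pairs.
channels = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
hrir_pairs = [([1.0, 0.5], [0.25, 0.125]),
              ([0.25, 0.125], [1.0, 0.5])]
left_ear, right_ear = binauralize(channels, hrir_pairs)
```

The perceived direction of each channel is controlled entirely by which filter pair is placed in `hrir_pairs` for that channel; the summation itself is direction-agnostic.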
For simplicity, FIG. 1 shows only the azimuth angles of arrival, e.g., the angle of arrival of the perceived sound corresponding to Channel 1 from a perceived source 21. In general, HRTF filters may be used to provide the listener 20 with stimulus corresponding to any arrival direction, specified by both an azimuth angle of incidence and an elevation angle of incidence.
By an HRTF filter pair is meant the set of two separate HRTF filters required to process a single channel for the two ears 22, 23 of the listener, one HRTF filter per ear. Therefore, for two-channel sound, two HRTF filter pairs are used.
The description herein is provided in detail primarily for a two-input-channel, i.e., stereo input pair system. Extending the aspects described herein to three or more input channels is straightforward, and therefore such extending is regarded as being within the scope of the invention.
FIG. 2 shows a stereo binauralizer system that includes two audio inputs, a left channel input 31 and a right channel input 32. Each of the two audio channel inputs is separately processed, with the left channel input being processed through one HRTF pair 33, 34, and the right channel input being processed through a different HRTF pair 35, 36. In a typical situation, the left channel input 31 and the right channel input 32 are meant for symmetric playback, such that the aim of binauralizing using the two HRTF pairs is to give the listener the perception of hearing the left and right channels from respective left and right angular locations that are symmetrically positioned relative to the medial plane of the listener 20. Referring to FIG. 2, if the HRTF pairs 33, 34, 35, 36 are for symmetrical listening, the left channel is perceived to be from a source 37 at an azimuth angle θ and the right channel is perceived to be from a source 38 at an azimuth angle that is the negative of the azimuth angle of the left perceived source 37, i.e., from an azimuth angle −θ.
Under conditions of such symmetry, some simplifying assumptions are made. The first is that the listener's head and sound perception are symmetric. That means that:

$$\mathrm{HRTF}(\theta, L) = \mathrm{HRTF}(-\theta, R) \qquad (1)$$
Further, the HRTF from the left source 37 to the left ear 22 is equal to the HRTF from the right source 38 to the right ear 23. Denote such an HRTF as HRTFnear. Similarly, under such symmetrical assumptions, the HRTF from the left source 37 to the right ear 23 is equal to the HRTF from the right source 38 to the left ear 22. Denote such an HRTF as HRTFfar.
In binauralizers, the HRTF filters are typically found by measuring the actual HRTF response of a dummy head, or a human listener's head. Relatively sophisticated binaural processing systems make use of extensive libraries of HRTF measurements, corresponding to multiple listeners and/or multiple sound incident azimuth and elevation angles.
It is common for binaural systems in use today to simply use the measured θ and −θ HRTF pairs in a binaural processing system such as that of FIG. 2. In other words, assuming that the measured HRTF pairs are symmetrical:

$$\mathrm{HRTF}_{\mathrm{near}} = \mathrm{HRTF}(\theta, L), \qquad \mathrm{HRTF}_{\mathrm{far}} = \mathrm{HRTF}(\theta, R) \qquad (2)$$
Even if it is found by measurement that the listener head responses on which the HRTF pair is measured are not symmetric, such that Eq. 1 does not hold, a binauralizer such as that of FIG. 2 can be forced to be symmetrical by using HRTF filter pairs formed by averaging measured HRTFs. That is, for symmetric listening to left and right channels that appear to come from sound sources, called “virtual sound sources” or “virtual speakers,” at azimuth angles of θ and −θ, the filters for binaural processing are set as:
$$\mathrm{HRTF}_{\mathrm{near}} = \frac{\mathrm{HRTF}(\theta, L) + \mathrm{HRTF}(-\theta, R)}{2}, \qquad \mathrm{HRTF}_{\mathrm{far}} = \frac{\mathrm{HRTF}(\theta, R) + \mathrm{HRTF}(-\theta, L)}{2}, \qquad (3)$$
where HRTF(θ,L) and HRTF(θ,R) are the measured HRTFs to the left and right ears, respectively, for a perceived source at angle θ, and HRTF(−θ,L) and HRTF(−θ,R) are the corresponding measured HRTFs for a perceived source at angle −θ. Therefore, by the near and far HRTFs are meant the actual measured or assumed HRTFs for the symmetric case, or the averaged HRTFs for the non-symmetric case.
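The averaging of Eq. 3 can be illustrated with a short Python sketch. It assumes, purely for illustration, that each measured HRTF is stored as a sampled impulse response of equal length, so that the average can be taken sample by sample; the function name `symmetrize`, its argument names, and the numeric values are hypothetical.

```python
def symmetrize(h_theta_L, h_theta_R, h_neg_theta_L, h_neg_theta_R):
    """Form the near and far filters of Eq. 3 by averaging measured responses.

    Assumes all four responses are sampled on the same grid and have
    the same length (an assumption of this sketch).
    """
    # Near: source at theta to left ear, averaged with source at -theta to right ear.
    near = [(a + b) / 2 for a, b in zip(h_theta_L, h_neg_theta_R)]
    # Far: source at theta to right ear, averaged with source at -theta to left ear.
    far = [(a + b) / 2 for a, b in zip(h_theta_R, h_neg_theta_L)]
    return near, far


# Made-up, deliberately asymmetric "measurements".
hrtf_near, hrtf_far = symmetrize(
    h_theta_L=[1.0, 0.5],
    h_theta_R=[0.25, 0.0],
    h_neg_theta_L=[0.75, 0.5],
    h_neg_theta_R=[0.5, 0.25],
)
```

When the measurements already satisfy Eq. 1, the two terms of each average coincide and the result reduces to Eq. 2.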
Broadly (and roughly) speaking, such a binauralizer simulates the way a normal stereo speaker system works, by presenting the left audio input signal through an HRTF pair corresponding to a virtual left speaker, e.g., 37, and the right audio input signal through an HRTF pair corresponding to a virtual right speaker, e.g., 38. This is known to work well for providing the listener with the sensation that the sounds of the left and right channel inputs are emanating from the left and right virtual speaker locations, respectively.
In sound reproduction, e.g., through actual stereo speakers, it often is also desired to provide the listener with the sensation not only of left and right audio input sources 31 and 32 appearing to be from speakers correctly placed to the left and right of the listener, but also of one or more sound sources that are between such left and right speaker locations. Suppose that there is a sound component that is located elsewhere in front of the listener. As an example, suppose there is a sound source that is in the center between the assumed locations of the left and right input audio channels. It is common, for example, in modern stereo recordings, for an audio signal to be fed with equal, albeit attenuated, amplitude to the left and right channels, so that when such left and right channel inputs are played back on stereo speakers in front of the listener, the listener is given the impression that the sound is emanating from a source, called a “phantom speaker,” located centrally between the left and right speakers. The term “phantom” is used for such a speaker because there is no actual speaker there. This is often referred to as a “phantom center,” and the process of producing the sensation of a sound coming from the center is called “creating the center image.”
Similarly, by proportionally feeding different amounts of a signal to the left and right channel inputs, the sensation of a sound emanating from elsewhere between the left and right speaker locations is provided to the listener.
To so create a stereo pair by dividing an input between the left and right channels is called “panning”; dividing the signal equally is called “center panning.”
It is desired to provide the same sensation, that is, creating the center image, in a binauralizer system for playback through a set of headphones.
Consider, for example, an audio input signal called MonoInput that is center panned, i.e., split equally between the two channel inputs. For example, suppose two signals, LeftAudio and RightAudio, are created as:
$$\mathrm{LeftAudio} = \frac{\mathrm{MonoInput}}{2}, \qquad \mathrm{RightAudio} = \frac{\mathrm{MonoInput}}{2} \qquad (4)$$
A signal so center panned for stereo speaker reproduction is meant to be perceived as a signal emanating from the front center.
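Panning can be sketched as follows. The text above does not specify how the signal is apportioned for positions other than the center, so this sketch assumes a simple linear pan law chosen only for illustration; other pan laws exist. The function name `pan` and the parameter `position` (0.0 for fully left, 1.0 for fully right) are hypothetical. With `position = 0.5` the sketch reproduces the equal split of Eq. 4.

```python
def pan(mono_input, position):
    """Split a mono signal between left and right channel inputs.

    position -- 0.0 is fully left, 0.5 is center, 1.0 is fully right.
    Uses a linear pan law (an assumption of this sketch).
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must lie in [0, 1]")
    left_audio = [(1.0 - position) * s for s in mono_input]
    right_audio = [position * s for s in mono_input]
    return left_audio, right_audio


mono = [2.0, 4.0]

# Center panning: each channel receives MonoInput / 2, as in Eq. 4.
left_center, right_center = pan(mono, 0.5)

# Panning toward the left: the left channel receives the larger share.
left_biased, right_biased = pan(mono, 0.25)
```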
If the inputs LeftAudio and RightAudio of Eq. 4 are input to the binauralizer of FIG. 2, the left ear 22 and right ear 23 are fed signals, denoted LeftEar and RightEar, respectively, with:

$$\begin{aligned}
\mathrm{LeftEar} &= \mathrm{HRTF}_{\mathrm{near}} \otimes \mathrm{LeftAudio} + \mathrm{HRTF}_{\mathrm{far}} \otimes \mathrm{RightAudio} \\
\mathrm{RightEar} &= \mathrm{HRTF}_{\mathrm{near}} \otimes \mathrm{RightAudio} + \mathrm{HRTF}_{\mathrm{far}} \otimes \mathrm{LeftAudio}
\end{aligned} \qquad (5)$$

where ⊗ denotes the filtering operation, e.g., in the case that HRTFnear is expressed as an impulse response, and LeftAudio as a time domain input, HRTFnear⊗LeftAudio denotes convolution. So, by combining the equations above,
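$$\begin{aligned}
\mathrm{LeftEar} &= \mathrm{HRTF}_{\mathrm{near}} \otimes \frac{\mathrm{MonoInput}}{2} + \mathrm{HRTF}_{\mathrm{far}} \otimes \frac{\mathrm{MonoInput}}{2} \\
&= \frac{\mathrm{HRTF}_{\mathrm{near}} + \mathrm{HRTF}_{\mathrm{far}}}{2} \otimes \mathrm{MonoInput} \\
\mathrm{RightEar} &= \mathrm{HRTF}_{\mathrm{near}} \otimes \frac{\mathrm{MonoInput}}{2} + \mathrm{HRTF}_{\mathrm{far}} \otimes \frac{\mathrm{MonoInput}}{2} \\
&= \frac{\mathrm{HRTF}_{\mathrm{near}} + \mathrm{HRTF}_{\mathrm{far}}}{2} \otimes \mathrm{MonoInput}
\end{aligned} \qquad (6)$$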
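Eq. 6 can be checked numerically with a small Python sketch. The sketch assumes the near and far HRTFs are given as equal-length impulse responses so that the filtering operation is a convolution; the names `convolve` and `stereo_binauralize` and all numeric values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def convolve(h, x):
    """Direct-form FIR filtering: full linear convolution of h with x."""
    y = [0.0] * (len(h) + len(x) - 1)
    for i, hi in enumerate(h):
        for j, xj in enumerate(x):
            y[i + j] += hi * xj
    return y


def stereo_binauralize(left_audio, right_audio, hrtf_near, hrtf_far):
    """Symmetric stereo binauralizer following Eq. 5.

    Assumes hrtf_near and hrtf_far have equal length, and that
    left_audio and right_audio have equal length.
    """
    left_ear = [a + b for a, b in zip(convolve(hrtf_near, left_audio),
                                      convolve(hrtf_far, right_audio))]
    right_ear = [a + b for a, b in zip(convolve(hrtf_near, right_audio),
                                       convolve(hrtf_far, left_audio))]
    return left_ear, right_ear


# Made-up filters and input signal.
hrtf_near = [1.0, 0.5]
hrtf_far = [0.5, 0.25]
mono_input = [2.0, 4.0, -2.0]

# Center pan as in Eq. 4, then binauralize as in Eq. 5.
half = [s / 2 for s in mono_input]
left_ear, right_ear = stereo_binauralize(half, half, hrtf_near, hrtf_far)

# Right-hand side of Eq. 6: one filter, the average of near and far.
avg_filter = [(n + f) / 2 for n, f in zip(hrtf_near, hrtf_far)]
expected = convolve(avg_filter, mono_input)

matches_eq6 = all(abs(a - b) < 1e-12 for a, b in zip(left_ear, expected))
ears_identical = left_ear == right_ear
```

In this sketch both `matches_eq6` and `ears_identical` evaluate to true: each ear receives MonoInput filtered by the average of the near and far filters, which is what Eq. 6 states.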
It is desired that such a splitting of an input would present the sensation of listening at a virtual speaker position of 0°, that is, the left and right ears are presented with a stimulus that corresponds to a 0° HRTF pair. In practice, this does not happen, so that a listener does not perceive the signal MonoInput to be from a virtual speaker centrally located between the virtual left and right speakers 37 and 38. Similarly, unequally splitting a signal between the left and right channel inputs and then binauralizing through a binauralizer such as shown in FIG. 2 fails to correctly create the illusion of the desired virtual location of the source between the virtual left and right speakers.
There thus is a need in the art for a binauralizer and binauralizing system that creates the illusion to a listener of a sound emanating from a location between the left and right virtual speaker locations of a binauralizer system, where by the left and right virtual speaker locations are meant the locations assumed for a left channel input and right channel input.
A signal that is meant to appear to come from the center rear, e.g., by splitting a mono signal into the left rear and right rear channel inputs, typically will not be perceived to come from the center rear when played back on headphones via a binauralizer that uses symmetric rear HRTF filters aimed at placing the rear speakers at symmetric rear virtual speaker locations.
There thus is a need in the art also for a binauralizer and binauralizing system that creates the illusion to a listener of a sound emanating from the rear center location for rear speaker signals, e.g., surround sound signals of a four or five channel system created by center panning a signal between the left and right virtual rear (surround) speakers.