The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches, which in and of themselves may also be inventions.
Virtual rendering of spatial audio over a pair of speakers commonly involves creating a stereo binaural signal, which is then fed through a crosstalk canceller to generate left and right speaker signals. The binaural signal represents the desired sound arriving at the listener's left and right ears. It is synthesized to simulate a particular audio scene in three-dimensional (3D) space, possibly containing many sources at different locations. The crosstalk canceller attempts to eliminate or reduce the natural crosstalk inherent in stereo loudspeaker playback, so that the left channel of the binaural signal is delivered substantially only to the listener's left ear and the right channel only to the right ear, thereby preserving the intent of the binaural signal. Through such rendering, audio objects are placed “virtually” in 3D space, since a loudspeaker need not be physically located at the point from which a rendered sound appears to emanate.
The design of the crosstalk canceller is based on a model of audio transmission from the speakers to a listener's ears. FIG. 1 illustrates a model of audio transmission for a crosstalk canceller system, as presently known. Signals sL and sR represent the signals sent from the left and right speakers 104 and 106, and signals eL and eR represent the signals arriving at the left and right ears of the listener 102. Each ear signal is modeled as the sum of the left and right speaker signals, each filtered by a separate linear time-invariant transfer function H that models the acoustic transmission from that speaker to that ear. These four transfer functions 108 are usually modeled using head-related transfer functions (HRTFs) selected according to an assumed speaker placement relative to the listener 102. In general, an HRTF is a response that characterizes how an ear receives a sound from a point in space; a pair of HRTFs, one for each ear, can be used to synthesize a binaural sound that appears to emanate from a particular point in space.
The model depicted in FIG. 1 can be written in matrix equation form as follows:
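$$\begin{bmatrix} e_L \\ e_R \end{bmatrix} = \begin{bmatrix} H_{LL} & H_{RL} \\ H_{LR} & H_{RR} \end{bmatrix} \begin{bmatrix} s_L \\ s_R \end{bmatrix} \quad \text{or} \quad e = Hs \tag{1}$$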
Equation 1 relates the signals at a single frequency; it is meant to hold at every frequency in the range of interest, and the same applies to all subsequent related equations. A crosstalk canceller matrix C may be realized by inverting the matrix H, as shown in Equation 2:
$$C = H^{-1} = \frac{1}{H_{LL} H_{RR} - H_{LR} H_{RL}} \begin{bmatrix} H_{RR} & -H_{RL} \\ -H_{LR} & H_{LL} \end{bmatrix} \tag{2}$$
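Because Equation 2 is the closed-form inverse of a 2×2 matrix, it can be evaluated directly at each frequency bin. The sketch below does this for one bin, keeping the matrix layout of Equation 1. The function name `crosstalk_canceller`, the singularity threshold of 1e-12, and the example transfer-function values (unit direct paths, cross paths attenuated to 0.4 with a 0.5 rad phase lag) are hypothetical choices for illustration, not values from the text.

```python
import cmath
from typing import List

Matrix2 = List[List[complex]]


def crosstalk_canceller(h_ll: complex, h_rl: complex,
                        h_lr: complex, h_rr: complex) -> Matrix2:
    """Return C = H^-1 at one frequency bin, following Equation 2.

    H is laid out as in Equation 1: [[h_ll, h_rl], [h_lr, h_rr]],
    where h_xy is the transfer function from speaker x to ear y.
    """
    det = h_ll * h_rr - h_lr * h_rl
    if abs(det) < 1e-12:  # hypothetical threshold
        # Without a usable determinant the closed-form inverse does not exist.
        raise ValueError("H is (nearly) singular at this frequency")
    return [[h_rr / det, -h_rl / det],
            [-h_lr / det, h_ll / det]]


# Hypothetical symmetric setup: unit-gain direct paths, cross paths
# attenuated to 0.4 with a phase lag of 0.5 rad at this frequency.
h_direct = 1.0 + 0j
h_cross = 0.4 * cmath.exp(-0.5j)
C = crosstalk_canceller(h_direct, h_cross, h_cross, h_direct)
H = [[h_direct, h_cross], [h_cross, h_direct]]

# C times H should be the 2x2 identity matrix.
CH = [[sum(C[i][k] * H[k][j] for k in range(2)) for j in range(2)]
      for i in range(2)]
```

The determinant check matters in practice: at frequencies where the direct and cross paths are nearly equal, H is close to singular and C calls for very large gains.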
Given left and right binaural signals bL and bR, the speaker signals sL and sR are computed as the binaural signals multiplied by the crosstalk canceller matrix:
$$s = Cb \quad \text{where} \quad b = \begin{bmatrix} b_L \\ b_R \end{bmatrix} \tag{3}$$
Substituting Equation 3 into Equation 1 and noting that $C = H^{-1}$ yields:

$$e = HCb = b \tag{4}$$
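The chain of Equations 3 and 1 can be checked numerically at a single frequency bin. The sketch below computes the speaker signals s = Cb and then passes them through the same model H to obtain the ear signals e. The helper names `matvec` and `inverse2` and all numeric values are hypothetical and chosen only for illustration.

```python
import cmath
from typing import List

Matrix2 = List[List[complex]]
Vector2 = List[complex]


def matvec(m: Matrix2, v: Vector2) -> Vector2:
    """Multiply a 2x2 matrix by a 2-element column vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


def inverse2(m: Matrix2) -> Matrix2:
    """Closed-form 2x2 inverse; with m = H this is Equation 2."""
    (m00, m01), (m10, m11) = m
    det = m00 * m11 - m01 * m10
    return [[m11 / det, -m01 / det], [-m10 / det, m00 / det]]


# Hypothetical single-bin acoustic model in the layout of Equation 1.
h_cross = 0.4 * cmath.exp(-0.5j)
H = [[1.0 + 0j, h_cross], [h_cross, 1.0 + 0j]]
C = inverse2(H)

b = [0.8 + 0.1j, -0.2 + 0.5j]  # hypothetical binaural values (b_L, b_R)
s = matvec(C, b)               # Equation 3: speaker signals
e = matvec(H, s)               # Equation 1: signals at the ears

print(all(abs(ei - bi) < 1e-12 for ei, bi in zip(e, b)))  # True, i.e. e = HCb = b
```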
In other words, generating the speaker signals by applying the crosstalk canceller to the binaural signal yields signals at the listener's ears equal to the binaural signal. This assumes that the matrix H perfectly models the physical acoustic transmission of audio from the speakers to the listener's ears. In reality this is unlikely to be the case, so Equation 4 will generally hold only approximately. In practice, however, the approximation is usually close enough that a listener will substantially perceive the spatial impression intended by the binaural signal b.
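The effect of a modeling error can be illustrated at a single frequency bin: design C from an assumed H, but let the sound reach the ears through a slightly different H. The sketch below compares the two cases. The symmetric model, the function names `symmetric_h` and `ear_error`, and the 0.1 rad change in cross-path phase (standing in for a small head movement) are all hypothetical.

```python
import cmath
from typing import List

Matrix2 = List[List[complex]]
Vector2 = List[complex]


def matvec(m: Matrix2, v: Vector2) -> Vector2:
    """Multiply a 2x2 matrix by a 2-element column vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


def inverse2(m: Matrix2) -> Matrix2:
    """Closed-form 2x2 inverse (Equation 2 when m is H)."""
    (m00, m01), (m10, m11) = m
    det = m00 * m11 - m01 * m10
    return [[m11 / det, -m01 / det], [-m10 / det, m00 / det]]


def symmetric_h(cross_gain: float, cross_phase: float) -> Matrix2:
    """Hypothetical symmetric H: unit direct paths, attenuated and delayed cross paths."""
    h_cross = cross_gain * cmath.exp(-1j * cross_phase)
    return [[1.0 + 0j, h_cross], [h_cross, 1.0 + 0j]]


def ear_error(h_model: Matrix2, h_actual: Matrix2, b: Vector2) -> float:
    """Largest |e - b| when C is designed from h_model but sound travels through h_actual."""
    e = matvec(h_actual, matvec(inverse2(h_model), b))
    return max(abs(ei - bi) for ei, bi in zip(e, b))


b = [1.0 + 0j, 0j]               # binaural signal meant for the left ear only
h_model = symmetric_h(0.4, 0.5)  # transmission assumed when designing C
h_moved = symmetric_h(0.4, 0.6)  # cross-path phase after a hypothetical head movement

print(ear_error(h_model, h_model, b))  # near zero: the model matches reality
print(ear_error(h_model, h_moved, b))  # small but nonzero: Equation 4 holds only approximately
```

With these values, some of the signal intended only for the left ear leaks into the right ear, which is the kind of residual crosstalk the approximation in Equation 4 refers to.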
The binaural signal b is often synthesized from a monaural audio object signal o through the application of binaural rendering filters BL and BR:
$$\begin{bmatrix} b_L \\ b_R \end{bmatrix} = \begin{bmatrix} B_L \\ B_R \end{bmatrix} o \quad \text{or} \quad b = Bo \tag{5}$$
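Equation 5 is written per frequency, where applying B amounts to a multiplication. In the time domain, the same operation is a convolution of the mono object signal with a pair of head-related impulse responses (HRIRs), which are the time-domain counterparts of the HRTFs. The sketch below uses direct-form convolution. The three-tap HRIRs are hypothetical toys, chosen only to make an interaural delay and level difference easy to see; measured HRIRs are much longer.

```python
from typing import List, Tuple


def convolve(x: List[float], h: List[float]) -> List[float]:
    """Direct-form linear convolution; output length is len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y


def render_binaural(o: List[float], hrir_l: List[float],
                    hrir_r: List[float]) -> Tuple[List[float], List[float]]:
    """Time-domain counterpart of b = Bo: filter the mono object once per ear."""
    return convolve(o, hrir_l), convolve(o, hrir_r)


# Hypothetical toy HRIRs for a source on the listener's left:
# the right ear receives a copy that is 2 samples later and half as loud.
hrir_l = [1.0, 0.0, 0.0]
hrir_r = [0.0, 0.0, 0.5]
b_l, b_r = render_binaural([1.0, -1.0, 0.5], hrir_l, hrir_r)
print(b_l)  # [1.0, -1.0, 0.5, 0.0, 0.0]
print(b_r)  # [0.0, 0.0, 0.5, -0.5, 0.25]
```

Direct convolution keeps the example short; for long filters, FFT-based convolution is the usual, faster choice.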
The rendering filter pair B is most often given by a pair of HRTFs chosen to impart the impression that the object signal o emanates from an associated position in space relative to the listener. In equation form, this relationship may be represented as:

$$B = \mathrm{HRTF}\{\mathrm{pos}(o)\} \tag{6}$$
In Equation 6 above, pos(o) represents the desired position of object signal o in 3D space relative to the listener. This position may be represented in Cartesian (x, y, z) coordinates or in any equivalent coordinate system, such as a polar system. The position may also vary in time to simulate movement of the object through space. The function HRTF{ } represents a set of HRTFs addressable by position. Many such sets measured from human subjects in a laboratory exist, such as the CIPIC database, a public-domain database of high-spatial-resolution HRTF measurements for a number of different subjects. Alternatively, the set might consist of a parametric model such as the spherical head model. In a practical implementation, the HRTFs used to construct the crosstalk canceller are often chosen from the same set used to generate the binaural signal, though this is not a requirement.
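One simple way to make a measured set "addressable by position" is to convert the object's position to an angle and pick the nearest measured direction. The sketch below does this on the horizontal plane only. The 30° grid, the axis convention (x forward, y to the listener's left), nearest-neighbor selection (rather than interpolation), and the string labels standing in for measured filter data are all hypothetical.

```python
import math
from typing import Dict, Tuple

# Hypothetical measured set: azimuth in degrees (0 = front, +90 = left)
# mapped to a pair of labels that stand in for real HRTF data.
HRTF_SET: Dict[int, Tuple[str, str]] = {
    az: (f"HRTF_L@{az}", f"HRTF_R@{az}") for az in range(-180, 180, 30)
}


def azimuth_deg(x: float, y: float) -> float:
    """Azimuth of a Cartesian position, assuming x forward and y to the left."""
    return math.degrees(math.atan2(y, x))


def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two azimuths, with wrap-around."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def hrtf_for_position(x: float, y: float) -> Tuple[str, str]:
    """B = HRTF{pos(o)}: return the measured pair nearest the object's azimuth."""
    az = azimuth_deg(x, y)
    nearest = min(HRTF_SET, key=lambda m: angular_distance(m, az))
    return HRTF_SET[nearest]


print(hrtf_for_position(1.0, 0.55))    # about 28.8 deg -> ('HRTF_L@30', 'HRTF_R@30')
print(hrtf_for_position(-1.0, -0.01))  # about -179.4 deg -> ('HRTF_L@-180', 'HRTF_R@-180')
```

The wrap-around distance matters for sources behind the listener, where azimuths near +180° and -180° describe almost the same direction.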
In many applications, a multitude of objects at various positions in space are simultaneously rendered. In such a case, the binaural signal is given by a sum of object signals with their associated HRTFs applied:
$$b = \sum_{i=1}^{N} B_i o_i \quad \text{where} \quad B_i = \mathrm{HRTF}\{\mathrm{pos}(o_i)\} \tag{7}$$
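At a single frequency bin, Equation 7 is a weighted sum: each object contributes its value times its filter pair to each ear. The sketch below accumulates that sum. The function name `binaural_mix` and the two example objects with their filter values are hypothetical.

```python
from typing import List, Sequence, Tuple

FilterPair = Tuple[complex, complex]  # (B_L, B_R) for one object at one frequency


def binaural_mix(objects: Sequence[complex],
                 filters: Sequence[FilterPair]) -> List[complex]:
    """Equation 7 at one frequency bin: b = sum_i B_i o_i."""
    if len(objects) != len(filters):
        raise ValueError("each object needs exactly one filter pair")
    b = [0j, 0j]
    for o_i, (b_l, b_r) in zip(objects, filters):
        b[0] += b_l * o_i  # contribution of object i at the left ear
        b[1] += b_r * o_i  # contribution of object i at the right ear
    return b


# Two hypothetical objects: one favoring the left ear, one favoring the right.
b = binaural_mix([1.0, 2.0], [(1.0, 0.5), (0.5, 1.0)])
print(b)  # [(2+0j), (2.5+0j)]
```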
With this multi-object binaural signal, the entire rendering chain to generate the speaker signals is given by:
$$s = C \sum_{i=1}^{N} B_i o_i \tag{8}$$
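Because every step in Equation 8 is linear, C can be moved inside the sum: $C \sum_i B_i o_i = \sum_i (C B_i) o_i$. This means each object's filter pair can be combined with the canceller ahead of time, which may be convenient for objects at fixed positions. The sketch below checks at one frequency bin that both orderings give the same speaker signals. All function names and numeric values are hypothetical.

```python
import cmath
from typing import List, Sequence, Tuple

Matrix2 = List[List[complex]]
FilterPair = Tuple[complex, complex]


def matvec(m: Matrix2, v: Sequence[complex]) -> List[complex]:
    """Multiply a 2x2 matrix by a 2-element column vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


def mix(pairs: Sequence[Sequence[complex]], objects: Sequence[complex]) -> List[complex]:
    """Sum of pairs[i] * objects[i], computed separately for each channel."""
    return [sum(p[0] * o for p, o in zip(pairs, objects)),
            sum(p[1] * o for p, o in zip(pairs, objects))]


def render_direct(C: Matrix2, filters: Sequence[FilterPair],
                  objects: Sequence[complex]) -> List[complex]:
    """Equation 8 as written: build the binaural mix, then apply C once."""
    return matvec(C, mix(filters, objects))


def render_prefiltered(C: Matrix2, filters: Sequence[FilterPair],
                       objects: Sequence[complex]) -> List[complex]:
    """Same output via linearity: fold C into each filter pair (C B_i), then mix."""
    speaker_filters = [matvec(C, f) for f in filters]
    return mix(speaker_filters, objects)


# Hypothetical canceller for a symmetric H = [[1, x], [x, 1]] at one bin.
x = 0.4 * cmath.exp(-0.5j)
det = 1.0 - x * x
C = [[1.0 / det, -x / det], [-x / det, 1.0 / det]]
filters = [(1.0, 0.3j), (0.6, 0.6), (0.2 - 0.1j, 0.9)]
objects = [0.5, -1.0j, 2.0]

s1 = render_direct(C, filters, objects)
s2 = render_prefiltered(C, filters, objects)
print(max(abs(a - b) for a, b in zip(s1, s2)) < 1e-12)  # True
```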
In many applications, the object signals oi are the individual channels of a multichannel signal, such as a 5.1 signal whose main channels are left, center, right, left surround, and right surround. In this case, the HRTFs associated with each object may be chosen to correspond to the fixed speaker position associated with that channel. In this way, a 5.1 surround system may be virtualized over a pair of stereo loudspeakers. In other applications, the objects may be sources allowed to move freely anywhere in 3D space. In a next-generation spatial audio format, the set of objects in Equation 8 may consist of both freely moving objects and fixed channels.
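A scene that mixes fixed channels and moving objects can be described uniformly by giving every object a position as a function of time. For a channel, that function is constant. The sketch below builds such a scene from the five main 5.1 channels plus one orbiting object, using azimuth only. The channel angles are an assumption based on the commonly cited ITU-R BS.775 layout (fronts at ±30°, surrounds at ±110°), and the azimuth convention, the `SceneObject` type, and the orbit are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Assumed azimuth convention: degrees, 0 = front, positive = listener's left.
# Angles follow the commonly cited ITU-R BS.775 layout (surrounds at +/-110 deg).
FIVE_ONE_AZIMUTHS: Dict[str, float] = {
    "L": 30.0, "C": 0.0, "R": -30.0, "Ls": 110.0, "Rs": -110.0,
}

SceneObject = Tuple[List[float], Callable[[float], float]]  # (signal, azimuth at time t)


def channel_objects(channels: Dict[str, List[float]]) -> List[SceneObject]:
    """Treat each channel as an object fixed at its loudspeaker azimuth."""
    # The default argument binds each channel's azimuth when the lambda is created.
    return [(sig, lambda t, az=FIVE_ONE_AZIMUTHS[name]: az)
            for name, sig in channels.items()]


def orbiting_object(signal: List[float], period_s: float) -> SceneObject:
    """A freely moving object that circles the listener once every period_s seconds."""
    # Maps the angle into [-180, 180) to match the channel convention.
    return (signal, lambda t: (360.0 * t / period_s + 180.0) % 360.0 - 180.0)


silence = [0.0] * 4  # placeholder audio; only the positions matter here
scene = channel_objects({name: silence for name in FIVE_ONE_AZIMUTHS})
scene.append(orbiting_object(silence, period_s=10.0))

print([pos(2.5) for _, pos in scene])  # [30.0, 0.0, -30.0, 110.0, -110.0, 90.0]
```

Each position could then be passed through an HRTF lookup to obtain the filter pair B_i for Equation 8, evaluated once for the channels and repeatedly over time for the moving object.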
One disadvantage of a virtual spatial audio rendering processor is that the effect depends strongly on the listener sitting in the optimal position relative to the speakers, which is the position assumed in the design of the crosstalk canceller. What is needed, therefore, is a virtual rendering system and process that maintains the spatial impression intended by the binaural signal even when the listener is not in the optimal listening location.