The present invention relates to digital audio signal processing, and more particularly to loudspeaker virtualization and cross-talk cancellation devices and methods.
Multi-channel audio inputs designed for multiple loudspeakers can be processed to drive a single pair of loudspeakers and/or headphones to provide a perceived sound field simulating that of the multiple loudspeakers. In addition to creation of such virtual speakers for surround sound effects, signal processing can also provide changes in perceived listening room size and shape by control of effects such as reverberation.
Multi-channel audio is an important feature of DVD players and home entertainment systems. It provides a more realistic sound experience than is possible with conventional stereophonic systems by roughly approximating the speaker configuration found in movie theaters. FIG. 14 illustrates an example of multi-channel audio processing known as “virtual surround” which consists of creating the illusion of a multi-channel speaker system using a conventional pair of loudspeakers. This technique makes use of transfer functions from virtual loudspeakers to a listener's ears; that is, transfer functions made from the head-related transfer function (HRTF) of the direct path and of all the reflections of the virtual listening environment. A room transfer function is largely unknown, but the actual HRTFs (which are functions of the angles between source direction and head direction) can be approximated by use of a library of measured HRTFs. For example, Gardner, Transaural 3-D Audio, MIT Media Laboratory Perceptual Computing Section Technical Report No. 342, Jul. 20, 1995, provides HRTFs for every 5 degrees (azimuthal).
FIG. 15 shows functional blocks of an implementation for the (real plus virtual) speaker arrangement of FIG. 14; this requires cross-talk cancellation for the real speakers as shown in the lower right of FIG. 15. Here cross-talk denotes the signal from the right speaker that is heard at the left ear and vice-versa. The basic solution to eliminate cross-talk was proposed in U.S. Pat. No. 3,236,949 and is explained as follows. Consider a listener facing two loudspeakers as shown in FIG. 13. Let X1(e^{jω}) and X2(e^{jω}) denote the (short-term) Fourier transforms of the analog signals which drive the left and right loudspeakers, respectively, and let Y1(e^{jω}) and Y2(e^{jω}) denote the Fourier transforms of the analog signals actually heard at the listener's left and right ears, respectively. Presuming a symmetrical speaker arrangement, the system can then be characterized by two HRTFs, H1(e^{jω}) and H2(e^{jω}), which respectively relate to the short and long paths from speaker to ear; that is, H1(e^{jω}) is the transfer function from left speaker to left ear or right speaker to right ear, and H2(e^{jω}) is the transfer function from left speaker to right ear and from right speaker to left ear. This situation can be described as a linear transformation from X1, X2 to Y1, Y2 with a 2×2 matrix having elements H1 and H2:
$$\begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} = \begin{bmatrix} H_1 & H_2 \\ H_2 & H_1 \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}$$
Note that the dependence of H1 and H2 on the angle that the speakers are offset from the facing direction of the listener has been omitted.
FIG. 16 shows a cross-talk cancellation system in which the input electrical signals (short-term Fourier transformed) E1(e^{jω}), E2(e^{jω}) are modified to give the signals X1, X2 which drive the loudspeakers. (Note that the input signals E1, E2 are the recorded signals, typically captured using either a pair of moderately-spaced omni-directional microphones or a pair of adjacent uni-directional microphones with an angle between the two microphone directions.) This conversion from E1, E2 into X1, X2 is also a linear transformation and can be represented by a 2×2 matrix. If the target is to reproduce signals E1, E2 at the listener's ears (so Y1=E1 and Y2=E2) and thereby cancel the effect of the cross-talk (due to H2 not being 0), then the 2×2 matrix should be the inverse of the 2×2 matrix having elements H1 and H2. That is, taking
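$$\begin{bmatrix} X_1 \\ X_2 \end{bmatrix} = \begin{bmatrix} H_1 & H_2 \\ H_2 & H_1 \end{bmatrix}^{-1} \begin{bmatrix} E_1 \\ E_2 \end{bmatrix} = \frac{1}{H_1^2 - H_2^2} \begin{bmatrix} H_1 & -H_2 \\ -H_2 & H_1 \end{bmatrix} \begin{bmatrix} E_1 \\ E_2 \end{bmatrix}$$
yields Y1=E1 and Y2=E2.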
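The inversion above can be checked numerically at a single frequency bin. The following is a minimal sketch rather than a definitive implementation: the function names and the complex values standing in for H1, H2, E1 and E2 are illustrative assumptions, not measured HRTFs.

```python
def crosstalk_cancel(h1, h2, e1, e2):
    """Apply the inverse of [[h1, h2], [h2, h1]] to (e1, e2) at one frequency bin."""
    det = h1 * h1 - h2 * h2
    if det == 0:
        raise ZeroDivisionError("H1^2 == H2^2: the matrix is singular at this bin")
    x1 = (h1 * e1 - h2 * e2) / det
    x2 = (-h2 * e1 + h1 * e2) / det
    return x1, x2


def ears_from_speakers(h1, h2, x1, x2):
    """Acoustic path from the two speakers to the two ears: Y = [[h1, h2], [h2, h1]] X."""
    return h1 * x1 + h2 * x2, h2 * x1 + h1 * x2


# Hypothetical single-bin values (assumptions chosen only for illustration).
h1 = 1.0 + 0.2j      # short path: speaker to same-side ear
h2 = 0.4 - 0.3j      # long path: speaker to opposite ear
e1 = 0.5 + 0.5j      # desired left-ear signal
e2 = -0.25 + 1.0j    # desired right-ear signal

x1, x2 = crosstalk_cancel(h1, h2, e1, e2)
y1, y2 = ears_from_speakers(h1, h2, x1, x2)
# y1 and y2 reproduce e1 and e2 up to floating-point rounding.
```

In practice the same arithmetic would be repeated for every frequency bin of each short-term transform frame, and bins where H1² is close to H2² would need some form of regularization, which this sketch does not attempt.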
An efficient implementation of the cross-talk canceller diagonalizes the 2×2 matrix having elements H1 and H2
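$$\begin{bmatrix} H_1 & H_2 \\ H_2 & H_1 \end{bmatrix} = \frac{1}{2} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} M_0 & 0 \\ 0 & S_0 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$
where M0(e^{jω})=H1(e^{jω})+H2(e^{jω}) and S0(e^{jω})=H1(e^{jω})−H2(e^{jω}). Thus the inverse becomes simple to compute: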
$$\begin{bmatrix} H_1 & H_2 \\ H_2 & H_1 \end{bmatrix}^{-1} = \frac{1}{2} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} 1/M_0 & 0 \\ 0 & 1/S_0 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$
And the cross-talk cancellation is efficiently implemented as sum/difference detectors with the inverse filters 1/M0(e^{jω}) and 1/S0(e^{jω}). This structure is referred to as the “shuffler” cross-talk canceller. U.S. Pat. No. 5,333,200 discloses this plus various other cross-talk signal processing.
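As a sanity check on the diagonalized form, the shuffler structure can be compared against the direct 2×2 inverse at one frequency bin. This is a sketch under stated assumptions: the function names and numeric values are hypothetical, and the code illustrates only the algebra, not the filter design of either cited patent.

```python
def direct_cancel(h1, h2, e1, e2):
    """Direct inverse of [[h1, h2], [h2, h1]] applied to (e1, e2)."""
    det = h1 * h1 - h2 * h2
    return (h1 * e1 - h2 * e2) / det, (h1 * e2 - h2 * e1) / det


def shuffler_cancel(h1, h2, e1, e2):
    """Shuffler form: sum/difference, two inverse filters, sum/difference, halve."""
    m0 = h1 + h2                 # sum-channel transfer function M0
    s0 = h1 - h2                 # difference-channel transfer function S0
    mid = (e1 + e2) / m0         # sum signal through 1/M0
    side = (e1 - e2) / s0        # difference signal through 1/S0
    return 0.5 * (mid + side), 0.5 * (mid - side)


# Hypothetical single-bin values (assumptions, not measured HRTFs).
h1, h2 = 1.0 + 0.2j, 0.4 - 0.3j
e1, e2 = 0.5 + 0.5j, -0.25 + 1.0j

x_direct = direct_cancel(h1, h2, e1, e2)
x_shuffler = shuffler_cancel(h1, h2, e1, e2)
max_diff = max(abs(a - b) for a, b in zip(x_direct, x_shuffler))
```

The shuffler needs only two inverse filters (1/M0 and 1/S0) instead of the four filter applications of the direct matrix form, which is the source of its efficiency.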
Now with cross-talk cancellation, the FIG. 14 virtual plus real loudspeaker arrangement can be simply created by use of the HRTFs for the offset angles of the speakers. In particular, let H1(θ) and H2(θ) denote the two HRTFs for a speaker offset by angle θ (or 360−θ by symmetry) from the facing direction of the listener. If the (short-term Fourier transform of the) speaker signal is denoted SS, then the corresponding left and right ear signals E1 and E2 would be H1(θ)·SS and H2(θ)·SS, respectively. These ear signals would be used as previously described for inputs to the cross-talk canceller; the cross-talk canceller outputs then drive the two real speakers to simulate a speaker at an angle θ and driven by source SS.
For example, the left surround sound virtual speaker could be at an azimuthal angle of about 250 degrees. Thus with cross-talk cancellation, the corresponding two real speaker inputs to create the virtual left surround sound speaker would be:
$$\begin{bmatrix} X_1 \\ X_2 \end{bmatrix} = \frac{1}{H_1^2 - H_2^2} \begin{bmatrix} H_1 & -H_2 \\ -H_2 & H_1 \end{bmatrix} \begin{bmatrix} TF3_{left} \cdot LSS \\ TF3_{right} \cdot LSS \end{bmatrix}$$
where H1, H2 are for the left and right real speaker angles (e.g., 30 and 330 degrees), LSS is the (short-time Fourier transform of the) left surround sound signal, and TF3left=H1(250), TF3right=H2(250) are the HRTFs for the left surround sound speaker angle (250 degrees).
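The virtual left surround computation can be illustrated at a single frequency bin as follows. This is a minimal sketch: the function name and all complex values for H1, H2, TF3left, TF3right and LSS are placeholders assumed for illustration, where a real system would look them up in a measured HRTF library.

```python
def virtual_speaker_feeds(h1, h2, tf_left, tf_right, source):
    """Real-speaker drive signals that place `source` at a virtual speaker.

    h1, h2            : HRTFs of the real speakers (short and long paths)
    tf_left, tf_right : HRTFs from the virtual speaker to the left and right ears
    source            : spectrum value of the virtual speaker signal at this bin
    """
    e1 = tf_left * source        # desired left-ear signal
    e2 = tf_right * source       # desired right-ear signal
    det = h1 * h1 - h2 * h2
    x1 = (h1 * e1 - h2 * e2) / det
    x2 = (-h2 * e1 + h1 * e2) / det
    return x1, x2


# Hypothetical single-bin values (assumptions, not measured data).
h1, h2 = 0.9 + 0.1j, 0.35 - 0.25j          # real speakers at, e.g., 30 and 330 degrees
tf3_left, tf3_right = 0.7 - 0.2j, 0.2 + 0.3j   # virtual speaker at 250 degrees
lss = 1.0 + 0.5j                            # left surround signal at this bin

x1, x2 = virtual_speaker_feeds(h1, h2, tf3_left, tf3_right, lss)

# Passing the feeds through the real acoustic paths should give the ear
# signals that a physical speaker at the virtual position would produce.
y1 = h1 * x1 + h2 * x2
y2 = h2 * x1 + h1 * x2
```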
Again, FIG. 15 shows functional blocks for a virtualizer with the cross-talk canceller to implement 5-channel audio with two real speakers as in FIG. 14; each speaker signal is filtered by the corresponding pair of HRTFs for the speaker's offset angle and distance, and the filtered signals summed and input into the cross-talk canceller and then into the two real speakers.
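The filter-sum-cancel chain described for FIG. 15 can be sketched per frequency bin as below. The details of FIG. 15 are not reproduced here; the `virtualize` function, its argument layout, the two-bin spectra and every numeric value are assumptions made for illustration, and the canceller stage uses the shuffler form described earlier.

```python
def virtualize(channels, h1, h2):
    """Mix several virtual speakers down to two real-speaker feeds.

    channels : list of (to_left_ear, to_right_ear, spectrum) tuples; each item
               is a list of complex values with one entry per frequency bin
    h1, h2   : per-bin HRTFs of the real speakers (short and long paths)
    """
    n_bins = len(h1)
    e1 = [0j] * n_bins           # summed target signal at the left ear
    e2 = [0j] * n_bins           # summed target signal at the right ear
    for to_left, to_right, spectrum in channels:
        for k in range(n_bins):
            e1[k] += to_left[k] * spectrum[k]
            e2[k] += to_right[k] * spectrum[k]

    x1, x2 = [], []
    for k in range(n_bins):      # shuffler cross-talk canceller, bin by bin
        mid = (e1[k] + e2[k]) / (h1[k] + h2[k])
        side = (e1[k] - e2[k]) / (h1[k] - h2[k])
        x1.append(0.5 * (mid + side))
        x2.append(0.5 * (mid - side))
    return x1, x2


# Hypothetical two-bin data (assumptions, not measured HRTFs).
h1 = [1.0 + 0.0j, 0.9 + 0.1j]
h2 = [0.3 - 0.2j, 0.2 + 0.25j]
near = [0.8 + 0.1j, 0.7 - 0.2j]   # surround speaker to same-side ear
far = [0.3 - 0.1j, 0.2 + 0.1j]    # surround speaker to opposite ear
lss = [1.0 + 0.0j, 0.0 + 0.5j]    # left surround spectrum
rss = [0.5 - 0.5j, 1.0 + 0.0j]    # right surround spectrum

channels = [
    (near, far, lss),             # left surround: left ear is the near ear
    (far, near, rss),             # right surround: mirrored by symmetry
]
x1, x2 = virtualize(channels, h1, h2)
```

Because every stage is linear, the summation can be done once on the ear-signal targets before a single cross-talk canceller, rather than running one canceller per virtual speaker.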
Unfortunately, the transfer functions from the speakers to the ears depend upon the individual's head-related transfer functions (HRTFs) as well as room effects and therefore are not completely known. Instead, generalized HRTFs are used to approximate the correct transfer functions. Generalized HRTFs are usually able to create a sweet spot for most listeners, especially when the room is fairly non-reverberant and diffuse.
However, the sweet spot can be quite a small region. That is, to perceive the virtualized sound field properly, a listener's head cannot move much from the central location assumed in the filter design with HRTFs and cross-talk cancellation. Thus current virtualization filter design methods suffer from the problem of a small sweet spot.