The present invention relates to audio systems, in particular, "3D" audio systems.
Conventional 3D audio systems include: (i) a binaural spatializer, which simulates the appropriate auditory experience of one or more sources located around the listener; and (ii) a delivery system, which ensures that the binaural signals are received correctly at the listener's ears. Much work has been done on binaural spatialization and several commercial systems are currently available.
To achieve good reproduction of 3D audio, it is necessary to precisely control the acoustic signals at the listener's ears. One way to do this is to deliver the audio signals through headphones. In many situations, however, it is preferable not to wear headphones. The use of standard stereo loudspeakers is problematic, since there is a significant amount of left and right channel leakage known as "crosstalk".
Acoustic crosstalk cancellation is a signal processing technique whereby two (or possibly more) loudspeakers are used to deliver 3D audio to a listener, without requiring headphones. The idea is to cancel the crosstalk signal that arrives at each ear from the opposite-side loudspeaker. If this can be successfully achieved, then the acoustic signals at the listener's ears can be controlled, just as if the listener were wearing headphones. A significant problem with existing crosstalk cancellation systems is that they are very sensitive to the position of the listener's head. Although good cancellation can be achieved for the head in a default position, the crosstalk signal is no longer canceled if the listener moves his head; in some cases head movement of only a couple of centimeters can have drastic effects.
With conventional systems, exact cancellation requires perfect knowledge of the acoustic transfer functions (TFs) between the loudspeakers and the listener's ears. These TFs are modeled using an assumed head position and generic head-related transfer functions (HRTFs). (See, for example, D. G. Begault, "3D sound for virtual reality and multimedia," Academic Press Inc., Boston, 1994.) In practice, however, the real TFs will always differ from the assumed model, most noticeably by the listener's head moving from its assumed position. Any variation between the assumed model and the real environment will result in degradation in the performance of the crosstalk canceler: in some cases this performance degradation can be quite severe.
The only way to know the acoustic TFs exactly is to place microphones in the listener's ears and constantly update the crosstalk cancellation network appropriately. (See, e.g., P. A. Nelson et al., "Adaptive inverse filters for stereophonic sound reproduction", IEEE Trans. Signal Processing, vol. 40, no. 7, pp. 1621-1632, July 1992.) However, it may be preferable to use some form of passive head tracking and adaptively update the cancellation network based on the current position of the listener's head. Methods of passive head tracking include: (i) using a head-mounted head tracker; (ii) using a microphone array to determine the head position based on the listener's giving a spoken command (this may require the user to constantly speak to the system); or (iii) using a video camera. Although use of a video camera appears to be the most promising, even with an accurate camera-based head tracker, it is inevitable that there will still be some position errors in addition to errors between the generic HRTFs and the listener's own HRTFs. For these reasons, such a crosstalk canceler will be non-robust in practice.
FIG. 1 is a generalized block diagram of a conventional crosstalk cancellation system as described in U.S. Pat. No. 3,236,949 to Atal and Schroeder. pL and pR are the left and right program signals respectively, l1 and l2 are the loudspeaker signals, and anR, n=1, 2, is the transfer function (TF) from the nth loudspeaker to the right ear (a similar pair of TFs for the left ear, denoted by anL, is not shown). The objective is to find the filter transfer functions h1, h2, h3, h4 such that: (i) the signals pL and pR are reproduced at the left and right ears respectively; and (ii) the crosstalk signals are canceled, i.e., none of the pL signal is received at the right ear, and similarly, none of the pR signal is received at the left ear.
Denoting the signals at the left and right ears as eL and eR respectively, the block diagram of FIG. 1 may be described by the following linear system:

$$
\begin{bmatrix} e_R \\ e_L \end{bmatrix}
=
\begin{bmatrix} a_1^R & a_2^R \\ a_1^L & a_2^L \end{bmatrix}
\begin{bmatrix} h_1 & h_3 \\ h_2 & h_4 \end{bmatrix}
\begin{bmatrix} p_R \\ p_L \end{bmatrix},
\qquad
e = A H p. \tag{1}
$$
To reproduce the program signals identically at the ears requires that
$$
H = A^{-1}. \tag{2}
$$
For simplicity, only the response to the right program channel will be described. The description for the left channel would be similar. In this case, the block diagram in FIG. 1 reduces to a two-channel beamformer, with filters h1 and h2 on the respective channels.
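Let the response at the ears be:

$$
\begin{bmatrix} b_R \\ b_L \end{bmatrix}
=
\begin{bmatrix} a_1^R & a_2^R \\ a_1^L & a_2^L \end{bmatrix}
\begin{bmatrix} h_1 \\ h_2 \end{bmatrix},
\qquad
b = A h, \tag{3}
$$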
where bR=1 (i.e., the right program signal is faithfully reproduced at the right ear), and bL=0 (i.e., none of the right program signal reaches the left ear). Assuming the TF matrix A is known and invertible, then the system of equations (3) can be readily solved to find the required filters h. Typically, the TF matrix A is determined (either from measurements on a dummy head, or through calculations using some assumed head model) for a fixed head location (the "design position"). However, if A varies from its design values, then the calculated filters will no longer produce the desired crosstalk cancellation. In practice, variation of A occurs whenever the listener moves his head or when different listeners use the system. This is a fundamental problem with known acoustic crosstalk cancellation systems.
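As an illustration of solving Eq. 3 at a single frequency, the following minimal sketch uses NumPy. The function name `design_filters` and the numeric entries of `A` are hypothetical and chosen only for demonstration; they do not come from any measured head.

```python
import numpy as np

def design_filters(A, b=(1.0, 0.0)):
    """Solve A h = b (Eq. 3) for the two filter gains at one frequency."""
    A = np.asarray(A, dtype=complex)
    b = np.asarray(b, dtype=complex)
    return np.linalg.solve(A, b)

# Hypothetical single-frequency TF matrix (illustrative values only):
# rows = (right ear, left ear), columns = (loudspeaker 1, loudspeaker 2).
A = np.array([[1.0 + 0.0j, 0.5 - 0.2j],
              [0.5 - 0.2j, 1.0 + 0.0j]])

h = design_filters(A)   # filters h1, h2 for the right program channel
b = A @ h               # response at the ears for this A
```

With the design matrix itself, `b` comes out as bR=1 and bL=0 to within rounding; if the true matrix differs from the design matrix, the same `h` no longer gives bL=0.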
Robustness to head movements is frequency-dependent, and for a given frequency, there is a specific loudspeaker spacing which gives the best performance in terms of robustness. (See D. B. Ward et al., "Optimum loudspeaker spacing for robust crosstalk cancellation", Proc. IEEE Conf. Acoustic Speech Signal Processing (ICASSP-98), Seattle, May 1998, Vol. 6, pp. 3541-3544.) However, as frequency increases, the loudspeaker spacing required to give good robustness performance becomes impractical. For example, for a head distance of dH=0.5 m (typical for a desktop audio system) and a head radius of rH=0.0875 m, a loudspeaker spacing of approximately 0.1 m is required. For a more practical loudspeaker spacing of 0.25 m, the conventional crosstalk canceler is extremely non-robust at a frequency of 4 kHz, and head movements of as little as 2 cm can destroy the crosstalk cancellation effect. Thus, for a fixed loudspeaker spacing, the conventional crosstalk canceler becomes inherently non-robust at certain frequencies.
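The sensitivity to head movement can be sketched numerically. The code below is an illustration under stated assumptions, not the method used to produce the figures quoted above: the head is modeled as two points 2·rH apart, the loudspeakers lie on a line at distance dH in front of the head, the TFs are pure delays, the head moves sideways, and c is taken as 343 m/s. All function and variable names are hypothetical.

```python
import numpy as np

C_SOUND = 343.0  # assumed speed of sound in m/s

def tf_matrix(f, spacing, d_head, r_head, x_head=0.0, c=C_SOUND):
    """Delay-only TF matrix: rows = (right ear, left ear),
    columns = (loudspeaker 1 on the right, loudspeaker 2 on the left)."""
    ears = np.array([[x_head + r_head, 0.0],
                     [x_head - r_head, 0.0]])
    speakers = np.array([[spacing / 2.0, d_head],
                         [-spacing / 2.0, d_head]])
    # dist[i, j] = distance from ear i to loudspeaker j
    dist = np.linalg.norm(ears[:, None, :] - speakers[None, :, :], axis=2)
    return np.exp(1j * 2.0 * np.pi * f * dist / c)

def crosstalk_after_move(f, shift, spacing=0.25, d_head=0.5, r_head=0.0875):
    """Design h at the centered position, then return |bL| after a lateral shift."""
    A_design = tf_matrix(f, spacing, d_head, r_head)
    h = np.linalg.solve(A_design, np.array([1.0, 0.0], dtype=complex))
    A_moved = tf_matrix(f, spacing, d_head, r_head, x_head=shift)
    return abs((A_moved @ h)[1])

xt_design = crosstalk_after_move(4000.0, 0.0)   # head at the design position
xt_4k = crosstalk_after_move(4000.0, 0.02)      # 2 cm sideways at 4 kHz
xt_500 = crosstalk_after_move(500.0, 0.02)      # 2 cm sideways at 500 Hz
```

Under these assumptions the residual crosstalk after a 2 cm shift is much larger at 4 kHz than at 500 Hz, which is consistent with the frequency dependence described above.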
Differences between the assumed TF model and the actual TF model can be considered as perturbations of the acoustic TF matrix A of Eq. 3. These differences include movement of the head from its design position, and differences between different HRTFs. From linear systems theory, the robustness of the system of Eq. 3 to perturbation of a symmetric matrix A is reflected by its condition number, defined for A complex as

$$
\operatorname{cond}\{A\} = \frac{\sigma_{\max}(A A^H)}{\sigma_{\min}(A A^H)} \tag{4}
$$
where σmin(·) and σmax(·) represent the smallest and largest singular values respectively. For a two-channel crosstalk canceler, A has only two singular values. When A is ill-conditioned, the crosstalk canceler will be sensitive to variations in head position. Thus, it is important to consider under which configurations the matrix A becomes ill-conditioned.
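The condition number of Eq. 4 can be evaluated directly. The sketch below follows Eq. 4 as written (ratio of the singular values of AA^H); the function name `cond_eq4` is hypothetical. Since the singular values of AA^H are the squares of those of A, this quantity equals the square of the usual 2-norm condition number returned by `np.linalg.cond`.

```python
import numpy as np

def cond_eq4(A):
    """Condition number per Eq. 4: sigma_max(A A^H) / sigma_min(A A^H).

    Returns inf (with a NumPy warning) if A A^H is exactly singular.
    """
    A = np.asarray(A, dtype=complex)
    s = np.linalg.svd(A @ A.conj().T, compute_uv=False)
    return s.max() / s.min()

# Simple check matrix: A A^H = diag(4, 1), so the ratio is 4.
A_test = np.diag([2.0, 1.0])
kappa = cond_eq4(A_test)
```

A large value indicates that small perturbations of A (for example, from head movement) can produce large changes in the response at the ears.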
Consider the following model for the TF from the nth loudspeaker to the right ear:

$$
a_n^R = e^{\,j 2\pi f c^{-1} d_n^R}, \qquad n = 1, 2 \tag{5}
$$
where c is the speed of sound propagation, and dnR is the distance from the nth loudspeaker to the right ear (and similarly for the left ear, anL and dnL). Note that this model ignores both attenuation from the loudspeaker to the ear, and also the effect of the head on the impinging sound wavefront. Hence, it only models the inter-aural time delay. For most practicable loudspeaker spacings (where the loudspeakers are placed in front of the listener), the inter-aural time delay is almost the same whether the head is modeled as two points in space (as here), or as a sphere. (See C. P. Brown et al., "An efficient HRTF model for 3-D sound", in Proc. IEEE Workshop on Applicat. of Signal Processing to Audio and Acoust. (WASPAA-97), New Paltz, N.Y., October 1997.)
Assuming that the head is symmetrically positioned between the loudspeakers and that the loudspeakers have identical flat frequency responses, the acoustic TF matrix in Eq. 3 reduces to:

$$
A = \begin{bmatrix} a_1^R & a_2^R \\ a_2^R & a_1^R \end{bmatrix} \tag{6}
$$
since a1L=a2R and a2L=a1R.
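The delay-only model of Eq. 5 and the symmetric matrix of Eq. 6 can be written down in a few lines. This is a minimal sketch; the function names, the value c = 343 m/s, and the example path lengths are assumptions made for illustration.

```python
import numpy as np

C_SOUND = 343.0  # assumed speed of sound in m/s; the text only names it c

def delay_tf(f, d, c=C_SOUND):
    """Eq. 5: pure-delay transfer function over a path of length d (metres)."""
    return np.exp(1j * 2.0 * np.pi * f * d / c)

def symmetric_tf_matrix(f, d1, d2, c=C_SOUND):
    """Eq. 6: TF matrix for a head centered between the loudspeakers.

    d1 and d2 are the distances from loudspeakers 1 and 2 to the right ear.
    """
    a1 = delay_tf(f, d1, c)
    a2 = delay_tf(f, d2, c)
    return np.array([[a1, a2],
                     [a2, a1]])

# Illustrative path lengths only (not taken from the text).
A = symmetric_tf_matrix(f=1000.0, d1=0.50, d2=0.54)
```

Every entry has unit magnitude because attenuation is ignored; only the phase, and hence the inter-aural time delay, is modeled.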
Let d2R = d1R + Δ. Hence,

$$
a_2^R = e^{\,j 2\pi f c^{-1} (d_1^R + \Delta)} = a_1^R \, e^{\,j 2\pi f c^{-1} \Delta}.
$$

Hence,

$$
A = a_1^R \begin{bmatrix} 1 & e^{\,j 2\pi f c^{-1} \Delta} \\ e^{\,j 2\pi f c^{-1} \Delta} & 1 \end{bmatrix},
$$

and

$$
A A^H = 2 \left| a_1^R \right|^2 \begin{bmatrix} 1 & \cos(2\pi f c^{-1} \Delta) \\ \cos(2\pi f c^{-1} \Delta) & 1 \end{bmatrix}. \tag{7}
$$
Clearly, the matrix AA^H is ill-conditioned for:

$$
\cos(2\pi f c^{-1} \Delta) = \pm 1
$$

(in fact, it is singular), or equivalently,

$$
\Delta = p \, \frac{c}{2f}, \qquad p \in \mathbb{Z} \tag{8}
$$
This result may be stated as follows: for an acoustically symmetric system, the crosstalk canceler becomes extremely non-robust when the inter-aural path difference is an integer multiple of half the operating wavelength, and for frequencies where the wavelength is much larger than the speaker spacing.
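Eq. 7 and Eq. 8 can be checked numerically. The sketch below builds A from the delay-only model, compares AA^H with the closed form of Eq. 7, and lists the frequencies f = p·c/(2Δ), p = 1, 2, ..., obtained by solving Eq. 8 for f. The names, the value c = 343 m/s, and the example path difference Δ = 0.0343 m are assumptions for illustration.

```python
import numpy as np

C_SOUND = 343.0  # assumed speed of sound in m/s

def gram_matrix(f, d1, delta, c=C_SOUND):
    """A A^H for the symmetric delay-only model with path difference delta."""
    a1 = np.exp(1j * 2.0 * np.pi * f * d1 / c)
    z = np.exp(1j * 2.0 * np.pi * f * delta / c)
    A = a1 * np.array([[1.0, z], [z, 1.0]])
    return A @ A.conj().T

def critical_frequencies(delta, f_max, c=C_SOUND):
    """Eq. 8 solved for f: f_p = p * c / (2 * delta), for p >= 1 and f_p <= f_max."""
    f1 = c / (2.0 * delta)
    p = np.arange(1, int(np.floor(f_max / f1)) + 1)
    return p * f1

delta = 0.0343                      # illustrative path difference in metres
f_crit = critical_frequencies(delta, f_max=21000.0)

# Compare with the closed form of Eq. 7 at an arbitrary frequency.
f = 1234.0
theta = 2.0 * np.pi * f * delta / C_SOUND
G = gram_matrix(f, d1=0.5, delta=delta)
G_eq7 = 2.0 * np.array([[1.0, np.cos(theta)], [np.cos(theta), 1.0]])

# At a critical frequency the matrix A A^H is (numerically) singular.
det_at_crit = np.linalg.det(gram_matrix(f_crit[0], d1=0.5, delta=delta))
```

The case p = 0 in Eq. 8 corresponds to the low-frequency limit, where the path difference is negligible compared with the wavelength; it is not returned by `critical_frequencies`.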
If attenuation due to wave propagation or head effects is included in the model for the acoustic TFs, then although A does not become singular when the above condition holds, it is nonetheless ill-conditioned. These attenuation terms have a relatively minor effect on the robustness of the crosstalk canceler, and it is the inter-aural time delay which dominates.
Thus, for a fixed loudspeaker spacing, head distance and head radius, the crosstalk canceler will be robust only for a limited bandwidth. We will refer to the minimum frequency at which the matrix A is ill-conditioned as the critical bandwidth of the crosstalk canceler. In practice, the critical bandwidth represents the frequency at which the crosstalk canceler becomes non-robust, i.e., the frequency at which it "breaks". The crosstalk cancellation system of the present invention has a wider critical bandwidth, thereby providing good crosstalk cancellation over a wider range of frequencies.
Based on Eq. 8, FIG. 2 shows the critical bandwidth of a conventional crosstalk cancellation system as a function of loudspeaker spacing and with a default head radius of rH=0.0875 m. The results for head distances of 0.25 m, 0.5 m and 0.75 m are also shown in FIG. 2.
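A curve of this kind can be sketched from Eq. 8 with p = 1, giving a critical bandwidth of c/(2Δ). The geometry used here is an assumption and may differ from the one behind FIG. 2: the ears are two points at ±rH, the loudspeakers sit at ±spacing/2 on a line at distance dH in front of the head, and c = 343 m/s. The function name is hypothetical.

```python
import numpy as np

def critical_bandwidth(spacing, d_head, r_head=0.0875, c=343.0):
    """Lowest nonzero frequency satisfying Eq. 8 (p = 1), in Hz.

    spacing may be a scalar or a NumPy array of loudspeaker spacings (metres).
    """
    half = np.asarray(spacing, dtype=float) / 2.0
    d_near = np.hypot(half - r_head, d_head)   # same-side loudspeaker to ear
    d_far = np.hypot(half + r_head, d_head)    # opposite-side loudspeaker to ear
    delta = d_far - d_near                     # inter-aural path difference
    return c / (2.0 * delta)

spacings = np.linspace(0.05, 1.0, 20)
curves = {d: critical_bandwidth(spacings, d) for d in (0.25, 0.5, 0.75)}
f_desktop = critical_bandwidth(0.25, 0.5)      # 0.25 m spacing, 0.5 m head distance
```

Under this geometry the critical bandwidth falls as the loudspeakers are moved apart and rises as the listener moves farther away; for a 0.25 m spacing at a 0.5 m head distance it lands close to 4 kHz.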
In view of the foregoing, there is a need for an acoustic crosstalk cancellation system which is robust to head movements.
The present invention is directed to a robust crosstalk cancellation system.
In an exemplary embodiment of a crosstalk cancellation system in accordance with the present invention, three loudspeakers are used, with a center loudspeaker displaced forward (towards the listener) relative to the two other loudspeakers, which are arranged to the left and right of the center loudspeaker. The loudspeakers are driven by a signal processing circuit which performs crosstalk cancellation at least below a predetermined frequency.
Compared to conventional crosstalk cancellation systems, the system of the present invention is less susceptible to movements of the listener's head over a larger range of frequencies and over a larger range of head movements.