The invention relates to multidirectional audio decoding. More particularly, the invention relates to a computer-software-implemented acoustic-crossfeed canceller using very low processing resources of a personal computer for use in a multidirectional audio decoding and presentation system.
Multichannel audio for personal computer-based multimedia video games, CD ROMs, Internet audio and the like (often referred to as "multimedia audio") has emerged as a new application for the Dolby Surround and Dolby Digital multichannel sound encoding and decoding systems.
Dolby Surround, based on the use of a 4:2:4 amplitude-phase matrix, has heretofore become well known as a system for encoding four audio channels (left, right, center and surround) on two channel audio media (cassettes and compact discs), radio transmissions and the audio portions of video recordings (video tapes and laser discs), and television broadcasts, and for decoding therefrom. Dolby Surround (and Dolby Surround Pro Logic, which employs an active surround decoder to enhance channel separation) is widely used in home theatre systems, typically requiring a minimum of three loudspeakers (left and right loudspeakers positioned adjacent to the picture display and one surround loudspeaker, behind the audience) and preferably four loudspeakers (two surround loudspeakers instead of one, located at each side of the audience). Ideally, even a fifth loudspeaker is used, to provide a "hard" center channel reproduction.
Dolby Digital employs the Dolby AC-3 digital audio coding technology in which 5.1 audio channels (left, center, right, left surround, right surround and a limited-bandwidth subwoofer channel) are encoded on a bit-rate reduced data stream. Dolby Digital, a newer technology than Dolby Surround, is already widely used in home theatre systems and has been chosen as the audio standard for the digital video disc (DVD) and high definition television (HDTV) in the United States. In a home theatre environment, Dolby Digital requires a minimum of four loudspeakers because it renders two surround channels instead of one.
In the personal computer "multimedia" environment, typically only two loudspeakers are employed, left and right speakers located adjacent to or near the computer monitor (and, optionally, a subwoofer, which may be remotely located, such as on the floor; in the present discussion, the subwoofer is ignored). When presented over the left and right speakers via conventional means, stereo material generally produces sonic images that are constrained to the speakers themselves and the space between them. This effect results from the crossfeed of the acoustic signal from each speaker to the far ear of a listener positioned in front of the computer monitor. Acoustic cancellation and arbitrary source position rendering are aspects of the same common process.
To reproduce Dolby Surround encoded material in a computer environment, certain prior art arrangements employ multiple loudspeaker drivers within a single enclosure in order to simulate the use of multiple loudspeakers. See, for example, U.S. Pat. No. 5,553,149, which is hereby incorporated by reference in its entirety.
Other prior art arrangements have proposed the use of sound image processing employing acoustic-crossfeed cancellation to render the perception that the surround sound information is coming from virtual loudspeaker locations behind or to the side of a listener when only two forward-located loudspeakers are employed. See, for example, published European Patent Application EP 0 637 191 A2 and published International Application WO 96/96515. The origin of the acoustic-crossfeed canceller is generally attributed to B.S. Atal and Manfred Schroeder of Bell Telephone Laboratories (see, for example, U.S. Pat. No. 3,236,949, which is hereby incorporated by reference in its entirety). As originally described by Schroeder and Atal, the acoustic crossfeed effect can be mitigated by introducing an appropriate cancellation signal from the opposite speaker. Since the cancellation signal itself will crossfeed acoustically, it too must be canceled by an appropriate signal from the originally-emitting speaker, and so on.
The present invention is directed to an acoustic crossfeed canceller which may be implemented using very low processing resources of a personal computer particularly for use in a multidirectional audio decoding and presentation system such as a computer multimedia system having only two main loudspeakers.
In accordance with the present invention, an acoustic crossfeed canceller is provided, intended for implementation in software, such that when run in real time on a personal computer, the canceller has very low MIPS requirements and uses a small fraction of available CPU cycles. Thus, for example, the program could be included with video games, CD ROMs, Internet audio and the like, rendering surround sound images outside the space between left and right computer multimedia loudspeakers when the audio from such sources is reproduced.
In an ideal reproduction system, if a source recording has M channels, each having an associated source direction, the listener should perceive these M channels reproduced from their respective M source directions. In practical reproduction systems, the M source channels are reproduced by N presentation channels or loudspeakers, each having a position with respect to the original source directions and with respect to one or more listeners (each stationary listener having a listening position P at each ear). The overall system may be expressed as:
M → [C] → N → [R] → P
where [C] is an M×N port filter network C which processes or maps the M source channels to the N presentation channels (i.e., linear, time-invariant mapping) and [R] is an N×P port filter network R which processes or maps the N presentation channels to P listening positions (also linear, time-invariant mapping).
The filter network R may be represented by a room matrix R of filter responses or transfer functions (in practice, head related transfer functions or HRTFs), determined by measuring or estimating the transfer function from each of the N presentation channels to each of the P listening positions, forming an N×P matrix of transfer functions, each of which may include the effects of loudspeaker response deviations, room acoustics, delays, echoes, possible head shadow, etc.:
\[ R \equiv \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1p} \\ r_{21} & r_{22} & \cdots & r_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ r_{n1} & r_{n2} & \cdots & r_{np} \end{bmatrix}, \]
where the matrix elements r11 . . . rnp are individual filter responses representing the transfer function from each presentation channel to each listening position. If the matrix elements r11 . . . rnp are frequency domain transfer functions expressed, for example, as fast Fourier transforms (FFTs), standard matrix operations (addition, multiplication, etc.) may be accomplished with the matrix. In accordance with the present invention, the room matrix may be simplified by ignoring all but the time delay and frequency dependent attenuation in the direct acoustic path between each presentation channel and each listening position and by smoothing the attenuation response throughout at least a substantial portion of the audio sound spectrum intended to be reproduced by said presentation channels.
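As an illustration of the simplification just described, each element of the room matrix might be written as a pure delay multiplied by a smoothed, frequency-dependent gain. The symbols a_ij, τ_ij, d_ij and c_s are notation introduced here for illustration only and do not appear in the original text:

```latex
r_{ij}(\omega) \;\approx\; a_{ij}(\omega)\, e^{-j\omega\tau_{ij}},
\qquad
\tau_{ij} = \frac{d_{ij}}{c_s}
```

Here d_ij is taken to be the length of the direct acoustic path from presentation channel i to listening position j, c_s the speed of sound, and a_ij(ω) the smoothed attenuation along that path. Reflections, echoes and loudspeaker response deviations are deliberately left out of this form.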
The filter network C constitutes an acoustic crossfeed canceller and may be represented by a cancellation matrix C of filter responses or transfer functions:
\[ C \equiv \begin{bmatrix} c_{11} & c_{12} & \cdots & c_{1n} \\ c_{21} & c_{22} & \cdots & c_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ c_{m1} & c_{m2} & \cdots & c_{mn} \end{bmatrix}, \]
where the matrix elements c11 . . . cmn are individual filter responses. If the matrix elements c11 . . . cmn are frequency domain transfer functions expressed, for example, as fast Fourier transforms (FFTs), standard matrix operations (addition, multiplication, etc.) may be accomplished with the matrix.
Because it restores the M source channels to their original directions, the acoustic-crossfeed canceller has the ability to create phantom or virtual images: sounds apparently come from the M source directions rather than from the N loudspeaker positions, which N positions may be located differently than the M sources with respect to the listening positions P.
An acoustic crossfeed canceller functions in the nature of a "spatial inverse" filter in a sound reproduction system to cancel a listening room's acoustics and substitute instead the acoustics of the original recording. So that the listener hears the original M channels at the P listening positions as is desired, let
CR=I,
where I is the identity matrix, or
C=R⁻¹.
Thus, the matrix C may be determined by establishing the room matrix R and taking its inverse. Because the room matrix R is simplified, in accordance with the present invention, the resulting canceller matrix C will also be simplified, resulting in simpler software realizations of the audio crosstalk-cancelling network C, which realizations minimize the processing resource requirements when run on a personal computer.
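The relationship between the room matrix and the canceller matrix can be checked numerically at a single frequency. The following is a minimal sketch, assuming a symmetric two-loudspeaker, two-ear arrangement in which each room-matrix element is only a delay and a gain; the function names and the numeric values (1 kHz, 2.0 ms and 2.2 ms path delays, far-ear gain 0.7) are hypothetical choices made for illustration and are not taken from the original text.

```python
import cmath
import math


def room_matrix(freq_hz, near_delay_s, far_delay_s, far_gain):
    """Simplified symmetric 2x2 room matrix at one frequency.

    Each element is a pure delay times a gain; the near-ear gain is
    normalized to 1 (an assumption of this sketch).
    """
    w = 2.0 * math.pi * freq_hz
    near = cmath.exp(-1j * w * near_delay_s)
    far = far_gain * cmath.exp(-1j * w * far_delay_s)
    return [[near, far], [far, near]]


def invert_2x2(m):
    """Inverse of a 2x2 complex matrix via the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("room matrix is singular at this frequency")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul_2x2(x, y):
    """Product of two 2x2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


# Illustrative values only.
R = room_matrix(1000.0, 0.0020, 0.0022, 0.7)
C = invert_2x2(R)        # canceller matrix at this frequency
CR = matmul_2x2(C, R)    # should be (numerically) the identity matrix
```

Repeating the inversion over a grid of frequencies would give the amplitude and phase responses that the canceller's filters need to approximate.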
If the elements of the R matrix are frequency-domain transfer functions, its inverse may be calculated in order to derive the cancellation matrix C. One or more software realizable M×N port audio crosstalk-cancelling networks may then be derived from the cancellation matrix C. In the resulting M×N port network, each output N is, depending on the realization, either (1) the linear combination of separately-filtered versions of the M inputs, (2) the linear combination of separately-filtered versions of the M inputs and separately-filtered feedback signals from the N outputs, or (3) separately-filtered feedback signals from the N outputs added to the M inputs.
One way of realizing the network is to transform the elements of the matrix C to time domain representations, from which FIR filter realizations are readily obtained, as is well known. Although an IIR filter realization is preferred in order to minimize processing resources, obtaining an IIR filter from an FIR filter is not a simple process. Thus, instead of transforming the matrix C elements to the time domain, it is preferred to leave them in the frequency domain from which their filter amplitude and phase responses are readily obtained. In turn, simple IIR or FIR/IIR filter realizations, including their filter coefficients, requiring low processing power, may be realized which implement the desired amplitude and phase responses. Although such IIR or FIR/IIR filters may be derived by trial and error techniques, in practice, a better way to realize such IIR or FIR/IIR filters is to employ one of the many off-the-shelf digital-filter-design computer programs.
If the room matrix R is not a square matrix, the canceller inverse matrix C is a "pseudo matrix inverse" but is still the optimal way to map M source channels onto N presentation channels for presentation at P listener positions. For the underconstrained case (i.e., P is less than N), the pseudo inverse minimizes the RMS error between actual and desired solutions. For the overconstrained case (i.e., P is greater than N), the pseudo inverse minimizes the RMS energy of the input(s) needed to achieve exact solution.
As will be understood from the above discussion, the principles of the present invention are applicable generally to arbitrary numbers of source channels, loudspeakers and listening positions. However, for simplicity, the preferred embodiments described below relate to the specific case in which there are two loudspeakers (such as in a typical computer multimedia arrangement, the speakers narrowly and symmetrically spaced in front of the listener, as on either side of a multimedia computer monitor or TV set), two source channels (such as, but not limited to, left surround and right surround), and two listening positions (a listener's ears) such that N=M=P=2. Thus, the acoustic transfer room matrix R is a 2×2 matrix and the canceller's response, C, is represented by the 2×2 matrix that is the inverse of the R matrix such that the left source channel L is perceived only at the left ear (one of the two listener positions P) while the right source channel R is perceived only at the right ear (the other of the two listener positions P).
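For the symmetric 2×2 case, the inverse can be written out in closed form. The following worked derivation is a sketch under the assumption that both loudspeakers have the same near-ear response h_n and the same far-ear response h_f; the symbols h_n, h_f and x are notation introduced here for illustration.

```latex
R = \begin{bmatrix} h_n & h_f \\ h_f & h_n \end{bmatrix},
\qquad
C = R^{-1}
  = \frac{1}{h_n^{2} - h_f^{2}}
    \begin{bmatrix} h_n & -h_f \\ -h_f & h_n \end{bmatrix}
  = \frac{1}{h_n}\cdot\frac{1}{1 - x^{2}}
    \begin{bmatrix} 1 & -x \\ -x & 1 \end{bmatrix},
\qquad
x = \frac{h_f}{h_n}
```

The ratio x carries only the extra delay and extra attenuation of the far-ear path relative to the near-ear path, so under these assumptions the canceller depends on the difference between the two paths rather than on either path alone.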
Signals applied via such an acoustic crosstalk canceller to a pair of loudspeakers adjacent to a computer monitor result in the perception that the sound is coming from the sides of the listener rather than where the speakers are located: forward direction cues are lost and the sound seems to come from the side only, where the surround speakers should be. Thus, by applying left and right channel information directly to the loudspeakers and summing that information with spatialized surround information (i.e., surround information processed by the crosstalk canceller), only two loudspeakers, located adjacent to the computer monitor, are required to render the perception of left, right and surround sound fields.
In one of its aspects, the present invention is directed to a method of deriving a cancellation matrix C of dimension M×N in which each of the matrix elements is a frequency-domain transfer function, the matrix C representing an M×N port audio crosstalk-cancelling network for mapping M audio source channels, each having an associated source direction, to N audio presentation channels, each having a position relative to the source directions, such that each output N is either (1) the linear combination of separately-filtered versions of the M inputs, (2) the linear combination of separately-filtered versions of the M inputs and separately-filtered feedback signals from the N outputs, or (3) separately-filtered feedback signals from the N outputs added to the M inputs. The method comprises establishing a room matrix R of dimension N×P in which each of the matrix elements is a frequency-domain transfer function, the matrix R representing an N×P port network for mapping N presentation channel positions to P listening positions, wherein the frequency-domain transfer functions represent the time delay and a smoothed version of the frequency dependent attenuation along a direct acoustic path from each one of said presentation channel positions to each one of said listening positions, and setting the crosstalk-cancelling matrix C equal to the inverse of the room matrix R. The smoothed version of the frequency dependent attenuation may be, for example, a smoothed average of said acoustic path attenuation throughout at least a substantial portion of the audio sound spectrum intended to be reproduced by the presentation channels.
In another of its aspects, the invention is directed to an M×N port audio crosstalk-cancelling network for mapping M audio source channels, each having an associated source direction, to N audio presentation channels, each having a position relative to the source directions, such that each output N is either (1) the linear combination of separately-filtered versions of the M inputs, (2) the linear combination of separately-filtered versions of the M inputs and separately-filtered feedback signals from the N outputs, or (3) separately-filtered feedback signals from the N outputs added to the M inputs. The crosstalk-cancelling network is produced by the steps of establishing a room matrix R of dimension N×P in which each of the matrix elements is a frequency-domain transfer function, the matrix R representing an N×P port network for mapping N presentation channel positions to P listening positions, wherein the frequency-domain transfer functions represent the time delay and a smoothed version of the frequency dependent attenuation along a direct acoustic path from each one of the presentation channel positions to each one of the listening positions, deriving the inverse of the room matrix R to produce a crosstalk-cancelling matrix C of dimension M×N in which each of the matrix elements is a frequency-domain transfer function, the matrix C representing the M×N port audio crosstalk-cancelling network, and implementing the smoothed version of the frequency dependent attenuation by one or more simple digital filters requiring low processing power. The digital filters preferably are of the IIR type or IIR/FIR type and preferably are first-order filters. The smoothed version of the frequency dependent attenuation may be, for example, a smoothed average of said acoustic path attenuation throughout at least a substantial portion of the audio sound spectrum intended to be reproduced by the presentation channels.
The time delay may be realized by a digital ring buffer.
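A ring buffer delays a signal by a fixed number of samples using one read and one write per sample, which suits the low processing budget described here. The following is a minimal sketch; the class name `RingDelay` and the three-sample delay are hypothetical, chosen only for illustration.

```python
class RingDelay:
    """Fixed delay of `delay_samples` samples using a circular buffer."""

    def __init__(self, delay_samples):
        if delay_samples < 1:
            raise ValueError("delay_samples must be at least 1")
        self.buf = [0.0] * delay_samples
        self.pos = 0

    def process(self, x):
        y = self.buf[self.pos]   # oldest sample, written delay_samples calls ago
        self.buf[self.pos] = x   # overwrite it with the newest sample
        self.pos = (self.pos + 1) % len(self.buf)
        return y


# Illustrative use: a three-sample delay applied to a short ramp.
delay = RingDelay(3)
out = [delay.process(x) for x in [1.0, 2.0, 3.0, 4.0, 5.0]]
# out == [0.0, 0.0, 0.0, 1.0, 2.0]
```

In practice the delay length would be the interaural path difference expressed in samples at the system's sample rate.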
According to a further aspect of the present invention, the M×N port audio crosstalk-cancelling network may include an amplitude compressor, the compressor comprising fixed amplitude level attenuators in each of the network's inputs, and variable amplitude level boosters in each of the network's outputs, the boosters each including a scaler for scaling the boost between a level which restores the input attenuation and an attenuated level which avoids clipping in the output signal. In a preferred embodiment, control for the compressor is obtained from the compressor input, and the compressor has an infinite compression ratio, thereby constituting a limiter. In the preferred embodiment, the compressor further includes a delay in each of the network's outputs, and the control for the compressor looks ahead in order to syllabically control the compressor's gain. The fixed amplitude level attenuators and variable amplitude level boosters may have frequency-independent characteristics. Alternatively, the fixed amplitude level attenuators and variable amplitude level boosters have frequency dependent characteristics. When the crosstalk processor is noisy at low signal levels, as it may be when an inexpensive processor is employed, such as DSP chips supporting only 16-bit word lengths, the frequency dependent characteristics of said fixed amplitude level attenuators and variable amplitude level boosters operate only at mid to low frequencies, thus keeping the loss in signal-to-noise ratio low and limiting the loss to frequencies where it is less audible.
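The attenuate-then-boost arrangement can be sketched for a single channel as follows. This is a simplified illustration, not the described embodiment: it assumes the network between the attenuator and the booster is an identity (so the control signal is simply the attenuated input), it uses a frequency-independent attenuator and booster, and it omits the syllabic smoothing of the gain. The function name and the parameters `atten`, `lookahead` and `ceiling` are hypothetical.

```python
def lookahead_limiter(samples, atten=0.5, lookahead=4, ceiling=1.0):
    """Fixed input attenuation followed by a variable, look-ahead output boost.

    The boost never exceeds 1/atten (which restores the original level) and
    is reduced whenever restoring the level would exceed `ceiling`.
    """
    scaled = [atten * s for s in samples]   # fixed input attenuation
    max_boost = 1.0 / atten                 # boost that restores the input level
    out = []
    for n in range(len(scaled)):
        # Output path is delayed by `lookahead` samples.
        delayed = scaled[n - lookahead] if n >= lookahead else 0.0
        # Control path sees the delayed sample and everything ahead of it.
        window = scaled[max(0, n - lookahead): n + 1]
        peak = max(abs(v) for v in window)
        if peak == 0.0:
            boost = max_boost
        else:
            boost = min(max_boost, ceiling / peak)   # infinite-ratio limiting
        out.append(boost * delayed)
    return out


# Illustrative signals only.
quiet = lookahead_limiter([0.1] * 10)        # low level: fully restored
loud = lookahead_limiter([3.0, -3.0] * 5)    # high level: held at the ceiling
```

Because the control window always contains the sample being output, the boosted output cannot exceed the ceiling, while low-level signals pass at their original level after the look-ahead delay.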
In another aspect of the invention, the audio crosstalk-cancelling network is a 2×2 port network for mapping two audio source channel inputs M to two audio presentation channel outputs N applied to a pair of transducers having positions relative to the directions of the audio source channels M, the listener having two listening positions P, the listener's left ear and the listener's right ear, relative to the transducers, the network further comprising (1) two signal combiners, a first signal combiner and a second signal combiner, each signal combiner having at least two inputs and an output, wherein (a) one of the M inputs is coupled to an input of the first signal combiner and another of the M inputs is coupled to an input of the second signal combiner, and (b) one of the N outputs is coupled to the output of the first signal combiner and another of the N outputs is coupled to the output of the second signal combiner, and (2) two signal feedback paths, a first signal feedback path and a second signal feedback path, each feedback path having a time delay and frequency dependent characteristic, and each feedback path having an input and an output, wherein (a) the input of the first signal feedback path is coupled to the output of the first signal combiner and the output of the first signal feedback path is coupled to the other input of the second signal combiner, (b) the input of the second signal feedback path is coupled to the output of the second signal combiner and the output of the second signal feedback path is coupled to the other input of the first signal combiner, (c) each of the feedback paths has a time delay representing the additional time for sound to propagate along the acoustic path between a transducer and the listener's ear farthest from the transducer with respect to the time for sound to propagate along the acoustic path between the same transducer and the listener's ear closest to the same transducer, and (d) each of the feedback paths has a frequency dependent characteristic representing the difference in the attenuation in the acoustic path between a transducer and the listener's ear farthest from the transducer and the attenuation in the acoustic path between the same transducer and the listener's ear closest to the same transducer, and (3) the signal combiners, signal feedback paths, and couplings therebetween having polarity characteristics such that signals processed by a feedback path are subtractively combined with signals coupled to the other input of the respective signal combiner. The two presentation channels may be applied to a pair of transducers, arranged generally in front of and at substantially right-and-left symmetrical positions with respect to a listener. The frequency dependent characteristic may be realized as a first-order low-pass shelving characteristic, which may be implemented by an IIR filter or a combination FIR/IIR filter. The attenuation in the acoustic path between a transducer and the listener's ear farthest from the transducer is determined by taking the difference between the head related transfer response from a transducer to the listener's ear farthest from the transducer and the head related transfer response from the other transducer to the listener's ear closest to the other transducer and smoothing the difference.
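The cross-coupled feedback topology described above can be sketched in a few lines. The following is an illustrative sketch only: the class names, the first-order filter form H(z) = (b0 + b1·z⁻¹)/(1 + a1·z⁻¹) and all coefficient values are assumptions introduced here, and the demonstration sets the filter to a pure gain of 0.5 so that the successive cancellation terms are easy to see. A real shelving characteristic would use non-zero b1 and a1 obtained from a filter-design procedure.

```python
class FirstOrderFilter:
    """One-pole, one-zero filter: y[n] = b0*x[n] + b1*x[n-1] - a1*y[n-1]."""

    def __init__(self, b0, b1, a1):
        self.b0, self.b1, self.a1 = b0, b1, a1
        self.x1 = 0.0
        self.y1 = 0.0

    def process(self, x):
        y = self.b0 * x + self.b1 * self.x1 - self.a1 * self.y1
        self.x1, self.y1 = x, y
        return y


class CrossfeedCanceller:
    """2x2 network: each output is its input minus the delayed, filtered
    version of the opposite output."""

    def __init__(self, delay_samples, b0, b1, a1):
        if delay_samples < 1:
            raise ValueError("delay_samples must be at least 1")
        self.hist_l = [0.0] * delay_samples   # ring buffer of left outputs
        self.hist_r = [0.0] * delay_samples   # ring buffer of right outputs
        self.pos = 0
        self.path_l_to_r = FirstOrderFilter(b0, b1, a1)
        self.path_r_to_l = FirstOrderFilter(b0, b1, a1)

    def process(self, in_l, in_r):
        # Read the outputs produced delay_samples calls ago.
        fb_to_r = self.path_l_to_r.process(self.hist_l[self.pos])
        fb_to_l = self.path_r_to_l.process(self.hist_r[self.pos])
        # Subtractive combination at each signal combiner.
        out_l = in_l - fb_to_l
        out_r = in_r - fb_to_r
        self.hist_l[self.pos] = out_l
        self.hist_r[self.pos] = out_r
        self.pos = (self.pos + 1) % len(self.hist_l)
        return out_l, out_r


# Demonstration with a pure-gain feedback path (b1 = a1 = 0) and a
# three-sample delay: a unit impulse on the left input only.
canceller = CrossfeedCanceller(delay_samples=3, b0=0.5, b1=0.0, a1=0.0)
inputs = [(1.0, 0.0)] + [(0.0, 0.0)] * 9
outputs = [canceller.process(l, r) for l, r in inputs]
```

With these settings the left output carries 1.0 at sample 0 and 0.25 at sample 6, and the right output carries -0.5 at sample 3 and -0.125 at sample 9: each cancellation signal is itself cancelled by a smaller signal from the opposite side. Because each feedback path includes at least one sample of delay, the loop has no delay-free path and can be computed sample by sample.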
Various aspects of the invention may be used independently or in combination with each other.