1. Field of the Invention
The invention relates to methods and systems for generating a matrix-encoded two-channel audio signal, in response to a horizontal B-format signal, or in response to the output signals of a microphone array.
2. Background of the Invention
Throughout this disclosure, including in the claims, the term “render” denotes the process of converting an audio signal (e.g., a multi-channel audio signal) into one or more speaker feeds (where each speaker feed is an audio signal to be applied directly to a loudspeaker or to an amplifier and loudspeaker in series), or the process of converting an audio signal into one or more speaker feeds and converting the speaker feed(s) to sound using one or more loudspeakers. In the latter case, the rendering is sometimes referred to herein as rendering “by” the loudspeaker(s).
Throughout this disclosure, including in the claims, the terms “speaker” and “loudspeaker” are used synonymously to denote any sound-emitting transducer. This definition includes loudspeakers implemented as multiple transducers (e.g., woofer and tweeter).
Throughout this disclosure, including in the claims, the expression performing an operation “on” signals or data (e.g., filtering, scaling, or transforming the signals or data) is used in a broad sense to denote performing the operation directly on the signals or data, or on processed versions of the signals or data (e.g., on versions of the signals that have undergone preliminary filtering prior to performance of the operation thereon).
Throughout this disclosure, including in the claims, the expression “system” is used in a broad sense to denote a device, system, or subsystem. For example, a subsystem that implements an encoder may be referred to as an encoder system (or an encoder), and a system including such a subsystem (e.g., a system that generates X output signals in response to multiple inputs, in which the subsystem generates M of the inputs and the other X-M inputs are received from an external source) may also be referred to as an encoder system (or an encoder).
Throughout this disclosure, including in the claims, the verb “includes” is used in a broad sense to denote “is or includes,” and other forms of the verb “include” are used in the same broad sense. For example, the expression “a filter which includes a feedback filter” (or the expression “a filter including a feedback filter”) herein denotes either a filter which is a feedback filter (i.e., does not include a feedforward filter), or a filter which includes a feedback filter (and at least one other filter).
A matrix-encoded two-channel audio signal can be rendered (typically, including by performing a decoding operation thereon) by a speaker array to produce a multi-channel sound field. For example, one type of matrix-encoded two-channel audio signal can be decoded to determine N (where N is greater than two) audio channels for rendering by a speaker array (e.g., an array of N speakers).
Matrix encoding is a method for mixing one or more (e.g., two, three, four, or five) source audio signals into a pair of encoded audio signals, such that each source signal is mixed into the encoded signals according to directional encoding rules. The directional encoding rules operate on the assumption that there is a source azimuth angle θ associated with each source audio signal, where θ is defined as in FIG. 1. Specifically, the source shown in FIG. 1 is the source of an audio signal having the time-varying audio waveform “SourceSig” which is received by a microphone array (e.g., a single microphone) or listener at the origin of the indicated X-Y coordinate system. In FIG. 1, positive values along the X-axis correspond to positions in front of the listener (or microphone array), and azimuth θ is measured anticlockwise from the X-axis.
The directional rules that must be satisfied to generate a matrix-encoded two-channel audio signal can be expressed in terms of a simple set of instructions as follows:
1. The matrix-encoded audio signals are referred to as left channel signal Lt and right channel signal Rt (a matrix-encoded pair of audio signals). To generate a matrix-encoded audio signal indicative of a source audio signal having the time-varying audio waveform, SourceSig, and source azimuth, θ, the source audio signal should be mixed into the Lt and Rt signals with a pair of encoder gains (GLt and GRt, which are functions of θ), such that:
$$Lt = G_{Lt}(\theta) \times \text{SourceSig}, \tag{1}$$
$$Rt = G_{Rt}(\theta) \times \text{SourceSig}, \quad \text{and} \tag{2}$$
$$|G_{Lt}|^2 + |G_{Rt}|^2 = 1. \tag{3}$$
Equation (3) is sometimes referred to as the constant power rule. Note that, in keeping with common nomenclature, the gains (GLt and GRt) may be complex valued, where the argument of the complex gain corresponds to a phase-shift in the mixing operation;
2. Any source audio signal that has a source azimuth of 0° (θ=0), corresponding to the centre-front channel of a multi-channel audio stream, for example, should be encoded into the Lt and Rt signals with encoder gains satisfying GLt=GRt;
3. Any source audio signal that has a source azimuth of 90° (θ=π/2), corresponding to the left channel of a multi-channel audio stream, for example, should be encoded into the Lt and Rt signals with encoder gains satisfying |GLt|=1 and GRt=0;
4. Any source audio signal that has a source azimuth of −90° (θ=−π/2), corresponding to the right channel of a multi-channel audio stream, for example, should be encoded into the Lt and Rt signals with encoder gains satisfying GLt=0 and |GRt|=1; and
5. Any source audio signal that has a source azimuth of 180° (θ=π), corresponding to the centre-rear channel of a multi-channel audio stream, for example, should be encoded into the Lt and Rt signals with encoder gains satisfying GLt=−GRt.
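It can be shown that the above rules can be satisfied by using gain values (each a function of source azimuth θ) defined as follows:
$$G_{Lt} = e^{j\Phi(\theta)} \times \cos\left(\frac{\theta}{2} - \frac{\pi}{4}\right), \quad \text{and} \tag{4}$$
$$G_{Rt} = e^{j\Phi(\theta)} \times \cos\left(\frac{\theta}{2} + \frac{\pi}{4}\right), \tag{5}$$
where Φ(θ) is an arbitrary real-valued function defined over the interval −π < θ ≤ π.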
The function Φ(θ) effectively applies an azimuth-dependent phase shift to the Lt and Rt signals equally. Note that a Matrix Decoder operates by examining the relative amplitude and phase of the Lt and Rt signals, but has no way of detecting a bulk phase shift that has been applied equally to both Lt and Rt. Hence, the general case for matrix-encoded signals includes this Φ(θ) term.
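By way of illustration only, the gain functions of equations (4) and (5) can be sketched in Python as follows. The function names `encoder_gains` and `matrix_encode`, and the parameter `phi` (a stand-in for the arbitrary function Φ(θ), defaulting here to zero phase shift), are assumptions introduced for this sketch and are not part of the disclosure. The sketch evaluates the gains at the four cardinal azimuths and reports the summed power of equation (3).

```python
import cmath
import math


def encoder_gains(theta, phi=lambda t: 0.0):
    """Return (G_Lt, G_Rt) for source azimuth theta (radians), per equations (4) and (5).

    phi is a hypothetical stand-in for the arbitrary real-valued function Phi(theta).
    """
    common_phase = cmath.exp(1j * phi(theta))  # bulk phase applied equally to both gains
    g_lt = common_phase * math.cos(theta / 2 - math.pi / 4)
    g_rt = common_phase * math.cos(theta / 2 + math.pi / 4)
    return g_lt, g_rt


def matrix_encode(source_sig, theta, phi=lambda t: 0.0):
    """Mix one source (a sequence of samples) into an (Lt, Rt) pair, per equations (1) and (2)."""
    g_lt, g_rt = encoder_gains(theta, phi)
    lt = [g_lt * s for s in source_sig]
    rt = [g_rt * s for s in source_sig]
    return lt, rt


if __name__ == "__main__":
    # Front (0), left (90), right (-90), and rear (180) azimuths, in degrees.
    for deg in (0, 90, -90, 180):
        g_lt, g_rt = encoder_gains(math.radians(deg))
        power = abs(g_lt) ** 2 + abs(g_rt) ** 2  # constant power rule, equation (3)
        print(f"{deg:5d} deg  G_Lt={g_lt:.4f}  G_Rt={g_rt:.4f}  power={power:.4f}")
```

Because the factor e^{jΦ(θ)} multiplies both gains equally, substituting any other real-valued `phi` in this sketch leaves the gain magnitudes and the relative phase between Lt and Rt unchanged.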
Another audio signal format is the horizontal B-format. Similar to the way that matrix-encoded signals may be defined in terms of azimuth-dependent gain functions GLt(θ) and GRt(θ) (and a source signal waveform, SourceSig), a horizontal B-format signal (indicative of a source audio signal having waveform, SourceSig, and azimuth θ) is defined herein as being composed of three audio signals, W, X and Y, as follows:
$$W = \text{SourceSig}, \tag{6}$$
$$X = \cos\theta \times \text{SourceSig}, \tag{7}$$
$$Y = \sin\theta \times \text{SourceSig}. \tag{8}$$
Some authors define the W signal with a reduced amplitude, as
$$W = \frac{1}{\sqrt{2}} \times \text{SourceSig},$$
but that definition is not used herein. It will be apparent to those of ordinary skill that the present invention applies to B-format signals with alternative scaling of their audio signal components, without loss of generality.
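The definition of equations (6), (7), and (8) can be sketched in Python as follows. The function name `encode_horizontal_b_format` and the optional parameter `w_gain` are assumptions made for this illustration; `w_gain` is included only to show where the alternative scaling of W used by some authors would enter, and it defaults to the unity scaling used herein.

```python
import math


def encode_horizontal_b_format(source_sig, theta, w_gain=1.0):
    """Return (W, X, Y) for one source at azimuth theta (radians), per equations (6)-(8).

    w_gain is a hypothetical parameter: 1.0 matches equation (6); a value of
    1 / sqrt(2) would give the reduced-amplitude W convention of some authors.
    """
    w = [w_gain * s for s in source_sig]
    x = [math.cos(theta) * s for s in source_sig]
    y = [math.sin(theta) * s for s in source_sig]
    return w, x, y


if __name__ == "__main__":
    # A source directly to the left (theta = 90 degrees) appears in W and Y,
    # with X vanishing up to floating-point rounding.
    w, x, y = encode_horizontal_b_format([1.0, -0.5, 0.25], math.pi / 2)
    print("W:", w)
    print("X:", x)
    print("Y:", y)
```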
A variety of methods are known for recording an acoustic performance (or other acoustic event) in the form of a B-format signal.
Gerzon proposed (in M. A. Gerzon, “Ambisonics in Multichannel Broadcasting and Video,” Preprint 2034 of the 74th Audio Engineering Society Convention, New York, October 1983) a method for mixing the W, X, and Y channels of a horizontal B-format signal into two channels (i.e., a UHJ format stereo signal; not a matrix-encoded stereo signal) to enable more convenient handling in a transmission and playback environment. The UHJ format stereo signal comprised two signals (Σ and Δ) which could be converted to UHJ format L and R stereo channels as follows:
$$\Sigma = 0.9397 \times W + 0.2624 \times X$$
$$\Delta = j \times (-0.3420 \times W + 0.7211 \times X) + 0.9269 \times Y$$
$$L = \frac{\Sigma + \Delta}{2}$$
$$R = \frac{\Sigma - \Delta}{2}$$
Note that the above UHJ encoding equations for Σ, Δ, L, and R are based on the assumption that the W, X, and Y signals are scaled according to equations (6), (7), and (8) above, and not with application of a 1/√2 scaling factor to W.
The UHJ encoding equations set forth above may be written in matrix form as:
$$
\begin{bmatrix} L \\ R \end{bmatrix}
=
\begin{bmatrix}
0.4698 - 0.1710j & 0.1312 + 0.3605j & 0.4634 \\
0.4698 + 0.1710j & 0.1312 - 0.3605j & -0.4634
\end{bmatrix}
\times
\begin{bmatrix} W \\ X \\ Y \end{bmatrix}
\tag{9}
$$
Gerzon's method for mixing the three channels of a horizontal B-format signal into a stereo pair is intended to provide a reasonable stereo listening experience, as well as to provide some ability to regenerate an approximate version of the original W, X, and Y signals from the UHJ format L and R stereo signals. However, the stereo UHJ format has significant disadvantages:
UHJ encoding (per equation (9)) does not encode an original source signal (with azimuth θ) with power independent of θ. Rather, the power of the UHJ format L and R signal pair (or the corresponding Σ and Δ signal pair) depends on the azimuth θ of the source signal. Sounds from the front will be encoded (by equation (9)) with greater amplitude than sounds from the rear. Indeed, it was the design intention of UHJ encoding to give greater prominence to frontal signals; and
an original source signal with azimuth equal to zero (i.e., a front-center source signal) is encoded into the UHJ format L and R channels with a phase shift between the channels (i.e., the UHJ format L and R channels generated in response to a front-center source each have form kW+j(mW), where k and m are nonzero coefficients). This means that a clear phantom-center image will not be formed by the stereo UHJ signal.
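The two properties noted above can be examined numerically with a short Python sketch, offered only as an illustration. It applies the Σ and Δ equations given above to a unit-amplitude source scaled per equations (6), (7), and (8). The function names (`uhj_sigma_delta`, `uhj_left_right`, `uhj_power`, `interchannel_phase`) are assumptions introduced for this sketch.

```python
import cmath
import math


def uhj_sigma_delta(theta):
    """Sigma and Delta gains for a unit-amplitude source at azimuth theta (radians)."""
    w, x, y = 1.0, math.cos(theta), math.sin(theta)  # equations (6)-(8) with SourceSig = 1
    sigma = 0.9397 * w + 0.2624 * x
    delta = 1j * (-0.3420 * w + 0.7211 * x) + 0.9269 * y
    return sigma, delta


def uhj_left_right(theta):
    """UHJ format L and R gains, formed as (Sigma + Delta) / 2 and (Sigma - Delta) / 2."""
    sigma, delta = uhj_sigma_delta(theta)
    return (sigma + delta) / 2, (sigma - delta) / 2


def uhj_power(theta):
    """Summed power of the L and R gains; a constant-power encoder would return a fixed value."""
    left, right = uhj_left_right(theta)
    return abs(left) ** 2 + abs(right) ** 2


def interchannel_phase(theta):
    """Phase of L relative to R, in radians."""
    left, right = uhj_left_right(theta)
    return cmath.phase(left) - cmath.phase(right)


if __name__ == "__main__":
    for deg in (0, 90, 180):
        theta = math.radians(deg)
        sigma, _ = uhj_sigma_delta(theta)
        print(f"{deg:4d} deg  |Sigma|={abs(sigma):.4f}  power(L,R)={uhj_power(theta):.4f}")
    print(f"front-center L/R phase difference: {math.degrees(interchannel_phase(0.0)):.1f} deg")
```

In this sketch the summed power of L and R differs between azimuths, the magnitude of Σ is larger for a front source than for a rear source, and the L and R gains for a front-center source differ in phase, whereas the matrix-encoding gains of equations (4) and (5) are equal at θ=0.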
Typical embodiments of the present invention generate a matrix-encoded two-channel (stereo) signal in response to a horizontal B-format signal (or in response to the output signals of a microphone array). These matrix-encoded stereo signals are useful for many purposes. For example, matrix-encoded two-channel signals generated by typical embodiments of the invention are useful as input to decoders which implement Dolby ProLogic II decoding. Such decoders are in widespread use throughout the world.
Also, until the present invention, it had not been known how to use the outputs of microphone arrays (e.g., simple arrangements of simple microphones, such as, for example, cardioid microphones with 1st-order directivity patterns) to generate matrix-encoded signals via a simple linear mixing process. Matrix-encoded two-channel signals are generated by some embodiments of the invention by capturing an acoustic event with any of a variety of commonly available microphone arrangements (e.g., B-format microphones) and encoding the resulting microphone outputs into a matrix-encoded signal pair.