The present invention relates to the processing of information signals and, more particularly, to techniques for efficiently encoding audio signals, including signals representative of voice and music.
A significant amount of effort has been directed in recent years to so-called perceptual audio coding, or PAC. In accordance with this technique, each of a succession of time domain blocks of an audio signal is coded in the frequency domain. Specifically, the frequency domain representation of each block is divided into coder bands, each of which is individually coded, based on psycho-acoustic criteria, in such a way that the audio signal is significantly "compressed," meaning that the number of bits required to represent the audio signal is significantly less than would be the case if the audio signal were represented in a more simplistic digital format, such as in the form of PCM words.
When the audio signal comprises two or more input channels, such as the left and right channels of stereophonic (stereo) music, the above-described perceptual coding is carried out on a like number of so-called matrixed channels. In the most straightforward implementation, each matrixed channel is directly derived from a respective input channel. Thus in the stereo music case, for example, this would mean that the perceptual coding codes the frequency domain representation of the left stereo input channel over time, denoted herein as "L", and, separately, the frequency domain representation of the right stereo input channel over time, denoted herein as "R". However, further compression can be achieved when the input channels are highly correlated with one another--as, indeed, is almost always the case with stereo music channels--by switching the coding carried out for each coder band between two coding modes in which different sets of matrixed channels are used. In one of the modes, the set of two matrixed channels simply comprises the input channels L and R. In the other mode, the set of two matrixed channels comprises S=(L+R)/2 and D=(L-R)/2. The S and D channels are referred to as sum/difference channels. This technique is taught in U.S. patent application Ser. No. 07/844,804 entitled "Method and Apparatus for Coding Audio Signals Based on a Perceptual Model" filed Mar. 2, 1992, allowed Aug. 11, 1993, now U.S. Pat. No. 5,285,498 issued Feb. 8, 1994 to J. D. Johnston, hereinafter referred to as "the Johnston patent", and hereby incorporated by reference.
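The sum/difference matrixing described above can be illustrated with a minimal sketch. The function names and the sample coefficient values are hypothetical and chosen only for illustration; the criterion by which a coder actually switches between the two modes for each coder band is taught in the Johnston patent and is not reproduced here.

```python
def to_sum_difference(left, right):
    """Form S=(L+R)/2 and D=(L-R)/2 coefficient by coefficient."""
    s = [(l + r) / 2 for l, r in zip(left, right)]
    d = [(l - r) / 2 for l, r in zip(left, right)]
    return s, d


def from_sum_difference(s, d):
    """Invert the matrixing: L=S+D and R=S-D."""
    left = [a + b for a, b in zip(s, d)]
    right = [a - b for a, b in zip(s, d)]
    return left, right


def energy(coefficients):
    """Sum of squared coefficients, used here only as a rough size measure."""
    return sum(c * c for c in coefficients)


# Hypothetical frequency domain coefficients for one coder band of a
# highly correlated stereo pair (illustrative values, not from the source).
L = [4.0, -2.0, 1.0, 0.5]
R = [3.5, -2.5, 1.0, 0.0]

S, D = to_sum_difference(L, R)
L_rec, R_rec = from_sum_difference(S, D)

print("S =", S)
print("D =", D)
print("energy(R) =", energy(R), "energy(D) =", energy(D))
```

When L and R are highly correlated, the D channel carries little energy in this example, which suggests why coding S and D can take fewer bits than coding L and R for such a band. The inverse matrixing recovers L and R from S and D.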
More recently, the art has turned its attention to the perceptual coding of more-than-two-channel audio, such as five-channel audio. (As will be apparent to those skilled in the art as this description continues, the invention can, however, be implemented in a system having other than five channels.) The input channels of a five-channel audio system typically comprise three "front" channels and two "back" channels. The front channels include the conventional left and right stereo channels plus a center channel whose frequency domain representation over time is denoted herein as C. These channels are intended to be reproduced at speakers positioned in front of the listener--at the left, at the right and directly in front, respectively. The back channels are referred to as the "left surround" and "right surround" channels whose frequency domain representations over time are denoted herein as LS and RS. These channels are intended to be reproduced at speakers positioned behind the listener--at the left and at the right, respectively.