Technical Field
The present invention relates to low bitrate audio source coding systems. Different parametric representations of stereo properties of an input signal are introduced, and the application thereof at the decoder side is explained, ranging from pseudo-stereo to full stereo coding of spectral envelopes, the latter of which is especially suited for HFR based codecs.
Description of the Related Art
Audio source coding techniques can be divided into two classes: natural audio coding and speech coding. At medium to high bitrates, natural audio coding is commonly used for speech and music signals, and stereo transmission and reproduction is possible. In applications where only low bitrates are available, e.g. Internet streaming audio targeted at users with slow telephone modem connections, or in the emerging digital AM broadcasting systems, mono coding of the audio program material is unavoidable. However, a stereo impression is still desirable, in particular when listening with headphones, in which case a pure mono signal is perceived as originating from “within the head”, which can be an unpleasant experience.
One approach to address this problem is to synthesize a stereo signal at the decoder side from a received pure mono signal. Throughout the years, several different “pseudo-stereo” generators have been proposed. For example in [U.S. Pat. No. 5,883,962], enhancement of mono signals by means of adding delayed/phase shifted versions of a signal to the unprocessed signal, thereby creating a stereo illusion, is described. Hereby the processed signal is added to the original signal for each of the two outputs at equal levels but with opposite signs, ensuring that the enhancement signals cancel if the two channels are added later on in the signal path. In [PCT WO 98/57436] a similar system is shown, albeit without the above mono-compatibility of the enhanced signal. Prior art methods have in common that they are applied as pure post-processes. In other words, no information on the degree of stereo-width, let alone position in the stereo sound stage, is available to the decoder. Thus, the pseudo-stereo signal may or may not have a resemblance of the stereo character of the original signal. A particular situation where prior art systems fall short, is when the original signal is a pure mono signal, which often is the case for speech recordings. This mono signal is blindly converted to a synthetic stereo signal at the decoder, which in the speech case often causes annoying artifacts, and may reduce the clarity and speech intelligibility.
Other prior art systems, aiming at true stereo transmission at low bitrates, typically employ a sum and difference coding scheme. Thus, the original left (L) and right (R) signals are converted to a sum signal, S=(L+R)/2, and a difference signal, D=(L−R)/2, and subsequently encoded and transmitted. The receiver decodes the S and D signals, whereupon the original L/R-signal is recreated through the operations L=S+D, and R=S−D. The advantage of this, is that very often a redundancy between L and R is at hand, whereby the information in D to be encoded is less, requiring fewer bits, than in S. Clearly, the extreme case is a pure mono signal, i.e. L and R are identical. A traditional L/R-codec encodes this mono signal twice, whereas a S/D codec detects this redundancy, and the D signal does (ideally) not require any bits at all. Another extreme is represented by the situation where R=−L, corresponding to “out of phase” signals. Now, the S signal is zero, whereas the D signal computes to L. Again, the S/D-scheme has a clear advantage to standard L/R-coding. However, consider the situation where e.g. R=0 during a passage, which was not uncommon in the early days of stereo recordings. Both S and D equal L/2, and the S/D-scheme does not offer any advantage. On the contrary, L/R-coding handles this very well: The R signal does not require any bits. For this reason, prior art codecs employ adaptive switching between those two coding schemes, depending on what method that is most beneficial to use at a given moment. The above examples are merely theoretical (except for the dual mono case, which is common in speech only programs). Thus, real world stereo program material contains significant amounts of stereo information, and even if the above switching is implemented, the resulting bitrate is often still too high for many applications. Furthermore, as can be seen from the resynthesis relations above, very coarse quantization of the D signal in an attempt to further reduce the bitrate is not feasible, since the quantization errors translate to non-neglectable level errors in the L and R signals.