Consumer, industrial, studio and laboratory products for storing, processing and communicating high quality audio signals are in great demand. For example, so-called compact disc (“CD”) and digital audio tape (“DAT”) recordings for music have largely replaced the long-popular phonograph record and cassette tape. Likewise, recently available DAT recordings promise to provide greater flexibility and high storage density for high quality audio signals. See also Tan and Vermeulen, “Digital audio tape for data storage”, IEEE Spectrum, pp. 34-38 (October 1989). A demand is also arising for broadcast applications of digital technology that offer CD-like quality.
While these emerging digital techniques are capable of producing high quality signals, such performance is often achieved only at the expense of considerable data storage capacity or transmission bandwidth. Accordingly, much work has been done in an attempt to compress high quality audio signals for storage and transmission.
Most of the prior work directed to compressing signals for transmission and storage has sought to reduce the redundancies that the source of the signals places on the signal. Thus, such techniques as ADPCM, sub-band coding and transform coding described, e.g., in N. S. Jayant and P. Noll, “Digital Coding of Waveforms,” Prentice-Hall, Inc. 1984, have sought to eliminate redundancies that otherwise would exist in the source signals.
Other approaches seek to eliminate the irrelevant information in source signals using techniques based on models of the human perceptual system. Such techniques are described, e.g., in E. F. Schroeder and J. J. Platte, “‘MSC’: Stereo Audio Coding with CD-Quality and 256 kBIT/SEC,” IEEE Trans. on Consumer Electronics, Vol. CE-33, No. 4, November 1987; and Johnston, “Transform Coding of Audio Signals Using Noise Criteria,” IEEE J.S.C.A., Vol. 6, No. 2 (February 1988).
Perceptual coding, as described, e.g., in the Johnston paper, relates to a technique for lowering the required bit rate (or reapportioning available bits) or the total number of bits used in representing audio signals. In this form of coding, a masking threshold for unwanted signals is identified as a function of frequency of the desired signal. Then, inter alia, the coarseness of quantizing used to represent a signal component of the desired signal is selected such that the quantizing noise introduced by the coding does not rise above this masking threshold, though it may be quite near it. The introduced noise is therefore masked in the perception process. While traditional signal-to-noise ratios for such perceptually coded signals may be relatively low, the quality of these signals upon decoding, as perceived by a human listener, is nevertheless high.
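The selection of quantizer coarseness against a threshold can be illustrated with a minimal sketch. It assumes the standard uniform-quantizer noise model, in which a step size Δ introduces noise power of roughly Δ²/12; that model, the per-band threshold values, and all function names are illustrative assumptions and are not taken from the cited references, which derive their thresholds from psychoacoustic analysis.

```python
import math
import random


def step_for_threshold(mask_power: float) -> float:
    """Largest uniform-quantizer step whose modeled noise power
    (step**2 / 12) does not exceed mask_power."""
    return math.sqrt(12.0 * mask_power)


def quantize(x: float, step: float) -> float:
    """Round x to the nearest multiple of step."""
    return step * round(x / step)


def measured_noise_power(samples, step: float) -> float:
    """Mean squared quantization error over the samples."""
    return sum((x - quantize(x, step)) ** 2 for x in samples) / len(samples)


# Hypothetical per-band thresholds (noise power, arbitrary units).
thresholds = {"low band": 1e-6, "mid band": 1e-4, "high band": 1e-3}

rng = random.Random(0)
samples = [rng.uniform(-1.0, 1.0) for _ in range(20000)]

for band, mask in thresholds.items():
    step = step_for_threshold(mask)
    noise = measured_noise_power(samples, step)
    print(f"{band}: step={step:.5f} noise={noise:.2e} threshold={mask:.1e}")
```

Bands with a higher threshold tolerate a coarser step and therefore need fewer bits, while the measured noise stays close to the threshold in each band, as the paragraph above describes.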
Brandenburg et al., U.S. Pat. No. 5,040,217, issued Aug. 13, 1991, describes a system for efficiently coding and decoding high quality audio signals using such perceptual considerations. In particular, using a measure of the “noise-like” or “tone-like” quality of the input signals, the embodiments described in the latter system provide a very efficient coding for monophonic audio signals.
It is, of course, important that the coding techniques used to compress audio signals do not themselves introduce offensive components or artifacts. This is especially important when coding stereophonic audio information, where coded information corresponding to one stereo channel, when decoded for reproduction, can interfere or interact with coded information corresponding to the other stereo channel. Implementation choices for coding two stereo channels include so-called “dual-mono” coders using two independent coders operating at fixed bit rates. By contrast, “joint mono” coders use two monophonic coders but share one combined bit rate, i.e., the bit rate for the two coders is constrained to be less than or equal to a fixed rate, but trade-offs can be made between the bit rates for individual coders. “Joint stereo” coders are those that attempt to use interchannel properties of the stereo pair for realizing additional coding gain.
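The difference between the “dual-mono” and “joint mono” rate constraints can be sketched as follows. The equal split for dual-mono, the proportional fallback rule for joint mono, and the function and parameter names are hypothetical choices made for illustration; the text above fixes only the constraints (fixed per-coder rates versus a shared total).

```python
def dual_mono(total_rate: int, demand_left: int, demand_right: int):
    """Each coder runs at a fixed rate, regardless of what its
    channel currently demands (an equal split is assumed here)."""
    half = total_rate // 2
    return half, half


def joint_mono(total_rate: int, demand_left: int, demand_right: int):
    """The two coders share one budget: left + right <= total_rate.
    If both demands fit they are granted as-is; otherwise the budget
    is divided in proportion to demand (an assumed rule)."""
    if demand_left + demand_right <= total_rate:
        return demand_left, demand_right
    left = total_rate * demand_left // (demand_left + demand_right)
    return left, total_rate - left


# Hypothetical demands in kbit/s for a frame with a busy left channel.
print(dual_mono(256, 200, 40))
print(joint_mono(256, 200, 40))
```

With these inputs the dual-mono split leaves the left coder short of its demand while the right coder has rate to spare, whereas the joint mono split can meet both demands within the same total.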
It has been found that the independent coding of the two channels of a stereo pair, especially at low bit-rates, can lead to a number of undesirable psychoacoustic artifacts. Among them are those related to the localization of coding noise that does not match the localization of the dynamically imaged signal. Thus the human stereophonic perception process appears to add constraints to the encoding process if such mismatched localization is to be avoided. This finding is consistent with reports on binaural masking-level differences that appear to exist, at least for low frequencies, such that noise may be isolated spatially. Such binaural masking-level differences are considered to unmask a noise component that would be masked in a monophonic system. See, for example, B. C. J. Moore, “An Introduction to the Psychology of Hearing, Second Edition,” especially chapter 5, Academic Press, Orlando, Fla., 1982.
One technique for reducing psychoacoustic artifacts in the stereophonic context employs the ISO-WG11-MPEG-Audio Psychoacoustic II [ISO] Model. In this model, a second limit on signal-to-noise ratio (“SNR”) is applied to the signal-to-noise ratios inside the psychoacoustic model. However, such additional SNR constraints typically require the expenditure of additional channel capacity or (in storage applications) the use of additional storage capacity, at low frequencies, while also degrading the monophonic performance of the coding.