Consumer, industrial, studio and laboratory products for storing, processing and communicating high quality audio signals are in great demand. For example, so-called compact disc (CD) digital recordings for music have largely replaced the long-popular phonograph records. More recently, digital audio tape (DAT) devices promise further enhancements and convenience in high quality audio applications. See, for example, Tan and Vermeulen, "Digital audio tape for data storage," IEEE Spectrum, October 1989, pp. 34-38. Recent interest in high-definition television (HDTV) has also spurred consideration of how high quality audio for such systems can be efficiently provided.
While commercially available CD and DAT systems employ elaborate parity and error correction codes, no standard presently exists for efficiently coding source information for high quality audio signals with these devices. Tan and Vermeulen, supra, note that (unspecified) data compression, among other techniques, can be used to increase capacity and transfer rate for DAT devices by a factor of ten over time.
It has long been known that the human auditory response can be masked by audio-frequency noise or by audio-frequency sound signals other than the desired one. See B. Scharf, "Critical Bands," Chap. 5 in J. V. Tobias, Foundations of Modern Auditory Theory, Academic Press, New York, 1970. While "critical bands," as noted by Scharf, relate to many analytical and empirical phenomena and techniques, a central feature of critical band analysis is that certain human auditory responses remain relatively constant over a range of frequencies. Thus, for example, the loudness of a band of noise at a constant sound pressure remains constant as the bandwidth increases up to the critical bandwidth; beyond that point, loudness begins to increase. In the cited Tobias reference, at page 162, there is presented one possible table of 24 critical bands, each having an identified upper and lower cutoff frequency. Together, the bands cover the audio frequency spectrum up to 15.5 kHz. These effects have been used to advantage in designing coders for audio signals. See, for example, M. R. Schroeder et al., "Optimizing Digital Speech Coders By Exploiting Masking Properties of the Human Ear," Journal of the Acoustical Society of America, Vol. 66, pp. 1647-1652, December 1979.
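As an illustration of how a coder might assign a frequency to one of 24 critical bands, the following is a minimal sketch. The band edges used here are the commonly cited Zwicker values (24 bands ending at 15.5 kHz); they are an assumption for illustration and may not match the Tobias table exactly. The names `BAND_EDGES_HZ`, `critical_band` and `band_bandwidth` are hypothetical.

```python
from bisect import bisect_right

# Assumed critical-band edges in Hz (commonly cited Zwicker values, not
# taken from this text). 25 edges delimit 24 bands from 0 Hz to 15.5 kHz.
BAND_EDGES_HZ = [
    0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720,
    2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000, 15500,
]


def critical_band(freq_hz: float) -> int:
    """Return the 0-based index of the critical band containing freq_hz."""
    if not 0 <= freq_hz < BAND_EDGES_HZ[-1]:
        raise ValueError(f"{freq_hz} Hz is outside 0..{BAND_EDGES_HZ[-1]} Hz")
    # bisect_right counts the edges <= freq_hz; the band index is one less.
    return bisect_right(BAND_EDGES_HZ, freq_hz) - 1


def band_bandwidth(index: int) -> float:
    """Width in Hz of critical band `index` (upper edge minus lower edge)."""
    return BAND_EDGES_HZ[index + 1] - BAND_EDGES_HZ[index]


if __name__ == "__main__":
    for f in (50, 1000, 15000):
        b = critical_band(f)
        print(f"{f} Hz -> band {b}, width {band_bandwidth(b)} Hz")
```

The widening bands (100 Hz at the bottom, several kHz at the top) reflect why a coder can treat all components within one band as sharing a single masking behavior.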
E. F. Schroeder and H. J. Platte, "'MSC': Stereo Audio Coding with CD-Quality and 256 kBIT/SEC," IEEE Trans. on Consumer Electronics, Vol. CE-33, No. 4, November 1987, describes a perceptual encoding procedure with possible application to CDs.
In J. D. Johnston, "Transform Coding of Audio Signals Using Perceptual Noise Criteria," IEEE Journal on Selected Areas in Communications, February 1988, pp. 314-323 and .[.copending application Ser. No. 292,598, filed Dec. 30, 1988;.]. .Iadd.U.S. Pat. No. 5,535,300, issued Jul. 9, 1996 on application Ser. No. 284,324, filed Aug. 2, 1994, which is a continuation of Ser. No. 109,867, Aug. 20, 1993, U.S. Pat. No. 5,341,457, which is a continuation of Ser. No. 962,151, Oct. 16, 1992, abandoned, which is a continuation of Ser. No. 844,967, Feb. 28, 1992, abandoned, which is a continuation of Ser. No. 292,598, Dec. 30, 1988, abandoned, .Iaddend.by J. L. Hall II and J. D. Johnston, assigned to the assignee of the present invention, there are disclosed enhanced perceptual coding techniques for audio signals. Perceptual coding, as described in the Johnston paper, is a technique for lowering the required bitrate (or reapportioning available bits) in representing audio signals. In this form of coding, the masking threshold for unwanted signals is identified as a function of the frequency of the desired signal. The coarseness of the quantizing used to represent each component of the desired signal is then selected such that the quantizing noise introduced by the coding does not rise above the masking threshold, though it may be quite near this threshold. While traditional signal-to-noise ratios for such perceptually coded signals may be relatively low, the quality of these signals upon decoding, as perceived by a human listener, is nevertheless high. In particular, the systems described in this paper and in the copending application use a human auditory model to derive a short-term spectral masking function that is implemented in a transform coder. Bitrates are reduced by extracting redundancy based on signal frequency analysis and the masking function.
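The idea of choosing quantizer coarseness so that quantizing noise stays near the masking threshold can be sketched as follows. This is a simplified illustration, not the method of the cited references: it assumes a uniform quantizer and the textbook high-resolution model in which a step size Δ adds noise power of about Δ²/12 per coefficient. The function names and the per-band `noise_threshold` input (total allowed noise energy in the band) are hypothetical.

```python
import math


def step_for_threshold(noise_threshold: float, n_coeffs: int) -> float:
    """Largest uniform step whose *modelled* noise equals the threshold.

    Assumes noise power of delta**2 / 12 per coefficient, so the total
    modelled noise over n_coeffs coefficients is n_coeffs * delta**2 / 12.
    """
    per_coeff = noise_threshold / n_coeffs
    return math.sqrt(12.0 * per_coeff)


def quantize_band(coeffs, noise_threshold):
    """Quantize one band's coefficients with a threshold-derived step.

    Returns the step, the integer indices that would be entropy coded,
    and the actual noise energy of the reconstruction.
    """
    delta = step_for_threshold(noise_threshold, len(coeffs))
    indices = [round(c / delta) for c in coeffs]  # values sent to decoder
    recon = [q * delta for q in indices]          # decoder's reconstruction
    noise = sum((c - r) ** 2 for c, r in zip(coeffs, recon))
    return delta, indices, noise


if __name__ == "__main__":
    band = [10.0, -3.0, 0.5, 7.2]
    for threshold in (12.0, 48.0):
        print(threshold, quantize_band(band, threshold))
```

A higher masking threshold yields a larger step and smaller integer indices, which cost fewer bits. Because Δ²/12 is only an average, the actual noise in a given frame can exceed the modelled value; a practical coder would account for this.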
The techniques use a so-called "tonality" measure indicative of the shape of the spectrum over the critical bands of the signal to be coded to better control the effects of quantizing noise. As noted in the Johnston paper, supra, and the cited patent application Ser. No. 292,598, the masking effect of noise is dependent on the "tonelike or noiselike" nature of the signal. In particular, an offset for the masking threshold for each critical band is developed which depends on whether a "coefficient of tonality" for the signal in each critical band indicates that the signal is relatively more tonelike or noiselike. This coefficient of tonality is, in turn, conveniently derived from a measure of flatness of the spectrum of the signal over that critical band.
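A coefficient of tonality derived from spectral flatness might be computed as in the sketch below. The spectral flatness measure (SFM) is taken as the ratio of the geometric mean to the arithmetic mean of the band's power spectrum, expressed in dB. The mapping to a tonality coefficient α and the offset formula α(14.5 + i) + (1 − α)·5.5 dB, with i the critical-band index and a −60 dB SFM treated as fully tonal, follow the form usually cited from the Johnston paper; these constants are assumptions here, since they are not given in this text. All function names are hypothetical.

```python
import math

SFM_DB_MAX = -60.0  # assumed: an SFM at or below -60 dB counts as fully tonal


def spectral_flatness_db(powers, floor=1e-12):
    """10*log10(geometric mean / arithmetic mean) of positive powers.

    Always <= 0 dB (AM-GM inequality); 0 dB means a perfectly flat band.
    Powers are floored to avoid log(0).
    """
    p = [max(x, floor) for x in powers]
    geometric = math.exp(sum(math.log(x) for x in p) / len(p))
    arithmetic = sum(p) / len(p)
    return 10.0 * math.log10(geometric / arithmetic)


def tonality(powers):
    """Coefficient of tonality in [0, 1]: 0 is noiselike, 1 is tonelike."""
    alpha = spectral_flatness_db(powers) / SFM_DB_MAX
    return min(max(alpha, 0.0), 1.0)


def masking_offset_db(powers, band_index):
    """Assumed threshold offset (dB) for one critical band."""
    alpha = tonality(powers)
    return alpha * (14.5 + band_index) + (1.0 - alpha) * 5.5


if __name__ == "__main__":
    noiselike = [1.0] * 8                # flat spectrum
    tonelike = [1e6] + [1e-6] * 7        # one dominant line
    print(masking_offset_db(noiselike, 10), masking_offset_db(tonelike, 10))
```

A flat (noiselike) band gets the small offset, so its masking threshold sits closer to the signal energy; a peaky (tonelike) band gets a larger, band-dependent offset, reflecting that tones mask noise less effectively than noise does.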