1. Field of the Invention
The present invention relates to coding and decoding of audio signals to reduce transmission bandwidth without unacceptably degrading the quality of the reconstructed signal.
2. Description of Related Art
Many techniques exist in the field of audio compression for encoding a signal that can later be decoded without significant loss of quality. A common scheme is to sample a signal and use these samples to compute a discrete frequency transform. Various transforms exist, such as the Discrete Fourier Transform (DFT), the Odd-frequency Discrete Fourier Transform (ODFT), and the Modified Discrete Cosine Transform (MDCT).
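To make the transforms named above concrete, the following is a minimal, direct (O(N²)) Python sketch of the DFT, the ODFT and the MDCT, written from their textbook definitions. The function names are hypothetical and no analysis window is applied. A practical codec would use a windowed, FFT-based implementation instead.

```python
import cmath
import math


def dft(x):
    """Discrete Fourier Transform: X[k] = sum_n x[n] * exp(-j*2*pi*n*k/N)."""
    n_len = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * n * k / n_len) for n in range(n_len))
            for k in range(n_len)]


def odft(x):
    """Odd-frequency DFT: bin centres shifted by half a bin,
    X[k] = sum_n x[n] * exp(-j*2*pi*n*(k + 0.5)/N)."""
    n_len = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * n * (k + 0.5) / n_len) for n in range(n_len))
            for k in range(n_len)]


def mdct(x):
    """MDCT of a block of 2N samples, producing N real coefficients:
    X[k] = sum_{n=0}^{2N-1} x[n] * cos(pi/N * (n + 0.5 + N/2) * (k + 0.5))."""
    two_n = len(x)
    if two_n % 2:
        raise ValueError("MDCT input length must be even (2N samples)")
    n_half = two_n // 2
    return [sum(x[n] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
                for n in range(two_n))
            for k in range(n_half)]
```

For example, an 8-sample cosine completing exactly two cycles per frame puts energy only in DFT bins 2 and 6 (magnitude 4 each). The ODFT evaluates the same sum at frequencies offset by half a bin, and the MDCT maps 2N input samples to N real coefficients, which is why it is attractive for critically sampled coding.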
Transmission bandwidth can also be conserved by sending only the lower-frequency (base band) spectral components. To restore the higher-frequency components on the decoding side, various bandwidth extension techniques have been proposed. A simple technique is to take the base band components and scale them up in frequency.
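The text does not specify how the base band is "scaled up", so the following Python sketch assumes one simple interpretation: the base band spectrum is stretched in frequency by an integer factor, so each high-band bin j reuses base band bin j // factor. The function name and the fixed `gain` are hypothetical placeholders; copying the base band upward unchanged (spectral translation) would be an equally plausible reading.

```python
def extend_bandwidth(baseband, factor=2, gain=0.5):
    """Hypothetical sketch: rebuild len(baseband) * factor spectral bins from the
    transmitted base band only.

    The low band is kept as received. Each missing high-band bin j takes the
    base band coefficient at bin j // factor (the base band stretched in
    frequency by `factor`), attenuated by a fixed, assumed gain.
    """
    if not isinstance(factor, int) or factor < 2:
        raise ValueError("factor must be an integer >= 2")
    k_base = len(baseband)
    full = list(baseband)  # transmitted low band, unchanged
    for j in range(k_base, k_base * factor):
        # j < factor * k_base guarantees j // factor <= k_base - 1
        full.append(gain * baseband[j // factor])
    return full
```

With four base band bins `[1.0, 2.0, 3.0, 4.0]` and the defaults, the regenerated high band is `[1.5, 1.5, 2.0, 2.0]`, i.e. the upper half of the base band stretched over twice the bandwidth.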
In addition, certain frequency components are difficult for the human ear to perceive when they are close in frequency to a dominant, high-energy component. Accordingly, such a dominant component can have an associated masking function that attenuates nearby frequency components, the attenuation being greater the closer a component is to the dominant masking component. Techniques of this type belong to the field of perceptual coding.
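A minimal Python sketch of such a masking function, assuming a simple triangular shape in dB around a single dominant component: the threshold is highest next to the masker and falls linearly with distance, so the closer a component is, the more strongly it is masked. The offset and slope values and both function names are illustrative assumptions, not values taken from the text; real models measure distance on a critical-band (Bark) scale and use asymmetric slopes.

```python
def masking_threshold_db(masker_bin, masker_level_db, n_bins,
                         offset_db=10.0, slope_db_per_bin=6.0):
    """Hypothetical triangular masking function around one dominant component.

    The threshold sits offset_db below the masker level at the masker bin and
    drops by slope_db_per_bin for every bin of distance from it.
    """
    return [masker_level_db - offset_db - slope_db_per_bin * abs(k - masker_bin)
            for k in range(n_bins)]


def audible_components(levels_db, threshold_db):
    """Indices of components whose level exceeds the masking threshold."""
    return [k for k, (level, thr) in enumerate(zip(levels_db, threshold_db))
            if level > thr]
```

For levels `[40, 50, 80, 50, 40, 60]` dB with the dominant component at bin 2 (80 dB), the threshold is `[58.0, 64.0, 70.0, 64.0, 58.0, 52.0]` dB, so only bins 2 and 5 remain audible; the 50 dB neighbours of the masker are hidden even though they are louder than bin 0.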
The field of perceptual audio coding has been an active one over the past two decades. Typical configurations of the perceptual model used in audio codecs such as PAC, AAC, MPEG Layer III, etc. may be found in [1-5]. The centerpiece of perceptual modeling is the concept of auditory masking [11-15, 27].

[1] J. D. Johnston, D. Sinha, S. Dorward, and S. R. Quackenbush, "AT&T Perceptual Audio Coding (PAC)," in AES Collected Papers on Digital Audio Bit-Rate Reduction, N. Gilchrist and C. Grewin, Eds., 1996, pp. 73-82.
[2] K. Tsutsui, H. Suzuki, M. Sonohara, O. Shimoyoshi, K. Akagiri, and R. M. Heddle, "ATRAC: Adaptive Transform Acoustic Coding for MiniDisc," 93rd Convention of the Audio Engineering Society, October 1992, Preprint no. 3456.
[3] K. Brandenburg, G. Stoll, et al., "The ISO-MPEG-Audio Codec: A Generic Standard for Coding of High Quality Digital Audio," 92nd Convention of the Audio Engineering Society, 1992, Preprint no. 3336.
[4] M. Bosi et al., "ISO/IEC MPEG-2 Advanced Audio Coding," 101st Convention of the Audio Engineering Society, November 1996, Preprint no. 4382.
[5] M. Davis, "The AC-3 Multichannel Coder," 95th Convention of the Audio Engineering Society, October 1993, Preprint no. 3774.
[11] J. L. Hall, "Auditory Psychophysics for Coding Applications," Section IX, Chapter 39, in The Digital Signal Processing Handbook, V. K. Madisetti and D. B. Williams, Eds., CRC Press, 1998.
[12] B. C. J. Moore, An Introduction to the Psychology of Hearing, 5th ed., Academic Press, San Diego, 2003.
[13] E. Zwicker and H. Fastl, Psychoacoustics: Facts and Models, Springer Series in Information Sciences, second updated edition (paperback).
[14] A. J. S. Ferreira, Spectral Coding and Post-Processing of High Quality Audio, Ph.D. thesis, Faculdade de Engenharia da Universidade do Porto, Portugal, 1998, http://telecom.inescn.pt/doc/phd en.html.
[15] D. Sinha, Low Bit Rate Transparent Audio Compression Using Adapted Wavelets, Ph.D. thesis, University of Minnesota, 1993.
[27] N. Jayant, J. Johnston, and R. Safranek, "Signal Compression Based on Models of Human Perception," Proceedings of the IEEE, vol. 81, no. 10, pp. 1385-1422, October 1993.

The goal is to quantize the audio signal in such a way that the quantization noise is either fully masked or rendered less annoying owing to masking by the audio signal. Building a perceptual model in an audio codec typically involves four key concepts: simultaneous masking, temporal masking, frequency spread of masking, and the tone-like vs. noise-like nature of the masker. Simultaneous masking is a phenomenon whereby a masker masks the perception of a maskee occurring at the same time. Temporal masking refers to a phenomenon in which a masker masks a maskee occurring either before or after the masker. Frequency spread of masking refers to the phenomenon that a masker at a certain frequency has a masking potential not only at that frequency but also at neighboring frequencies. Finally, the masking potential of a narrow-band masker depends strongly on whether the masker is tone-like or noise-like. These factors are used to estimate the desired quantization accuracy, or Signal to Mask Ratio (SMR), for each frequency band.
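The per-band SMR estimate, including its dependence on the tone-like vs. noise-like nature of the masker, can be sketched in Python as follows. This sketch assumes the well-known spectral-flatness tonality estimate popularized by Johnston (SFM referenced to -60 dB, threshold offset α(14.5 + i) + (1 - α)5.5 dB for critical-band index i); it is one common choice, not necessarily the model used by the codecs cited above. The function names, the numeric floor and the `spread_energy` input (masker energy after the frequency-spreading step) are assumptions for illustration.

```python
import math

SFM_DB_MAX = -60.0  # SFM (dB) treated as fully tone-like in this model


def tonality(power_spectrum):
    """Tonality coefficient alpha in [0, 1] from the spectral flatness measure.

    SFM_dB = 10*log10(geometric mean / arithmetic mean) of the bin powers;
    alpha = min(SFM_dB / -60, 1), so a flat (noise-like) band gives 0 and a
    peaky (tone-like) band gives values approaching 1.
    """
    floor = 1e-12  # hypothetical floor so that log() never sees zero
    p = [max(v, floor) for v in power_spectrum]
    geometric = math.exp(sum(math.log(v) for v in p) / len(p))
    arithmetic = sum(p) / len(p)
    sfm_db = 10.0 * math.log10(geometric / arithmetic)
    return max(0.0, min(sfm_db / SFM_DB_MAX, 1.0))


def masking_offset_db(alpha, band_index):
    """dB offset of the masked threshold below the masker energy in band i.

    Tone-like maskers (alpha near 1) mask less, so the offset grows with alpha.
    """
    return alpha * (14.5 + band_index) + (1.0 - alpha) * 5.5


def smr_db(band_energy, spread_energy, alpha, band_index):
    """Signal to Mask Ratio of one band: signal level minus masked threshold (dB)."""
    threshold_db = 10.0 * math.log10(spread_energy) - masking_offset_db(alpha, band_index)
    return 10.0 * math.log10(band_energy) - threshold_db
```

A flat band yields α = 0 and a 5.5 dB offset, while a band dominated by a single peak saturates at α = 1 and an offset of 14.5 + i dB; the larger SMR for the tonal case reflects the weaker masking potential of tone-like maskers, so the codec must quantize that band more accurately.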
In many audio codecs, the masking model for wideband audio signals is constructed using a two-step procedure. First, the (short-term) signal spectrum is analyzed in multiple partitions, each narrower than a critical band, and the masking potential of each narrow-band masker is estimated by convolving it with a spreading function that models the frequency spread of masking. Second, the masked threshold of the wideband audio signal is estimated by treating the signal as a superposition of multiple narrow-band maskers. Recent studies suggest that this superposition assumption may not always be valid. In particular, a phenomenon called Comodulation Masking Release (CMR) has implications for the extension of the narrow-band model to a wideband model [12]; see also J. W. Hall, J. H. Grose, and L. Mendoza, "Across-channel processes in masking," in Hearing, B. C. J. Moore, Ed., Academic Press, San Diego, 1995, pp. 243-266.
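The two-step procedure can be sketched in Python as below. The sketch assumes the Schroeder spreading function, SF(Δz) = 15.81 + 7.5(Δz + 0.474) - 17.5·sqrt(1 + (Δz + 0.474)²) dB with Δz in Bark, as one widely used choice; the partition bin edges and Bark centres are hypothetical inputs supplied by the caller. The linear power summation in `masked_threshold` is exactly the superposition assumption that CMR calls into question.

```python
import math


def spreading_db(dz):
    """Schroeder spreading function in dB; dz = maskee Bark - masker Bark.

    Shallower for dz > 0 (about -10 dB/Bark) than for dz < 0 (about
    +25 dB/Bark slope), modelling the upward spread of masking.
    """
    return 15.81 + 7.5 * (dz + 0.474) - 17.5 * math.sqrt(1.0 + (dz + 0.474) ** 2)


def partition_energies(power_spectrum, edges):
    """Step 1a: sum bin powers into narrow partitions [edges[i], edges[i + 1])."""
    return [sum(power_spectrum[edges[i]:edges[i + 1]]) for i in range(len(edges) - 1)]


def masked_threshold(energies, centers_bark):
    """Steps 1b and 2: spread each narrow-band masker across all partitions and
    add the contributions linearly in power (the superposition assumption)."""
    return [sum(e_j * 10.0 ** (spreading_db(z_i - z_j) / 10.0)
                for e_j, z_j in zip(energies, centers_bark))
            for z_i in centers_bark]
```

Because the contributions are simply summed, the threshold produced by two maskers together always equals the sum of the thresholds each would produce alone. A model that accounts for CMR would have to depart from this linearity when the maskers share a common envelope modulation.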