Digital representations of analog signals are common in many storage and transmission applications. A digital representation is typically achieved by first converting an analog signal to a digital signal using an analog-to-digital (A/D) converter. Prior to transmission or storage, this raw digital signal may be encoded to achieve greater robustness and/or reduced transmission bandwidth and storage size. The analog signal is subsequently retrieved using digital-to-analog (D/A) conversion. Storage media and applications employing digital representations of analog signals include, for example, compact discs (CDs), digital video discs (DVDs), digital audio broadcast (DAB), wireless cellular transmission, and Internet broadcasts.
While digital representations are capable of providing high fidelity, low noise, and signal robustness, these features depend on the available data rate. Specifically, the quality of a digital audio signal depends on the data rate used for transmitting the signal and on the signal sample rate and dynamic range. For example, CDs, which are typically produced by sampling an analog sound source at 44,100 Hz with 16-bit resolution, require a data rate of 44,100*16 bits per second (b/s), or 705.6 kilobits per second (kb/s), per channel. Lower quality systems, such as voice-only telephony transmission, can be sampled at 8,000 Hz with 8-bit resolution, requiring only 8,000*8 b/s, or 64 kb/s.
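The two data rates quoted above follow directly from multiplying the sample rate by the bits per sample. The short sketch below reproduces that arithmetic; the function name `raw_bit_rate` and the `channels` parameter are illustrative assumptions and do not come from the text.

```python
def raw_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Uncompressed data rate in bits per second (hypothetical helper)."""
    return sample_rate_hz * bits_per_sample * channels


# CD-style sampling, one channel: 44,100 Hz at 16 bits per sample.
cd_rate = raw_bit_rate(44_100, 16)
# Voice-only telephony: 8,000 Hz at 8 bits per sample.
phone_rate = raw_bit_rate(8_000, 8)

print(f"CD, one channel: {cd_rate / 1000} kb/s")
print(f"Telephony:       {phone_rate / 1000} kb/s")
```

Passing `channels=2` gives the corresponding two-channel figure, which is simply twice the single-channel rate.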
For most applications, the raw data bit rate of digital audio is too high for the channel capacity. In such circumstances, an efficient encoder/decoder system must be employed to reduce the required data rate while maintaining the quality. An example of such a system is Sony Corporation's MINIDISC™ storage/playback device, which uses a 2.5-inch disc that can hold only 140 Mbytes of data. In order to hold 74 minutes of music sampled at 44,100 Hz with a resolution of 16 bits per sample (which would require 650 Mbytes of storage for the raw digital signal), an encoder/decoder system is employed to compress the digital data by a ratio of about 5:1. For this purpose, Sony employs the Adaptive Transform Acoustic Coding (ATRAC) encoder/decoder system.
Many commercial systems have been designed for reducing the raw data rate required to encode, store, decode, and play back analog signals. Examples for music include: Advanced Audio Coding (AAC), Transform-Domain Weighted Interleave Vector Quantization (TWINVQ), the Dolby AC-2 and AC-3 compression schemes, Moving Picture Experts Group (MPEG)-1 Layer 1 through Layer 3, and Sony's ATRAC and ATRAC3 systems. Examples for Internet broadcast of voice and/or music include the preceding coders and also: Algebraic Code-Excited Linear Prediction (ACELP)-Net, the DolbyNET™ system, RealNetworks, Inc.'s REALAUDIO™ system, and Microsoft Corporation's WINDOWS MEDIA AUDIO™ (WMA) system.
These transform-based audio coders achieve compression by using signal representations such as lapped transforms, as discussed by H. Malvar in a paper entitled “Enhancing the Performance of Subband Audio Coders for Speech Signals” (IEEE Int. Symp. on Circuits and Sys., Monterey, Calif., June 1998) and by T. Moriya et al. in a paper entitled “A Design of Transform Coder for Both Speech and Audio Signals at 1 bit/sample” (IEEE ICASSP '97, Munich, pp. 1371–1374, 1997). Other transform-based coders use pseudo-quadrature mirror filters, as discussed by P. Monta and S. Cheung in a paper entitled “Low Rate Audio Coder with Hierarchical Filter Banks and Lattice Vector Quantization” (IEEE ICASSP '94, pp. II 209–212, 1994). Typically, these representations offer the advantage that quantization effects can be mapped to areas of the signal spectrum in which they are least perceptible. However, the current technologies have several limitations. Most notably, the reproduction quality is inadequate, especially for Internet applications, in which it is desirable to transmit audio sampled at 44,100 Hz at data rates below 32 kb/s.
Some research has explored two-dimensional (2D) energetic signal representations in which the second dimension is the transform of the time variability of the signal spectra (see, e.g., R. Drullman, J. M. Festen, and R. Plomp, “Effect of Temporal Envelope Smearing on Speech Reception,” J. Acoust. Soc. Am. 95, pp. 1053–1064, 1994, and Y. Tanaka and H. Kimura, “Low Bit-Rate Speech Coding using a Two-dimensional Transform of Residual Signals and Waveform Interpolation,” IEEE ICASSP '94, Adelaide, pp. I 173–176, 1994). This second dimension has been called the “modulation dimension” (see, e.g., S. Greenberg and B. Kingsbury, “The Modulation Spectrogram: In Pursuit of an Invariant Representation of Speech,” IEEE ICASSP '97, Munich, pp. 1647–1650, 1997). When applied to signals such as speech or audio that are effectively stationary over relatively long periods, this second dimension projects most of the signal energy into a few low modulation frequency coefficients. Moreover, mammalian auditory physiology studies have shown that the physiological importance of modulation effects decreases with modulation frequency (see, e.g., N. Kowalski, D. Depireux, and S. Shamma, “Analysis of Dynamic Spectra in Ferret Primary Auditory Cortex: I. Characteristics of Single Unit Responses to Moving Ripple Spectra,” J. Neurophysiology 76, pp. 3503–3523, 1996). This past work has provided an energetic transform that is not invertible. What is needed instead is a transform that produces a signal which, after modification to a lower bit rate, is invertible back to a high-fidelity analog signal.
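The energy-compaction property described above can be illustrated with a toy example. The sketch below is not the method of any of the cited papers: it assumes non-overlapping rectangular frames, a naive DFT, and magnitude-only spectra, and every name in it (`dft`, `modulation_spectrum`, `low_modulation_fraction`) is hypothetical. It applies a second transform across frames to each frequency bin of a synthetic, slowly modulated tone and reports how much of the energy lands at the lowest modulation indices. Because only magnitudes are kept, phase is discarded and the result cannot be inverted, which mirrors the limitation noted for the prior work.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a short sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def modulation_spectrum(signal, frame_len):
    """Return mod[b][k]: magnitude at acoustic bin b, modulation index k."""
    # First dimension: magnitude spectrum of each non-overlapping frame.
    starts = range(0, len(signal) - frame_len + 1, frame_len)
    spectra = [
        [abs(c) for c in dft(signal[s:s + frame_len])[: frame_len // 2 + 1]]
        for s in starts
    ]
    # Second dimension: transform each bin's trajectory across frames.
    n_bins = frame_len // 2 + 1
    return [
        [abs(c) for c in dft([frame[b] for frame in spectra])]
        for b in range(n_bins)
    ]


def low_modulation_fraction(mod, cutoff):
    """Share of one-sided energy at modulation indices 0..cutoff (DC included)."""
    half = len(mod[0]) // 2
    total = sum(m * m for row in mod for m in row[: half + 1])
    low = sum(m * m for row in mod for m in row[: cutoff + 1])
    return low / total


# Synthetic test signal (an assumption): a steady tone centred on acoustic
# bin 4 whose amplitude completes two slow cycles over the whole signal.
FRAME_LEN, N_FRAMES = 32, 64
N = FRAME_LEN * N_FRAMES
signal = [
    (1.0 + 0.5 * math.cos(2 * math.pi * 2 * n / N))
    * math.cos(2 * math.pi * 4 * n / FRAME_LEN)
    for n in range(N)
]

mod = modulation_spectrum(signal, FRAME_LEN)
fraction = low_modulation_fraction(mod, cutoff=3)
print(f"energy share at modulation indices 0..3: {fraction:.3f}")
```

The energy share counts the DC modulation term, which is large for any non-negative magnitude trajectory, so the number is best read as an illustration of the trend rather than a measurement of coding gain.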
Furthermore, for bandwidth-limited applications, the current techniques employed for audio coder-decoders (CODECs) lack scalability. It is desirable to provide modulation frequency transforms that remain invertible after quantization, so as to provide essentially CD-quality music coding at 32 kb/s per channel, and to provide a progressive encoding that naturally and easily scales to bit rate changes. A scalable algorithm, as defined herein, is one that can change the data rate after encoding by applying a simple truncation of the frame size, which can be achieved without further computation. Such algorithms should provide service at any variable data rate, forfeiting only fidelity in exchange for a reduction in the data rate. This capability is essential for Internet broadcast applications, where the channel bandwidth is not only constrained but also time dependent.
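The notion of scalability by truncation defined above can be sketched with a toy frame format. The layout below is purely an assumption made for illustration, not the encoding described in this document: each frame stores 4-byte records of (coefficient index, 16-bit value), ordered from largest to smallest magnitude, so that cutting bytes off the end of a frame discards the least significant coefficients first. The names `encode_frame`, `truncate_frame`, and `decode_frame` are hypothetical.

```python
import struct

RECORD = struct.Struct("<Hh")  # assumed layout: uint16 index, int16 value


def encode_frame(coeffs):
    """Pack coefficients as (index, value) records, largest magnitude first."""
    order = sorted(range(len(coeffs)), key=lambda i: -abs(coeffs[i]))
    return b"".join(RECORD.pack(i, coeffs[i]) for i in order)


def truncate_frame(frame, max_bytes):
    """Lower the data rate after encoding: a plain slice, no re-encoding."""
    return frame[:max_bytes]


def decode_frame(frame, n_coeffs):
    """Rebuild the coefficient list; missing coefficients decode as zero."""
    out = [0] * n_coeffs
    usable = len(frame) - len(frame) % RECORD.size  # skip a partial record
    for offset in range(0, usable, RECORD.size):
        index, value = RECORD.unpack_from(frame, offset)
        out[index] = value
    return out


def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


coeffs = [3, -120, 7, 45, 0, -2, 60, 1]  # made-up quantized coefficients
full = encode_frame(coeffs)

for budget in (len(full), 16, 12, 8):
    decoded = decode_frame(truncate_frame(full, budget), len(coeffs))
    print(budget, decoded, squared_error(coeffs, decoded))
```

In this sketch the same encoded frame serves every byte budget, and the reconstruction error grows gradually as the budget shrinks, which is the behavior the definition asks for.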