The present invention relates generally to digital audio systems and more particularly to a data compression system for substantially increasing the playing time of a given storage medium without significant degradation of sound quality.
In applications where access to a large library of digital audio is desired, the main problem is the extraordinary data volume required to store high quality music. To place this problem in perspective, the standard compact disc player transfers digital audio data at a rate of approximately 5.3 megabytes per minute per channel, or 10.6 megabytes per minute for a stereo pair. If one were to store all compact disc data for a 3-minute stereo selection on the hard disk of a computer, the selection would occupy 31.8 megabytes, which is more than a 30 megabyte hard drive can hold. Even using a large 750 megabyte disk drive, only about 1 hour and 10 minutes of music could be stored. That is far too little for an evening's entertainment or for digital jukebox purposes.
While there have been some advances made in the field of data compression, present-day data compression techniques have not adequately focused on human auditory perception. For example, many data compression algorithms are intended for compressing telemetry and telecommunications data and speech. There has heretofore been no practical data compression technique for reducing the data volume to allow meaningful playing times on relatively simple devices while preserving audio quality of recorded music.
The present invention uses a combination of source coding theory and the theory of human auditory perception to greatly reduce the storage requirements for digital audio. In the presently preferred embodiment the compact disc data rate of 5.3 megabytes per minute per channel has been reduced to 0.42 megabytes per minute per channel, a reduction of roughly 12.6 to 1. This reduction in data rate is achieved by an encoding and decoding system in which the more costly components are used on the encoding side, to allow simple and inexpensive equipment to be used on the decoding side. More specifically, the system is designed to permit decoding by a processor with limited arithmetic precision, for example, 16 bit fixed-point arithmetic. The present invention is thus well suited for music distribution systems, video game systems, consumer audio systems, digital jukeboxes and computer-controlled video/audio systems.
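By way of illustration only, the storage arithmetic implied by the figures above may be sketched as follows. The two data rates and the 750 megabyte disk size are taken from the text; the function names are hypothetical and chosen for this sketch.

```python
# Figures stated in the text, in megabytes per minute per channel.
CD_RATE = 5.3
CODED_RATE = 0.42


def playing_time_minutes(disk_mb, rate_per_channel, channels=2):
    """Minutes of audio that fit on a disk of disk_mb megabytes."""
    return disk_mb / (rate_per_channel * channels)


def compression_ratio(original_rate, coded_rate):
    """Ratio of the original data rate to the coded data rate."""
    return original_rate / coded_rate


if __name__ == "__main__":
    before = playing_time_minutes(750, CD_RATE)
    after = playing_time_minutes(750, CODED_RATE)
    print(f"uncompressed stereo: {before:.1f} minutes")
    print(f"compressed stereo:   {after:.1f} minutes ({after / 60:.1f} hours)")
    print(f"ratio: {compression_ratio(CD_RATE, CODED_RATE):.1f} to 1")
```

Under these figures a 750 megabyte disk holds about 71 minutes of uncompressed stereo audio, and about 893 minutes (roughly 14.9 hours) at the reduced rate.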
In accordance with one principle of the invention, a wideband digital audio signal is processed by transforming it into the frequency domain, yielding data capable of being represented as complex numbers. A magnitude portion and a phase portion are extracted from the frequency domain data, and a different quantization process is performed on each. After quantization, the magnitude and phase data are stored as digital data on a data storage medium. Treating the magnitude and phase separately permits different quantization rules to be applied to each: in the presently preferred embodiment the magnitude portion is quantized using a vector quantization technique, while the phase portion is quantized using uniform scalar quantization.
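The separate treatment of magnitude and phase may be sketched as follows. This is a minimal illustration and not the preferred embodiment: the naive discrete Fourier transform, the block length, the bit count and the tiny codebook are all assumptions made for the example, and every function name is hypothetical.

```python
import cmath
import math


def dft(block):
    """Naive O(N^2) discrete Fourier transform of a block of samples."""
    n = len(block)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(block))
        for k in range(n)
    ]


def split_polar(coeffs):
    """Separate complex coefficients into magnitude and phase portions."""
    mags = [abs(c) for c in coeffs]
    phases = [cmath.phase(c) for c in coeffs]  # radians, in [-pi, pi]
    return mags, phases


def quantize_phase(phase, bits):
    """Uniform scalar quantizer: map a phase to one of 2**bits levels."""
    levels = 1 << bits
    step = 2 * math.pi / levels
    return round(phase / step) % levels


def dequantize_phase(index, bits):
    """Reconstruct a phase in [0, 2*pi) from its quantizer index."""
    return index * (2 * math.pi / (1 << bits))


def nearest_codeword(vector, codebook):
    """Vector quantizer: index of the codeword with least squared error."""
    def error(i):
        return sum((v - c) ** 2 for v, c in zip(vector, codebook[i]))
    return min(range(len(codebook)), key=error)


if __name__ == "__main__":
    block = [math.cos(2 * math.pi * t / 8) for t in range(8)]
    mags, phases = split_polar(dft(block))
    codebook = [[0.0, 0.0], [0.0, 4.0], [4.0, 0.0]]  # hypothetical codebook
    print(nearest_codeword(mags[0:2], codebook))
    print([quantize_phase(p, 4) for p in phases[0:2]])
```

The point of the split is visible in the function signatures: the magnitudes are quantized jointly, as a vector matched against a codebook, whereas each phase is quantized on its own against a uniform grid.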
In accordance with another principle of this invention, expansion via scaling of bands of magnitude coefficients to a common power level assures that the noise produced by their quantization will be essentially inaudible. Using this technique it is possible to achieve effects similar to a more complex process of dynamically choosing the rate of the quantizer (in bits per coefficient) on the basis of perceptual masking calculations.
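One way to picture the scaling of bands to a common power level is sketched below. It assumes that the power level is measured as root-mean-square amplitude, that the target level is 1.0, and that the gain is retained so the decoder can undo the scaling; none of these details is stated above, and the function names are hypothetical.

```python
import math


def band_rms(mags):
    """Root-mean-square level of a band of magnitude coefficients."""
    return math.sqrt(sum(m * m for m in mags) / len(mags))


def scale_band(mags, target_rms=1.0):
    """Scale a band to the common level; return the scaled band and its gain."""
    rms = band_rms(mags)
    if rms == 0.0:
        return list(mags), 1.0  # silent band: nothing to scale
    gain = target_rms / rms
    return [m * gain for m in mags], gain


def unscale_band(scaled, gain):
    """Decoder side: undo the scaling applied by scale_band."""
    return [s / gain for s in scaled]


if __name__ == "__main__":
    quiet, quiet_gain = scale_band([0.01, 0.02, 0.02, 0.01])
    loud, loud_gain = scale_band([10.0, 20.0, 20.0, 10.0])
    print(quiet, quiet_gain)
    print(loud, loud_gain)
```

After scaling, a quiet band and a loud band of the same shape present the same values to the quantizer. A quantization error of a given size in the scaled domain is divided by the band's gain on decoding, so the noise in each band follows that band's own level rather than a level fixed for the whole spectrum.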
In accordance with another principle of the invention, different vector quantizers are designed for different bands of magnitude coefficients. This use of a plurality of vector quantizers ensures better performance, because each quantizer is matched to the band being encoded, e.g., more highly correlated vectors in the low frequency bands than in the high frequency bands. Moreover, the vector quantizers may have different rates (in bits per magnitude coefficient) reflecting the fact that the human auditory system is more sensitive to errors in some frequency bands than others.
In implementing the presently preferred embodiment, a vector quantization codebook is developed uniquely for the magnitudes of each segment (of approximately 3 minutes in length) of the audio selection being recorded. The unique codebook further includes portions which are unique to each of the frequency bands. On playback, the codebook is first transmitted to and loaded in the decoding equipment, whereupon the magnitude portions of the encoded digital audio may be quickly and efficiently decoded to restore the original magnitude portions. In accordance with another principle of this invention, the codebooks are two-stage and tree-structured so that excellent quantization characteristics are obtained with greatly reduced complexity.
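The complexity saving of a two-stage, tree-structured codebook may be illustrated by the following sketch. It assumes binary trees, a second stage that quantizes the residual left by the first stage, and small hand-made codebooks; the text above does not specify these details, and all names and values are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def leaf(centroid):
    """A tree node with no children."""
    return (centroid, None, None)


def tree_search(vector, node):
    """Descend a binary tree toward the nearer child at each level.

    A node is (centroid, left, right). Returns the path bits and the
    centroid of the leaf reached.
    """
    path = []
    while node[1] is not None:
        left, right = node[1], node[2]
        bit = 0 if sq_dist(vector, left[0]) <= sq_dist(vector, right[0]) else 1
        path.append(bit)
        node = right if bit else left
    return path, node[0]


def follow(path, node):
    """Decoder side: walk the stored path bits down to a leaf centroid."""
    for bit in path:
        node = node[2] if bit else node[1]
    return node[0]


def encode_two_stage(vector, tree1, tree2):
    """Stage one codes the vector; stage two codes what stage one missed."""
    path1, c1 = tree_search(vector, tree1)
    residual = [v - c for v, c in zip(vector, c1)]
    path2, _ = tree_search(residual, tree2)
    return path1, path2


def decode_two_stage(path1, path2, tree1, tree2):
    """Sum the two leaf centroids to reconstruct the vector."""
    return [a + b for a, b in zip(follow(path1, tree1), follow(path2, tree2))]


# Hypothetical codebooks for two-dimensional magnitude vectors.
TREE1 = ([2.5, 2.5],
         ([1.0, 1.0], leaf([0.0, 0.0]), leaf([2.0, 2.0])),
         ([4.0, 4.0], leaf([3.0, 3.0]), leaf([5.0, 5.0])))
TREE2 = ([0.0, 0.0],
         ([-0.25, 0.0], leaf([-0.25, -0.25]), leaf([-0.25, 0.25])),
         ([0.25, 0.0], leaf([0.25, -0.25]), leaf([0.25, 0.25])))

if __name__ == "__main__":
    p1, p2 = encode_two_stage([3.2, 2.9], TREE1, TREE2)
    print(p1, p2, decode_two_stage(p1, p2, TREE1, TREE2))
```

In this sketch each tree has four leaves and is searched with two comparisons per level, so the pair represents sixteen combined reconstruction points at the cost of four bits and a handful of distance computations. An exhaustive search over a single codebook of the same size would compare the vector against every codeword.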
Neural conduction time in the human auditory system is somewhat indeterminate and therefore the phase of higher frequencies is of less importance than the phase of lower frequencies. This means that while the phase at low frequencies must be quantized with a large number of bits, the phase at higher frequencies may be quantized with substantially fewer bits. The presently preferred embodiment uses a detailed understanding of human auditory perception to allocate the minimum number of bits to the quantization of each phase, with higher frequencies receiving fewer or even zero bits. Moreover, it uses pseudorandom phase dither to eliminate the audible effects of correlations in the quantized phase errors.
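A frequency-dependent phase allocation with pseudorandom dither may be sketched as follows. The frequency thresholds and bit counts are invented for illustration and are not the allocation of the preferred embodiment. The dither shown is subtractive, with encoder and decoder running identically seeded generators, which is an assumption of this sketch and not a detail stated above.

```python
import math
import random


def phase_bits(freq_hz):
    """Hypothetical allocation: fewer phase bits as frequency rises."""
    if freq_hz < 1000:
        return 6
    if freq_hz < 4000:
        return 4
    if freq_hz < 10000:
        return 2
    return 0  # no phase information stored for this coefficient


def wrap(angle):
    """Wrap an angle in radians into [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def encode_phase(phase, bits, rng):
    """Add pseudorandom dither, then quantize uniformly to 2**bits levels."""
    levels = 1 << bits
    step = 2 * math.pi / levels
    dither = rng.uniform(-0.5, 0.5) * step
    return round((phase + dither) / step) % levels


def decode_phase(index, bits, rng):
    """Regenerate the same dither value and subtract it after dequantizing."""
    step = 2 * math.pi / (1 << bits)
    dither = rng.uniform(-0.5, 0.5) * step
    return index * step - dither


if __name__ == "__main__":
    enc_rng = random.Random(1234)  # the seed is assumed shared with the decoder
    dec_rng = random.Random(1234)
    for freq, phase in [(200, 1.0), (2000, -2.5), (8000, 3.0)]:
        bits = phase_bits(freq)
        index = encode_phase(phase, bits, enc_rng)
        restored = decode_phase(index, bits, dec_rng)
        print(freq, bits, index, wrap(restored - phase))
```

Because the dither value changes from coefficient to coefficient, the quantization error no longer tracks the phase itself in a regular pattern, while its size remains bounded by half a quantizer step.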
Transform coding, such as that described in this invention, is susceptible to "pre-echo," which may be heard when an interval of silence is followed by a transient such as a drumbeat, unless corrective measures are taken. In accordance with a principle of this invention, pre-echo is greatly reduced or eliminated by dividing blocks into subblocks, detecting the occurrence of transients that would cause pre-echoes, and individually expanding via scaling the subblocks of those blocks containing transients, in a manner that exploits temporal masking in human auditory perception. In accordance with another principle of this invention, pre-echo is reduced or eliminated by dynamically augmenting the bit allocation to the phase quantizers in blocks containing transients.
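The subblock approach to pre-echo may be sketched as follows. The detection rule (a jump in energy between consecutive subblocks), the threshold ratio, the number of subblocks and the use of peak amplitude as the scaling reference are all assumptions of this sketch, and the function names are hypothetical. The block length is assumed to be a multiple of the number of subblocks.

```python
def subblock_energies(block, n_sub):
    """Sum of squared samples in each of n_sub equal subblocks."""
    size = len(block) // n_sub
    return [sum(x * x for x in block[i * size:(i + 1) * size])
            for i in range(n_sub)]


def has_transient(block, n_sub=4, ratio=16.0, floor=1e-9):
    """Flag a block whose energy jumps sharply from one subblock to the next."""
    e = subblock_energies(block, n_sub)
    return any(e[i + 1] > ratio * max(e[i], floor) for i in range(n_sub - 1))


def expand_subblocks(block, n_sub=4, target_peak=1.0):
    """Scale each subblock to a common peak; return samples and gains."""
    size = len(block) // n_sub
    out, gains = [], []
    for i in range(n_sub):
        sub = block[i * size:(i + 1) * size]
        peak = max(abs(x) for x in sub)
        gain = target_peak / peak if peak > 0 else 1.0
        gains.append(gain)
        out.extend(x * gain for x in sub)
    return out, gains


def restore_subblocks(scaled, gains):
    """Decoder side: divide each subblock by the gain it was given."""
    size = len(scaled) // len(gains)
    return [x / gains[i // size] for i, x in enumerate(scaled)]


if __name__ == "__main__":
    block = [0.001] * 12 + [0.8, -0.9, 0.7, -0.6]  # near silence, then a hit
    if has_transient(block):
        scaled, gains = expand_subblocks(block)
        print(gains)
```

In the example the quiet subblocks receive a gain of about 1000 and the transient subblock a gain near 1. Quantization noise that the transform spreads over the whole block is divided by those gains on decoding, so it is strongly attenuated in the interval preceding the transient, where it would otherwise be heard.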
The entire system is designed with low cost decoding in mind. Specifically, several steps of the encoding process are tailored to minimize the potential for truncation errors when low cost, limited precision fixed-point arithmetic is used in the decoder. One such step is the expansion via scaling of each block before transforming.
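The benefit of expanding a block before transforming it can be illustrated with a simple sketch. It assumes 16 bit signed integer samples and scaling by a power of two, so that a block uses as much of the available word length as possible. This is one plausible reading offered for illustration; the scaling actually used is not specified above, and the function names are hypothetical.

```python
def headroom_shift(block, bits=16):
    """Largest left shift that keeps every sample within a signed word."""
    limit = (1 << (bits - 1)) - 1
    peak = max(abs(x) for x in block)
    if peak == 0:
        return 0
    shift = 0
    while (peak << (shift + 1)) <= limit:
        shift += 1
    return shift


def expand_block(block, bits=16):
    """Scale the block up by a power of two; return samples and shift."""
    shift = headroom_shift(block, bits)
    return [x << shift for x in block], shift


def restore_block(scaled, shift):
    """Undo the expansion with an arithmetic right shift."""
    return [x >> shift for x in scaled]


if __name__ == "__main__":
    quiet = [3, -5, 12, 7]
    scaled, shift = expand_block(quiet)
    print(scaled, shift)
    print(restore_block(scaled, shift))
```

A quiet block that occupies only the lowest few bits of each word would lose most of its detail to truncation in limited precision arithmetic. Once expanded, the same truncation error is small relative to the samples, and the shift is undone after processing.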
For a more complete understanding of the invention, its objects and advantages, reference may be had to the following specification and to the accompanying drawings.