This invention relates to the encoding of an audio signal, such as music, into a compressed data stream, the distribution of this compressed data stream on physical or electronic media, and the decoding of this compressed data stream into a psychoacoustically acceptable representation of the originally encoded audio signal. More specifically, it relates to unique data stream compositions, structures and formats which allow the data rate associated with an encoded compressed data stream to be altered without first decoding the data stream back to its uncompressed form and then recoding the resulting uncompressed data at a different data rate. It also relates to the methods and apparatus used to perform this data rate alteration.
The entertainment industry has spent many millions of dollars to capitalize on the opportunities created by the availability of digitally compressed music and video programs. Using high quality compression technology, audio and video content can now be distributed over widely deployed networks, such as the Internet, directly to consumers. This gives artists, record labels, movie studios, and other owners of the content the ability to initiate and maintain direct contact with their customers, and thus to gather market information of unprecedented accuracy while very effectively promoting their entertainment products. In addition, with the audio and video program material being provided to consumers in the form of compressed digital bit streams over the Internet, the cost of CD and DVD replication, as well as the cost of delivering physical media through retail outlets, is no longer in the equation. Thus, it can be readily seen that the entertainment industry has a strong interest in making the profitable electronic delivery of compressed music and video content an everyday reality.
The main objective of an audio compression algorithm is to create a sonically acceptable representation of an input audio signal using as few digital bits as possible. This permits a low data rate version of the input audio signal to be delivered over limited bandwidth transmission channels, such as the Internet, and reduces the amount of storage necessary to store the input audio signal for future playback. The level of artifacts introduced by a particular audio compression/decompression process into the recovered decompressed signal is, for the most part, inversely proportional to the number of bits used to encode the audio signal, and the quality of the decompressed audio signal falls accordingly. The lower the number of bits used, the more noticeable the difference between the recovered decompressed audio and the original audio signal. For those applications in which the data capacity of the transmission channel is fixed and does not vary over time, or in which the amount of audio to be stored, in terms of minutes, is known in advance and does not increase, traditional audio compression methods, such as those described in the following book, can be effectively used: Pohlmann, Ken C., “Principles of Digital Audio” Fourth Edition, McGraw-Hill (2000), particularly chapters 10 and 12 (primarily pp. 430-436). These chapters are incorporated herein in their entirety by this reference.
In these forms of prior art, the data rate at which an audio signal is compressed, and thus its level of audio quality, is determined at the time of compression encoding. No further reduction in data rate can be effected without either recoding the original signal at a lower data rate or decompressing the compressed audio signal and then recompressing this decompressed signal at a lower data rate. If this fixed-rate compressed audio signal is delivered over a reliable transport channel whose data carrying capacity does not vary over time, the needs of the consumer to whom this audio data is delivered will be satisfied. However, if the carrying capacity of the transport channel diminishes, as would be the case during an Internet network blockage, or if more subscribers connect to the channel and use more capacity than the channel has to offer, there is nothing that can be done to maintain the quality of service to any particular consumer. Under these circumstances, the consumer will be subjected to periods of service interruption of varying length. This is a fundamental limitation of audio compression schemes in common use today.
Another situation in which the compression processes described in the Pohlmann book can cause consumer dissatisfaction arises when a consumer has a fixed amount of memory available to store musical content for playback on a portable music player. Many of the handheld portable audio appliances available today are based on storage mechanisms such as Flash ROM, with storage capacities as low as 32 megabytes. If the consumer has available to him or her audio compressed at a fixed rate of 128 kilobits per second, the maximum combined length of the musical selections that can be stored on this 32 megabyte storage module will be about 33 minutes. If the consumer wishes to store more music on this storage module, the only choice available would be to tediously re-encode the desired musical selections at a lower data rate.
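The 33 minute figure above follows from simple arithmetic. The short sketch below reproduces it under stated assumptions: decimal units (1 megabyte = 1,000,000 bytes, 1 kilobit = 1,000 bits) and no file-system or container overhead. The function name is illustrative only and does not come from the cited reference.

```python
def playback_minutes(capacity_megabytes: float, rate_kbps: float) -> float:
    """Minutes of audio that fit in a store at a constant compressed bit rate.

    Assumes 1 megabyte = 1,000,000 bytes and 1 kilobit = 1,000 bits, and
    ignores any file-system or container overhead.
    """
    capacity_bits = capacity_megabytes * 1_000_000 * 8
    seconds = capacity_bits / (rate_kbps * 1_000)
    return seconds / 60


if __name__ == "__main__":
    # Compare the fixed 128 kbit/s case with two lower, re-encoded rates.
    for rate in (128, 96, 64):
        minutes = playback_minutes(32, rate)
        print(f"{rate} kbit/s on 32 MB: {minutes:.1f} minutes")
```

Halving the data rate doubles the playing time, which is why a consumer with a small storage module has an incentive to accept a lower rate if it can be obtained without re-encoding.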
Yet another limitation of this prior art is its inability to easily “scale” a single audio bit stream for use in different applications, each of which requires audio compressed at a different data rate. Currently, a high quality, high data rate, compressed audio stream is converted into a lower quality, lower data rate representation by first decoding the data stream back to its uncompressed form and then recoding the resulting uncompressed data at a different data rate. This decompression/recompression process is not only tedious, it also causes additional losses in audio quality, whether or not the subsequent encoding process is at a different data rate from the previous encoding process. The loss in quality associated with recoding once-compressed audio is well known. The AES41-2000 Audio Engineering Society Standard, which defines a process that can be followed to reduce this loss in quality, entitled “AES Standard For Digital Audio—Recoding Data Set For Audio Bit Rate Reduction,” appears in the Journal of the Audio Engineering Society, Volume 48, Number 6, June 2000, pages 565 through 583.
One general prior art technique used to create a bit stream with scalable characteristics, and circumvent the limitations previously described, employs an encoder/decoder, or codec, which encodes the input audio signal as a high bit rate data stream composed of subsets of low bit rate data streams. In this approach, low bit rate streams are used to construct the higher bit rate streams. These encoded low bit rate data streams can be extracted from the coded signal and combined to provide an output data stream whose bit rate is adjustable over a wide range of bit rates. One approach to implementing this concept is to first encode data at a lowest supported bit rate, then encode an error between the original signal and a decoded version of this lowest bit rate bit stream. This encoded error is stored and also combined with the lowest supported bit rate bit stream to create a second to lowest bit rate bit stream. The error between the original signal and a decoded version of this second to lowest bit rate signal is encoded, stored and added to the second to lowest bit rate bit stream to form a third to lowest bit rate bit stream, and so on. This process is repeated until the sum of the bit rates associated with the bit streams of each of the error signals so derived and the bit rate of the lowest supported bit rate bit stream is equal to the highest bit rate to be supported. The final scalable high bit rate bit stream is composed of the lowest bit rate bit stream and each of the encoded error bit streams. Note that for this scheme, called difference coding, to be viable, the error signal must be compressed to a substantially lower bit rate than the original.
Also note that the increment of audio improvement associated with each of the encoded error “helper signals” included in the bit stream will be directly proportional to the compressed data rate of each helper signal (the higher the data rate of the helper signal, the larger the increment of audio improvement) and the scaling resolution will be inversely proportional to the compressed data rate of each helper signal (the higher the data rate of the helper signal, the coarser the scaling resolution).
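The difference coding scheme described above can be illustrated with a toy sketch. It is not an implementation of any particular prior art codec: a plain uniform quantizer stands in for the perceptual encoder, the step sizes are arbitrary assumptions, and all function names are hypothetical. The sketch only shows the layering structure, in which each helper signal encodes the error left by the layers beneath it and the scaled stream is formed by keeping the first few layers.

```python
import math


def quantize(samples, step):
    """Toy stand-in for a codec: round each sample to a multiple of step."""
    return [step * round(s / step) for s in samples]


def encode_layers(samples, steps):
    """Encode a base layer, then one helper layer per remaining step.

    steps[0] is the coarse step of the lowest bit rate layer. Each later
    layer encodes the error left after summing all previous layers.
    """
    layers = []
    residual = list(samples)
    for step in steps:
        layer = quantize(residual, step)
        layers.append(layer)
        residual = [r - q for r, q in zip(residual, layer)]
    return layers


def decode(layers, count):
    """Reconstruct the signal from only the first `count` layers."""
    return [sum(values) for values in zip(*layers[:count])]


def max_error(original, decoded):
    """Largest absolute sample difference between two signals."""
    return max(abs(x - y) for x, y in zip(original, decoded))


if __name__ == "__main__":
    signal = [math.sin(2 * math.pi * n / 32) for n in range(64)]
    steps = [0.5, 0.125, 0.03125]  # assumed step sizes, coarse to fine
    layers = encode_layers(signal, steps)
    for count in range(1, len(layers) + 1):
        error = max_error(signal, decode(layers, count))
        print(f"layers used: {count}, max error: {error:.4f}")
```

In this sketch, scaling the stream is as simple as dropping trailing layers, but producing the layers requires a full encode and decode pass per layer. A finer set of steps gives finer scaling resolution at the cost of more layers, which mirrors the trade-off between helper signal data rate and scaling resolution noted above.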
A second general technique, usually used to support a small number of different bit rates between widely spaced lowest and highest bit rates, uses more than one compression algorithm to create a “layered” scalable bit stream. In this approach, a hybrid of compression algorithms is used to cover the desired range of scalable bit rates. The apparatus that performs the scaling operation on a bit stream coded in this manner chooses, depending on output data rate requirements, which one of the multiple bit streams carried in the hybrid bit stream to use as the coded audio output. To improve coding efficiency and provide for a wider range of scaled data rates, data carried in the lower bit rate bit streams can be used by higher bit rate bit streams to form additional higher quality, higher bit rate bit streams.
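The selection step performed by the scaling apparatus in this layered approach can be sketched as follows. The sketch covers only the simple case in which each carried bit stream is independent; it does not model lower rate streams augmenting higher rate ones. The `CodedStream` type, its fields, and the codec labels are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CodedStream:
    codec: str        # hypothetical label for the compression algorithm
    rate_kbps: int    # data rate of this independently coded stream
    payload: bytes    # the coded audio itself


def select_stream(
    streams: Sequence[CodedStream], max_rate_kbps: int
) -> Optional[CodedStream]:
    """Pick the highest rate stream that fits within the output data rate.

    Returns None when even the lowest rate stream exceeds the limit.
    """
    candidates = [s for s in streams if s.rate_kbps <= max_rate_kbps]
    return max(candidates, key=lambda s: s.rate_kbps, default=None)


if __name__ == "__main__":
    hybrid = [
        CodedStream("codec_a", 16, b""),
        CodedStream("codec_b", 64, b""),
        CodedStream("codec_c", 128, b""),
    ]
    chosen = select_stream(hybrid, 100)
    print(chosen.codec if chosen else "no stream fits")
```

The selection itself is cheap, which is consistent with the low complexity of this approach, but the output can only take one of the few rates that were encoded in advance.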
The first scalable bit stream approach described above is computationally intensive. Since extensive analysis of the bit stream being scaled is required, significant processing power is needed to attain real time performance. This is especially true if this approach is configured to permit fine grained scaling of the bit stream's data rate. With real time operation being a necessity for many applications which benefit from the use of bit stream scaling, a more computationally efficient method is clearly needed.
Note that the second scalable bit stream approach outlined above is far less computationally intensive than the first when the bit streams used in this technique serve as independent data elements and are not employed to augment the quality of higher bit rate bit streams. Simplified versions of the first scalable bit stream method can approach this lower complexity; however, in this case only a limited number of bit rates can be supported. Although lower complexity has the benefit of real time operation, limited scalability range and resolution make the simplified versions of these two approaches unsuitable for many applications. Clearly, a new approach which provides for real time operation over a wide range of closely spaced scalable data rates is of great benefit.