Perceptual Transform Coding
With the introduction of portable digital media players, the compact disc for music storage, and audio delivery over the Internet, it is now common to store, buy and distribute music and other audio content in digital audio formats. Digital audio formats allow people to keep hundreds or thousands of songs available on their personal computers (PCs) or portable media players.
One benefit of digital audio formats is that a suitable bit-rate (and hence compression ratio) can be selected according to given constraints, e.g., file size and audio quality. On the other hand, no single bit-rate can cover all audio application scenarios. For instance, higher bit-rates may not be suitable for portable devices due to limited storage capacity. By contrast, higher bit-rates are better suited for the high-quality sound reproduction desired by audiophiles.
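The trade-off between bit-rate and file size can be made concrete with simple arithmetic. The following is a minimal sketch, assuming constant-bit-rate audio, 1 kbps = 1000 bits per second, and no container or metadata overhead; the function names, the 4-minute track length, the 1 GB capacity and the 128 and 320 kbps rates are illustrative assumptions rather than values taken from this description.

```python
def estimated_size_bytes(bitrate_kbps, duration_s):
    """Approximate payload size of constant-bit-rate audio (hypothetical helper).

    Assumes 1 kbps = 1000 bits/s and ignores container overhead.
    """
    return bitrate_kbps * 1000 * duration_s / 8


def tracks_that_fit(capacity_bytes, bitrate_kbps, duration_s):
    """Number of equal-length tracks that fit in the given storage capacity."""
    return int(capacity_bytes // estimated_size_bytes(bitrate_kbps, duration_s))


# Illustrative values only: a 4-minute track and a device with 1 GB of storage.
TRACK_SECONDS = 240
CAPACITY_BYTES = 1_000_000_000

low_rate_tracks = tracks_that_fit(CAPACITY_BYTES, 128, TRACK_SECONDS)
high_rate_tracks = tracks_that_fit(CAPACITY_BYTES, 320, TRACK_SECONDS)
```

Under these assumptions, the same device holds roughly two and a half times as many tracks at the lower rate, which is why a single bit-rate rarely suits both a small portable player and a high-quality playback system.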
When audio content is not at a suitable bit-rate for the application scenario (e.g., when high bit-rate audio is to be loaded onto a portable device or transferred via the Internet), a way to change the bit-rate of the audio file is needed. One known solution is to use a transcoder, which takes a compressed audio bitstream coded at one bit-rate as its input and re-encodes the audio content at a new bit-rate.
FIG. 1 illustrates a simple and widely used approach to transcoding called “decode-and-encode” (DAE) transcoding. In this approach, a full decoding of a compressed bitstream (B) 105 having an original coding bit-rate is performed by a decoder 110. This produces a reconstruction of the original audio signal content as decoded audio samples 115. The decoded audio samples are then fully re-encoded by an encoder 120 to produce a compressed bitstream (B′) 135 with a target bit-rate. However, this approach often incurs high computational complexity because it performs a full re-encoding. In addition, the approach results in degraded audio quality compared to a one-time encoding at the same target bit-rate from the original audio source, since the transcoder has only the already-lossy decoded samples to work from and not the original audio source.
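The DAE flow and its quality penalty can be illustrated with a deliberately simplified sketch. The toy "codec" below is only a uniform scalar quantizer, in which a coarser step size stands in for a lower bit-rate; it is an assumption made for illustration and is not the perceptual transform codec, nor the decoder 110 or encoder 120, of FIG. 1. All function names and numeric values are hypothetical.

```python
def toy_encode(samples, step):
    """Quantize each sample to an integer index (coarser step ~ lower bit-rate)."""
    return [round(s / step) for s in samples]


def toy_decode(codes, step):
    """Reconstruct approximate sample values from quantizer indices."""
    return [c * step for c in codes]


def dae_transcode(codes, src_step, dst_step):
    """Decode-and-encode: fully decode, then fully re-encode at the target step."""
    decoded = toy_decode(codes, src_step)   # plays the role of decoded samples 115
    return toy_encode(decoded, dst_step)    # plays the role of bitstream B'


# Illustrative values only: one sample, a fine source step and a coarse target step.
SRC_STEP, DST_STEP = 0.1, 0.25
original = [0.14]

b = toy_encode(original, SRC_STEP)                 # original bitstream B
b_prime = dae_transcode(b, SRC_STEP, DST_STEP)     # transcoded bitstream B'
direct = toy_encode(original, DST_STEP)            # one-time encoding from the source

tandem_error = abs(toy_decode(b_prime, DST_STEP)[0] - original[0])
direct_error = abs(toy_decode(direct, DST_STEP)[0] - original[0])
```

For this sample the transcoded path reconstructs 0.0, while the one-time encoding reconstructs 0.25, so the DAE result lies farther from the original value of 0.14. The sample was chosen to make the effect visible; the error is not larger for every individual sample, but the second quantization has access only to the first quantizer's output, which is the source of the degradation described above.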