The invention relates to MP3 decoding, and more specifically, to methods and apparatuses of memory optimization and pipeline processing used in MP3 decoding.
MP3, or MPEG-1 Audio Layer III, is a high-compression digital audio format. An MP3 device decodes data stored in digital storage media. Audio data is usually compressed in accordance with human hearing capabilities, whose features are usually referred to as volume, pitch, and masking effect. Volume is a measure of the strength of the sound. Human hearing sensitivity varies greatly with the frequency of the sound: for example, hearing is most sensitive to audio signals with frequencies between 2000 and 4000 Hz (2 kHz to 4 kHz), whereas signals with a much lower or much higher frequency require a higher volume (or larger signal amplitude) to be audible. Pitch is generally measured in frequency, with the audible range extending approximately from 20 Hz to 20 kHz. The masking effect occurs when the sound of a particular frequency band obstructs that of another frequency band, and is generally divided into frequency masking and time masking.
An MP3 device decodes a compressed digital signal to restore the original audio signal. FIG. 1 is a block diagram illustrating an MP3 decoder. A synchronizing and error checking module 100 receives digital audio data carried by a bitstream 101 comprising a plurality of frames. The synchronizing and error checking module 100 authenticates and decodes the bitstream 101, searches for the starting and ending addresses of each frame, and checks for errors. If the MP3 bitstream 101 contains self-defined ancillary data 103, the module 100 outputs the ancillary data 103 directly without decoding. A Huffman decoding module 102, a side information decoding module 104, and a scale factor decoding module 106 respectively decode the corresponding information retrieved from the synchronizing and error checking module 100. These decoding modules 102, 104, and 106 are described in detail later. The decoded data is then passed to a re-quantization module 108, whose function includes reconstructing the frequency lines generated by the encoder. A frequency line reorder module 110 determines whether the sub-band comprises short windows. If so, the data is reassembled according to the output order of the encoder. A stereo processing module 112 receives the frequency lines from the frequency line reorder module 110 and recovers the left and right audio signals from the encoded audio signal. The audio signal is divided into left and right channels, which are processed in parallel. The processing modules of the decoder include alias reconstruction modules 114a, 114b, IMDCT modules 116a, 116b, frequency inversion modules 118a, 118b, and multi-phase filters 120a, 120b. The alias reconstruction modules 114a and 114b reconstruct the audio signals by mixing frequency lines to cancel the anti-alias effect induced in the encoder.
The inverse modified discrete cosine transform (IMDCT) modules 116a, 116b convert the frequency lines into multi-phase filter sub-band samples. The frequency inversion modules 118a, 118b compensate for the frequency inversion by multiplying every odd-indexed sample of each odd sub-band by −1. The multi-phase filters 120a, 120b calculate successive audio samples, and output the left channel 107 and right channel 105 respectively.
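The frequency inversion step can be sketched as follows. This is a minimal illustration rather than the decoder's actual implementation: it assumes one granule of one channel is held as 32 sub-bands of 18 samples each, and the function name and data layout are hypothetical. As in the MPEG-1 Layer III specification, only the odd-indexed samples of the odd sub-bands are negated.

```python
def frequency_inversion(subbands):
    """Negate the odd-indexed samples of every odd-numbered sub-band, in place.

    `subbands` is assumed to be a list of sub-bands, each a list of samples
    (a hypothetical layout chosen for illustration).
    """
    for sb in range(1, len(subbands), 2):            # odd sub-bands only
        for i in range(1, len(subbands[sb]), 2):     # odd samples only
            subbands[sb][i] = -subbands[sb][i]
    return subbands


# One granule of one channel: 32 sub-bands x 18 samples = 576 values.
granule = [[1.0] * 18 for _ in range(32)]
frequency_inversion(granule)
```

Even sub-bands are left untouched, so the step costs at most 16 × 9 sign changes per granule and channel.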
As shown in FIG. 2, a frame in the MP3 bitstream includes a header 200, a cyclic redundancy check (CRC) code 202, side information 204, a main data zone 206, and ancillary data 208. The header 200 of the frame has 32 bits of data, including 12 synchronization bits. The synchronizing and error checking module 100 of FIG. 1 determines the position of each frame by searching for the 12 synchronization bits, and detects errors according to the 16-bit CRC code. The side information 204 provides the information needed for Huffman table selection and scale factor reconstruction in Huffman decoding. MP3 employs a bit reservoir technique, so the side information 204 also includes information indicating the starting position of the main data. The length of the side information is either 136 bits for a mono audio channel or 256 bits for stereo channels. The main data zone 206 includes the coded scale factors and the Huffman-encoded data. The length of the main data in each frame varies in accordance with the Huffman code. If there is an available bit reservoir in the main data zone 206 of a frame, the main data of subsequent frames is stored therein. The main data of a frame can also be segmented, with the portions stored individually in the main data zones 206 of several frames. The starting position of the main data can be determined by reading the bit index data from the side information 204. The main data zone 206 is divided into granules, each including only one channel in mono audio mode and two channels in stereo modes. Each channel comprises scale factors and Huffman code. The Huffman code in a channel corresponds to 576 frequency lines. The frame ends with the ancillary data 208, whose format is defined by the user. The MP3 decoder outputs the ancillary data 208 without decoding or performing any other data processing.
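The frame layout described above can be illustrated with a short sketch. It assumes an MPEG-1 Layer III stream held in a `bytes` object; the function names and the returned dictionary are hypothetical and are not taken from FIG. 1 or FIG. 2. The bit positions used (the protection bit in the second header byte, the channel mode in the top two bits of the fourth header byte, and a 9-bit main data start pointer at the beginning of the side information) follow the MPEG-1 Layer III frame format.

```python
def find_sync(data, start=0):
    """Return the byte offset of the next 12-bit sync pattern (all ones), or -1."""
    for i in range(start, len(data) - 1):
        if data[i] == 0xFF and (data[i + 1] & 0xF0) == 0xF0:
            return i
    return -1


def describe_header(header):
    """Derive where the side information sits and how long it is.

    `header` is the 4-byte (32-bit) frame header.
    """
    has_crc = (header[1] & 0x01) == 0        # protection bit 0: a 16-bit CRC follows
    mono = ((header[3] >> 6) & 0x03) == 3    # channel mode 3: single channel
    return {
        "has_crc": has_crc,
        "mono": mono,
        "side_info_bits": 136 if mono else 256,
        "side_info_offset": 4 + (2 if has_crc else 0),  # bytes from frame start
    }


def main_data_begin(side_info):
    """Read the 9-bit backward pointer (in bytes) to the start of the main data."""
    return (side_info[0] << 1) | (side_info[1] >> 7)


stream = bytes([0x00, 0x12, 0xFF, 0xFB, 0x90, 0x64])
offset = find_sync(stream)
info = describe_header(stream[offset:offset + 4])
```

A non-zero `main_data_begin` value means the main data of the frame starts that many bytes back, inside the bit reservoir left by earlier frames.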
After Huffman decoding of the main data of the MP3 bitstream, frequency lines representing the strength of the compressed sound at each frequency are retrieved. A set of 576 frequency lines can generally be divided into a first zone (usually referred to as big-values) 40, a second zone (usually referred to as count1) 42, and a third zone (usually referred to as rzero) 44. The boundaries of the three zones are designated by the side information. Humans are more sensitive to frequencies ranging from 2 kHz to 4 kHz, which is regarded as low frequency within the hearing range, so the corresponding zone (big-values) 40 usually contains large values. High frequencies are not easily audible, so the high frequency zone (rzero) 44 consists of successive zero values.
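The division of the 576 frequency lines into three zones can be sketched as below. This is a simplified illustration under stated assumptions: big-values lines are taken to be coded in pairs and count1 lines in groups of four, as in MPEG-1 Layer III, and both counts are assumed to be already known. The function and parameter names are hypothetical.

```python
GRANULE_LINES = 576  # frequency lines per channel per granule


def zone_boundaries(big_values_pairs, count1_quads):
    """Return (end of big-values, end of count1, number of rzero lines r).

    `big_values_pairs` and `count1_quads` are hypothetical inputs giving the
    number of coded pairs and quadruples in the first two zones.
    """
    big_end = 2 * big_values_pairs
    count1_end = big_end + 4 * count1_quads
    if count1_end > GRANULE_LINES:
        raise ValueError("zones exceed 576 frequency lines")
    return big_end, count1_end, GRANULE_LINES - count1_end


big_end, count1_end, r = zone_boundaries(100, 50)
```

With these example counts, lines 0 to 199 fall in big-values, lines 200 to 399 in count1, and the remaining 176 lines form the rzero zone.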
During Huffman decoding, the boundary of rzero zone 44 is determined and the decoder inserts the appropriate number (r) of zeros therein. Data processing after Huffman decoding, such as re-quantization, stereo processing, alias reconstruction, and IMDCT, however, requires additional r read operations and r write operations, reducing decoding efficiency.
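The cost of the inserted zeros can be made concrete with a small sketch. It pads a decoded granule to 576 lines and then runs a simplified re-quantization over every line, counting one read and one write per line. The helper names, the single `gain` factor, and the access counter are assumptions made for illustration; an actual re-quantization stage also applies scale factors and global gain.

```python
def pad_rzero(decoded, total=576):
    """Append r zeros so the granule holds `total` frequency lines."""
    r = total - len(decoded)
    return decoded + [0] * r, r


def requantize(lines, gain=1.0):
    """Simplified re-quantization: sign(x) * |x|^(4/3) * gain for every line.

    Returns the output and the number of lines visited; each visit stands for
    one read and one write (a hypothetical cost model).
    """
    out = []
    visited = 0
    for v in lines:                               # one read per line
        mag = abs(v) ** (4.0 / 3.0) * gain
        out.append(-mag if v < 0 else mag)        # one write per line
        visited += 1
    return out, visited


decoded = [3, -2, 1, 1]                 # non-zero part of a toy granule
padded, r = pad_rzero(decoded)
out, visited = requantize(padded)
wasted = visited - len(decoded)         # reads and writes spent on rzero lines
```

In this toy granule, 572 of the 576 reads and writes touch lines that are known to be zero, and the same overhead repeats in each subsequent stage that walks the full granule.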
The inverse modified discrete cosine transform (IMDCT) modules 116a and 116b and the multi-phase filters 120a and 120b account for most of the computation time in the MP3 decoder. According to the block diagram of FIG. 1, the MP3 decoder starts to process a subsequent granule only after generating the left and right channels from the current granule. The MP3 device therefore requires a high processing speed to achieve the desired audio output, and methods for increasing the MP3 decoding rate are widely sought.