The invention relates to an MP3 decoder, and more specifically, to methods and apparatuses of bit stream decoding and memory optimization.
MP3, or MPEG-1 Audio Layer III, is a high-compression digital audio format. An MP3 device decodes data stored in a digital storage medium. Audio data is typically compressed in accordance with the features of the human auditory system. These features are typically referred to as volume, pitch, and the masking effect. Volume is a measure of the strength of the sound. The hearing sensitivity of the human ear varies greatly with the frequency of the sound. A human, for example, is most sensitive to audio signals in the frequency range between 2000 and 4000 Hz (2 kHz to 4 kHz), whereas signals with a much lower or much higher frequency require a higher volume (or larger signal amplitude) to be heard. Pitch is generally measured in frequency, and the audible range is approximately from 20 Hz to 20 kHz. The masking effect is induced when a particular frequency band obstructs another frequency band. The masking effect can be generally divided into frequency masking and time masking.
An MP3 device decodes the compressed data to recover the digital audio signal. FIG. 1 is a block diagram illustrating an MP3 decoder. A synchronizing and error checking module 100 receives digital audio data, wherein the digital data is carried by a bit stream 101 including a plurality of frames. The synchronizing and error checking module 100 performs authentication and decoding of the bit stream 101, searches for the start and finish address of each frame, and performs error checking. If the MP3 bit stream 101 contains self-defined auxiliary data 103, the module 100 outputs the auxiliary data 103 directly without decoding. A Huffman decoding module 102, a side information decoding module 104, and a scale factor decoding module 106 each decode the corresponding information retrieved from the synchronizing and error checking module 100. Decoding modules 102, 104, and 106 are described in detail later. The decoded data is then passed to a re-quantization module 108, which reconstructs the frequency lines generated by the encoder. A frequency line reorder module 110 examines whether the sub-band comprises short windows. If short windows are present, the data order is reassembled according to the output order of the encoder. A stereo processing module 112 receives the frequency lines from the frequency line reorder module 110 and recovers the left and right audio signals from the encoded audio signal. The audio signal is divided into left and right channels, which are processed in parallel. The processing modules of the decoder include alias reconstruction modules 114a and 114b, IMDCT modules 116a and 116b, frequency inversion modules 118a and 118b, and combining multi-phase filters 120a and 120b. 
The alias reconstruction modules 114a and 114b reconstruct the audio signals through mixing to cancel the anti-alias effect induced in the encoder. The inverse modified discrete cosine transform (IMDCT) modules 116a and 116b transform the frequency lines into multi-phase filter sub-band samples. The frequency inversion modules 118a and 118b compensate for the frequency inversion by multiplying the samples of the odd sub-bands by −1. The combining multi-phase filters 120a and 120b calculate successive audio samples, and output the left channel 107 and the right channel 105, respectively.
As shown in FIG. 2, a frame in the MP3 bit stream includes a header 200, a cyclic redundancy check (CRC) code 202, side information 204, a main data zone 206, and auxiliary data 208. The header 200 of the frame has 32 bits of data, which include 12 synchronization bits. The synchronizing and error checking module 100 of FIG. 1 determines the position of each frame by searching for the 12 synchronization bits, and detects errors according to the 16-bit CRC code. The side information 204 carries the information required for Huffman table selection and scale factor reconstruction. MP3 employs the bit reservoir technique, so the side information 204 also includes information indicating the start position of the main data. The length of the side information is either 136 bits for mono audio or 256 bits for stereo audio. The main data zone 206 includes the coded scale factors and the Huffman-encoded data. The length of the main data in each frame is variable, in accordance with the variable-length Huffman code. If there is an available bit reservoir in the main data zone 206 of a frame, the main data of subsequent frames can be stored therein. In another aspect, the main data of a frame can be segmented into portions, which can be individually stored in the main data zones 206 of multiple frames. The start position of the main data can be determined by reading the bit index data in the side information 204. The main data zone 206 is divided into two granules, wherein a granule includes only one channel in a mono audio mode and two channels in a stereo mode. Each channel comprises a scale factor and Huffman code. The Huffman code in a channel corresponds to 576 frequency lines. The frame ends with auxiliary data 208, the format of which is defined by the user. The MP3 decoder outputs the auxiliary data 208 without decoding or performing any data processing.
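The frame-location step described above can be illustrated with a minimal sketch. It assumes, following the MPEG-1 audio syntax, that the 12 synchronization bits are all ones and that headers are byte-aligned; the function names `find_sync` and `side_info_bytes` are hypothetical and are not taken from the decoder of FIG. 1. A real decoder would also validate the remaining header fields and the CRC before accepting a candidate.

```python
def find_sync(data: bytes, start: int = 0) -> int:
    """Return the byte offset of the next candidate frame header, or -1.

    Assumption: the 12 synchronization bits are all ones, so a header
    begins with a 0xFF byte followed by a byte whose top four bits are set.
    """
    for i in range(start, len(data) - 1):
        if data[i] == 0xFF and (data[i + 1] & 0xF0) == 0xF0:
            return i
    return -1


def side_info_bytes(stereo: bool) -> int:
    """Side information length in bytes: 136 bits (mono) or 256 bits (stereo)."""
    return (256 if stereo else 136) // 8
```

A match found this way is only a candidate, because the same bit pattern may also occur by chance inside the main data or the auxiliary data.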
The length of the Huffman code is variable as previously mentioned, but the length of an MP3 frame is fixed. The MP3 frame allows the main data of a subsequent frame to be stored in the bit reservoir of a preceding frame. The side information of a frame includes a 9-bit unsigned parameter, main_data_end, indicating the start position of the main data of the current frame. The parameter main_data_end indicates the forward shift (in number of bytes) of the main data from the header of the current frame. If the parameter main_data_end exceeds the length of a frame, the header, CRC, and side information of the crossed preceding frames are not counted in the amount of forward shift. The shortest length of a frame is 96 bytes, and the data not counted in the shift amount is at most 38 bytes (4 bytes of header, 2 bytes of CRC, and 32 bytes of side information), so the main data zone is at least 58 bytes. The greatest value of the 9-bit main_data_end is 511, so the maximum amount of shift is 511 bytes, which is equivalent to a forward shift of up to 9 frames.
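The arithmetic in the preceding paragraph can be checked with a short sketch. The constant names are hypothetical; the byte counts come from the frame layout described for FIG. 2 (a 32-bit header, a 16-bit CRC code, and 256 bits of stereo side information). Note that an unsigned 9-bit field can hold values no larger than 2^9 − 1 = 511.

```python
import math

HEADER_BYTES = 32 // 8        # 32-bit header
CRC_BYTES = 16 // 8           # 16-bit CRC code
SIDE_INFO_BYTES = 256 // 8    # stereo side information (worst case)
MIN_FRAME_BYTES = 96          # shortest frame, per the text
POINTER_BITS = 9              # width of main_data_end

# Bytes per frame that are skipped when counting the shift.
overhead_bytes = HEADER_BYTES + CRC_BYTES + SIDE_INFO_BYTES

# Smallest possible main data zone in a single frame.
min_main_data_bytes = MIN_FRAME_BYTES - overhead_bytes

# Largest shift an unsigned 9-bit field can express.
max_shift_bytes = 2 ** POINTER_BITS - 1

# Worst case: how many minimum-size main data zones the shift can span.
max_frames_spanned = math.ceil(max_shift_bytes / min_main_data_bytes)

if __name__ == "__main__":
    print(overhead_bytes, min_main_data_bytes, max_shift_bytes, max_frames_spanned)
```

The worst case arises at the shortest frame length, because each preceding frame then contributes the fewest main data bytes to the shift.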
Typically, an MP3 decoder requires a 7680-bit (960-byte) first-in, first-out (FIFO) buffer for storing the data that remains in the current frame after its main data has been decoded, wherein the remaining data may be the main data of subsequent frames and the auxiliary data of the current frame. When the decoder has finished reading the data stored in the buffer, it resumes reading from the bit stream. The operation of a decoder reading an MP3 bit stream 3 utilizing a buffer 36 is illustrated in FIG. 3. The decoder sequentially reads and decodes the bit stream 3 from the header 301 of frame 30. When the decoder reads the side information 302 of frame 30, the parameter main_data_end is 0, indicating that the main data 303 of frame 30 immediately follows the side information 302. Decoding of the main data 303 of frame 30 is completed when the decoder reaches point A, and the remaining data 304-307 of frame 30 is written to the buffer 36. The decoder then reads the header 321 and the side information 322 of frame 32 from the bit stream 3. The parameter main_data_end of frame 32 refers to point B, indicating that the main data 305 of frame 32 starts at point B in frame 30. Point B maps to a location in the buffer 36, as shown by the dashed line in FIG. 3. The data between point A and point B is the auxiliary data 304. The decoder then reads the data from the buffer 36, and determines that reading of the main data 305 of frame 32 is complete upon reaching point C. Some data from frame 30 and the remaining data 323 of frame 32 will remain in the buffer 36. The decoder then reads the header 341 and the side information 342 of frame 34 from the bit stream 3. The parameter main_data_end of frame 34 refers to point D, and since point D maps to a location in the buffer 36, the decoder reads the data from the buffer 36. The decoder finishes reading the buffer 36 upon reaching point E, and then reads data from the bit stream 3 until reaching point F. 
The decoder thus reads the main data 307, 323, 343 of frame 34. The remaining data of frame 34 is stored in the buffer 36, and decoding proceeds in the same way as previously described.
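The buffering scheme walked through for FIG. 3 can be modeled with a minimal sketch. It is an illustration under simplifying assumptions, not the decoder's actual implementation: lengths are counted in whole bytes, the caller has already stripped the header, CRC, and side information, and the class name `Reservoir` and its method are hypothetical.

```python
class Reservoir:
    """Toy model of the FIFO buffer 36 holding leftover main-data-zone bytes."""

    def __init__(self, capacity: int = 960):
        self.capacity = capacity      # 7680 bits = 960 bytes
        self.buf = bytearray()        # data left over from earlier frames

    def read_main_data(self, shift: int, zone: bytes, length: int) -> bytes:
        """Return the main data of the current frame.

        shift  -- value of main_data_end: how many buffered bytes belong
                  to this frame (0 means the main data starts in `zone`)
        zone   -- bytes following the side information of this frame
        length -- size of this frame's main data in bytes (assumed known)
        """
        if shift > len(self.buf):
            raise ValueError("main_data_end points before the buffered data")
        # Buffered bytes older than the shift (e.g. auxiliary data) are dropped.
        available = self.buf[len(self.buf) - shift:] + zone
        if length > len(available):
            raise ValueError("main data extends past the current frame")
        main = bytes(available[:length])
        # Whatever is left is written to the buffer for later frames.
        leftover = available[length:]
        self.buf = bytearray(leftover[-self.capacity:])
        return main
```

With three frames arranged like frames 30, 32, and 34, the first call returns data taken only from the bit stream, the second returns data taken only from the buffer, and the third returns data that straddles both. Every byte that passes through `self.buf` is written once and read once in addition to the initial read from the bit stream.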
It can be seen from the previous description that the data stored in the buffer requires an extra writing operation (writing to the buffer) compared to data not stored in the buffer, as well as an extra reading operation (reading from the buffer).
After Huffman decoding the main data of the MP3 bit stream, frequency lines representing the strength of the compressed audio at each frequency are retrieved. A set of 576 frequency lines can generally be divided into three zones. From low frequency to high frequency, the three zones include a first zone (usually referred to as big-values) 40, a second zone (usually referred to as count1) 42, and a third zone (usually referred to as rzero) 44. The boundaries of the three zones are designated by the side information. Humans are more sensitive to sound in the frequency range from 2 kHz to 4 kHz, typically referred to as low frequency in the audible range, thus the corresponding zone (big-values) 40 usually contains large values. High frequency audio is not easily heard by the human ear, thus successive zero values are present in the high frequency zone (rzero) 44.
During Huffman decoding, the boundary of the rzero zone 44 is determined, and the decoder inserts the appropriate number (r) of zeros in the rzero zone 44. The data processing that follows Huffman decoding, such as re-quantization, stereo processing, alias reconstruction, and IMDCT, however, requires an additional r reading operations and r writing operations, and decoding efficiency suffers as a result.
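The cost described above can be made concrete with a small sketch. It assumes that the Huffman decoder yields only the non-zero portion of a channel (the big-values and count1 zones) and then pads the rzero zone; the function names are hypothetical, and the multiplication by `gain` is a placeholder for a later stage, not the actual re-quantization formula.

```python
LINES_PER_CHANNEL = 576  # frequency lines per channel, per the text


def pad_rzero(decoded_lines):
    """Append r zeros so the channel holds all 576 lines; return (lines, r)."""
    r = LINES_PER_CHANNEL - len(decoded_lines)
    if r < 0:
        raise ValueError("more than 576 frequency lines decoded")
    return list(decoded_lines) + [0] * r, r


def process_all_lines(lines, gain):
    """Placeholder stage that touches every line and counts its memory traffic."""
    reads = writes = 0
    out = [0.0] * len(lines)
    for i, value in enumerate(lines):
        reads += 1               # one read per frequency line
        out[i] = value * gain    # stand-in for the real per-line computation
        writes += 1              # one write per frequency line
    return out, reads, writes
```

For a channel with only 4 decoded lines, `pad_rzero` reports r = 572, and a stage such as `process_all_lines` still performs 576 reads and 576 writes, of which 572 of each operate on zeros.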