1. Field of the Invention
The present invention relates generally to a decoder which decodes encoded audio data. More particularly, the invention relates to an improvement on an audio decoder which controls encoded audio data stored in a buffer.
2. Description of the Related Art
Personal computers and business and home entertainment systems that handle large amounts of multimedia information in many types and forms must process digitally recorded video and audio information at high speed. Such high-speed processing can be achieved with data compression and expansion techniques, which directly affect the processing speed. The "MPEG" standards define one such set of data compression and expansion techniques. The current MPEG standards are undergoing standardization by the MPEG Committee (ISO/IEC JTC1/SC29/WG11) under the ISO (International Organization for Standardization)/IEC (International Electrotechnical Commission).
The MPEG standards consist of three parts. Part 1 (ISO/IEC IS 11172-1: MPEG system part) defines the multiplex structure of video data and audio data and the synchronization system. Part 2 (ISO/IEC IS 11172-2: MPEG video part) defines the high efficiency coding system for video data and the format for video data. Part 3 (ISO/IEC IS 11172-3: MPEG audio part) defines the high efficiency coding system for audio data and the format for audio data.
Video data handled under the MPEG video part consists of moving pictures of several tens of (e.g., 30) frames per second. The video data has a six-layer structure: a sequence including a plurality of Groups Of Pictures (GOPs), each GOP including a plurality of pictures, a plurality of slices in each picture, a plurality of macroblocks in each slice and a plurality of blocks in each macroblock.
At present, there are two MPEG standards, MPEG-1 and MPEG-2, that mainly differ in the encode rate at which video and audio data are encoded. In MPEG-1, frames correspond to pictures. In MPEG-2, either a frame or a field corresponds to a picture. Two fields constitute one frame. The structure where a frame corresponds to a picture is called a frame structure, while the structure where a field corresponds to a picture is called a field structure.
In MPEG, a compression technique called inter-frame prediction is employed. Inter-frame prediction compresses frame data based on the chronological correlation among frames. Inter-frame prediction includes bidirectional prediction. Bidirectional prediction uses both forward prediction, which predicts a current reproduced image (or picture) from an old reproduced image (or picture), and backward prediction, which predicts a current reproduced image from a future reproduced image.
Bidirectional prediction uses three picture types: the I (Intra-coded) picture, the P (Predictive-coded) picture and the B (Bidirectionally-coded) picture. An I-picture is produced independently of old and future reproduced images. A P-picture is produced by forward prediction (prediction from an old decoded I- or P-picture). A B-picture is produced by one of the following three predictions.
(1) Forward Prediction: prediction from an old decoded I- or P-picture.
(2) Backward Prediction: prediction from a future decoded I- or P-picture.
(3) Bidirectional Prediction: prediction from old and future decoded I- or P-pictures.
An I-picture is produced without an old picture or a future picture, whereas every P-picture is produced by referring to an old picture and every B-picture is produced by referring to an old or future picture.
In inter-frame prediction, an I-picture is periodically produced first. Then, a frame several frames ahead of the I-picture is produced as a P-picture. This P-picture is produced by prediction in one direction, from the past to the present (forward direction). Next, a frame located after the I-picture and before the P-picture is produced as a B-picture. When this B-picture is produced, the optimal prediction scheme is selected from among forward prediction, backward prediction and bidirectional prediction. In general, a current image and its preceding and succeeding images in consecutive motion pictures are similar to one another and differ only partially. In this respect, it is assumed that the previous frame (e.g., I-picture) and the next frame (e.g., P-picture) are substantially the same. If there is a slight difference (B-picture data) between the two frames, this difference is extracted and compressed. Accordingly, the frame data can be compressed based on the chronological correlation among consecutive frames.
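The selection among the three B-picture predictions can be illustrated with a minimal sketch. The selection criterion used here (smallest sum of absolute differences) and the names `sad` and `choose_b_prediction` are assumptions made for illustration; the text above only says that the optimal scheme is selected. Pictures are reduced to short lists of pixel values.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


def choose_b_prediction(current, past_ref, future_ref):
    """Pick the prediction with the smallest SAD (assumed criterion).

    Returns the chosen mode and the residual (difference data) that
    would be compressed for the B-picture.
    """
    candidates = {
        "forward": past_ref,      # prediction from an old I- or P-picture
        "backward": future_ref,   # prediction from a future I- or P-picture
        # prediction from both: rounded average of the two references
        "bidirectional": [(p + f + 1) // 2 for p, f in zip(past_ref, future_ref)],
    }
    mode = min(candidates, key=lambda m: sad(current, candidates[m]))
    residual = [c - p for c, p in zip(current, candidates[mode])]
    return mode, residual


past = [10, 20, 30, 40]      # old reference picture
future = [20, 30, 40, 50]    # future reference picture
current = [15, 25, 35, 45]   # picture lying between the two references
print(choose_b_prediction(current, past, future))
# prints ('bidirectional', [0, 0, 0, 0])
```

When the current picture lies midway between its two references, the averaged prediction leaves an all-zero residual, which is why only a small difference needs to be encoded.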
A series of video data encoded according to the MPEG video standards in the above manner is called an MPEG video bit stream. A series of audio data encoded according to the MPEG audio standards is called an MPEG audio bit stream. The video and audio streams are time-divisionally multiplexed according to the MPEG system part to generate an MPEG system bit stream.
MPEG-1 is mainly associated with storage media such as a CD (Compact Disc), a CD-ROM (Compact Disc-Read Only Memory) and a DVD (digital video disk), while MPEG-2 encompasses MPEG-1 and is used in a wide range of applications.
MPEG audio has three modes, namely, layer I, layer II and layer III; a higher layer can achieve a higher sound quality and a higher compression ratio. An audio stream has a plurality of frames, each called an AAU (Audio Access Unit). Each AAU is the minimum independently decodable unit and includes a given number of pieces of sample data for each layer. Layer I has 384 pieces of sample data per AAU, and layers II and III have 1152 pieces of sample data per AAU.
The AAU format has a header at the top, followed by an optional error check code (CRC: Cyclic Redundancy Code, 16 bits) and audio data. The fields from the header to the audio data are used to reproduce an audio signal. The header includes a sampling frequency field, which specifies the sampling rate and is selected from among three frequencies (32 kHz, 44.1 kHz and 48 kHz). Audio data is variable-length data. When the end of the audio data does not coincide with the end of the AAU, the remaining portion of the AAU (the gap from the end of the audio data to the end of the AAU) is called "ancillary data". Any data other than MPEG audio can be inserted into this ancillary data. In MPEG-2, multichannel data and multilingual data are inserted in the ancillary data.
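As a worked example of these sample counts, the playback time covered by one AAU follows directly from the number of samples and the sampling rate. The sketch below is illustrative only; the function name `aau_duration_ms` is hypothetical, and the 48 kHz and 32 kHz rates are chosen from the sampling frequencies that MPEG audio supports.

```python
# Samples per AAU for each layer, as stated in the text.
SAMPLES_PER_AAU = {1: 384, 2: 1152, 3: 1152}


def aau_duration_ms(layer, sampling_rate_hz):
    """Playback time of one AAU in milliseconds."""
    return 1000.0 * SAMPLES_PER_AAU[layer] / sampling_rate_hz


print(aau_duration_ms(1, 48000))  # layer I at 48 kHz: 8.0 ms
print(aau_duration_ms(2, 48000))  # layer II at 48 kHz: 24.0 ms
print(aau_duration_ms(3, 32000))  # layer III at 32 kHz: 36.0 ms
```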
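A decoder locates these fields by unpacking the 32-bit header. The following is a minimal sketch; the bit positions and code tables follow the commonly documented MPEG-1 audio header layout and are not given in the text above, so they should be treated as assumptions to be checked against ISO/IEC 11172-3. The function name `parse_aau_header` is hypothetical.

```python
SYNCWORD = 0xFFF  # 12-bit synchronization pattern at the top of the header
LAYERS = {0b11: "I", 0b10: "II", 0b01: "III"}
SAMPLING_HZ = {0b00: 44100, 0b01: 48000, 0b10: 32000}


def parse_aau_header(header_bytes):
    """Extract layer, CRC presence and sampling rate from a 4-byte header."""
    if len(header_bytes) < 4:
        raise ValueError("header needs 4 bytes")
    word = int.from_bytes(header_bytes[:4], "big")
    if word >> 20 != SYNCWORD:
        raise ValueError("syncword not found")
    layer_bits = (word >> 17) & 0b11
    protection_bit = (word >> 16) & 0b1   # 0 means a 16-bit CRC follows
    fs_bits = (word >> 10) & 0b11
    if layer_bits not in LAYERS or fs_bits not in SAMPLING_HZ:
        raise ValueError("reserved field value")
    return {
        "layer": LAYERS[layer_bits],
        "has_crc": protection_bit == 0,
        "sampling_hz": SAMPLING_HZ[fs_bits],
    }


print(parse_aau_header(bytes([0xFF, 0xFB, 0x90, 0x64])))
# prints {'layer': 'III', 'has_crc': False, 'sampling_hz': 44100}
```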
Audio data belonging to the layer I includes an allocation field, scale factor field and sample field. Individual audio data belonging to the layers II and III include an allocation field, scale factor select information, scale factor field and sample field.
The scale factor indicates the magnification applied when a waveform is reproduced for each subband and each channel. The scale factor is expressed by six bits for each subband and each channel, and can indicate the magnification in steps of approximately 2 dB over a range of +6 dB to -118 dB. The value of a scale factor corresponds to the sound pressure level of the sound to be reproduced. Therefore, a scale factor value equal to or smaller than a certain value indicates that the reproduced sound has a sound pressure level inaudible to people (i.e., no sound).
MPEG audio makes use of human auditory characteristics (a psychoacoustic model), including the masking effect and the minimum audible limit characteristic. The masking effect is such that when a loud sound is produced at a certain frequency, a sound whose frequency is close to that frequency and whose level is equal to or below a certain level becomes inaudible or difficult to hear. The minimum audible limit characteristic defines a frequency characteristic such that human ears are most sensitive to the band of human voices, at several hundred Hz, and cannot hear sounds whose levels are equal to or lower than a certain sound pressure level in an ultra low frequency range or an ultra high frequency range.
To compress audio data, an MPEG audio encoder first divides a received audio signal into 32 subbands using a band split filter. The encoder then uses the masking effect and the minimum audible limit characteristic to quantize the individual split audio signals in such a manner that no bits are assigned to sounds that have become inaudible through masking. This quantization reduces the amount of information and thereby compresses the data. More specifically, the masking effect and the minimum audible limit characteristic are combined to set a mask level that changes dynamically together with the audio signal, and a signal equal to or below the mask level is subjected to data compression. As a result, layer I achieves a compression ratio of 1/4 at an encode rate of 192 or 128 kbps and can have a sound quality equivalent to that of CD-DA (CD Digital Audio) and PCM (Pulse Code Modulation). Layer II achieves a compression ratio of 1/6 to 1/8 at an encode rate of 128 or 96 kbps and can have a sound quality equivalent to that of MD and DCC. Layer III achieves a compression ratio of 1/6 to 1/12 at an encode rate of 128, 96 or 94 kbps.
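The stated range can be reproduced with a short calculation. The sketch below assumes the mapping commonly described for the MPEG-1 layer I and II scale factor table, in which index 0 corresponds to a multiplier of 2.0 and every three index steps halve it; that mapping, the restriction to indices 0 to 62, and the function names are assumptions not stated in the text above.

```python
import math


def scale_factor_multiplier(index):
    """Multiplier for a 6-bit scale factor index (assumed table: 2^(1 - index/3))."""
    if not 0 <= index <= 62:
        raise ValueError("index outside the assumed valid range 0..62")
    return 2.0 ** (1.0 - index / 3.0)


def scale_factor_db(index):
    """The same multiplier expressed in decibels."""
    return 20.0 * math.log10(scale_factor_multiplier(index))


for index in (0, 1, 3, 62):
    print(index, round(scale_factor_db(index), 2))
```

Under this assumption, index 0 gives about +6 dB, each step lowers the level by about 2 dB, and index 62 gives about -118 dB, which matches the range quoted above.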
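The idea of assigning no bits to masked sounds can be sketched as follows. This is a simplified illustration rather than the allocation procedure of the MPEG audio part: it assumes that each quantizer bit buys roughly 6 dB of signal-to-noise ratio, ignores any overall bit budget, and uses four subbands instead of 32 for brevity. The name `allocate_bits` and the example levels are hypothetical.

```python
import math

DB_PER_BIT = 6.02  # assumed rule of thumb: about 6 dB of SNR per quantizer bit


def allocate_bits(signal_db, mask_db, max_bits=15):
    """Assign bits per subband from the signal-to-mask ratio (SMR)."""
    bits = []
    for level, mask in zip(signal_db, mask_db):
        smr = level - mask
        if smr <= 0:
            bits.append(0)  # at or below the mask level: inaudible, no bits
        else:
            bits.append(min(max_bits, math.ceil(smr / DB_PER_BIT)))
    return bits


signal_db = [60, 20, 45, 5]   # hypothetical subband levels
mask_db = [30, 25, 40, 10]    # hypothetical mask levels for the same subbands
print(allocate_bits(signal_db, mask_db))
# prints [5, 0, 1, 0]
```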
FIG. 1 is a block diagram showing a conventional MPEG audio decoder 301. The MPEG audio decoder 301 has a bit buffer 302 and a decode core circuit 303. The bit buffer 302 is a ring buffer which has a RAM (Random Access Memory) with a FIFO (First-In-First-Out) structure, and sequentially stores audio streams transferred from an external device (a recording medium such as a video CD or DVD, an information processing device such as a personal computer, or the like). The decode core circuit 303 decodes a plurality of AAUs (frames) included in an audio stream in conformity with the MPEG audio part to thereby produce a decoded audio stream.
The decode core circuit 303 includes a dequantizer 304, a band synthesizer 305, a PCM output circuit 306 and a control circuit 307. The control circuit 307 detects the header affixed to the top of each AAU included in the audio stream stored in the bit buffer 302. Based on the detected header, the control circuit 307 controls the bit buffer 302 in such a way that an audio stream is read out for each AAU. The control circuit 307 detects the previously defined sampling frequency from the header, and produces a pipeline signal having pulses corresponding to the detected sampling frequency. The operations of the dequantizer 304, the band synthesizer 305 and the PCM output circuit 306 are controlled in accordance with this pipeline signal. The individual units 304 to 306 have operation speeds corresponding to the pipeline signal.
The dequantizer 304 performs dequantization, the opposite process to that of the encoder, on each AAU read from the bit buffer 302 to produce a dequantized AAU. The band synthesizer 305 receives the dequantized AAU from the dequantizer 304 and performs a product-sum operation called a "butterfly operation" to combine the individual pieces of audio data, which have been split into 32 subbands. As a result, decoded audio data is acquired. The PCM output circuit 306, which comprises an output interface and a cross attenuator, receives the decoded audio data from the band synthesizer 305 and produces an audio signal (PCM output signal). A D/A converter (not shown) performs D/A conversion of the audio signal. An audio amplifier (not shown) amplifies the analog audio signal so that sounds are reproduced from a loudspeaker.
The bit buffer 302 may overflow if the bit rate of the audio stream transferred from an external device is greater than the specified value. When an overflow occurs, the bit buffer 302, being a ring buffer, overwrites the previously stored audio stream with the newly input audio stream. This destroys the audio stream previously stored in the bit buffer 302, resulting in data loss. Consequently, no sounds can be reproduced from the lost audio stream, which causes skipping in the reproduced sound. This sound skipping is unpleasant to the user's ears.
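The overwrite behavior on overflow can be demonstrated with a minimal ring buffer sketch. The class `RingBitBuffer` and its fields are hypothetical, and for simplicity the buffer stores whole AAU labels rather than the raw bits that an actual bit buffer would hold.

```python
class RingBitBuffer:
    """FIFO ring buffer that overwrites the oldest unread entry when full."""

    def __init__(self, capacity):
        self.data = [None] * capacity
        self.capacity = capacity
        self.write_pos = 0
        self.read_pos = 0
        self.count = 0   # number of unread entries
        self.lost = 0    # entries destroyed by overflow

    def write(self, item):
        if self.count == self.capacity:
            # Overflow: the write position has caught up with the read
            # position, so the oldest unread entry is about to be destroyed.
            self.read_pos = (self.read_pos + 1) % self.capacity
            self.count -= 1
            self.lost += 1
        self.data[self.write_pos] = item
        self.write_pos = (self.write_pos + 1) % self.capacity
        self.count += 1

    def read(self):
        if self.count == 0:
            return None  # nothing to decode
        item = self.data[self.read_pos]
        self.read_pos = (self.read_pos + 1) % self.capacity
        self.count -= 1
        return item


buf = RingBitBuffer(capacity=4)
for n in range(6):             # input arrives faster than it is read out
    buf.write(f"AAU{n}")
print(buf.lost)                # prints 2
print([buf.read() for _ in range(4)])
# prints ['AAU2', 'AAU3', 'AAU4', 'AAU5']
```

AAU0 and AAU1 never reach the decode stage, which corresponds to the sound skipping described above.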
In the following cases, the bit rate of an audio stream becomes greater than the specified value.
Case 1: When sounds are reproduced faster than the normal (standard) playback speed. Fast playback is used when the user wants to perform fast forward playback to listen to sounds in a short period of time using a recording medium as an external device or when the user wants to perform fast forward playback or fast rewind playback to search for the desired sounds.
Case 2: When an information processing device is used as an external device. An information processing device such as a microcomputer does not necessarily encode an audio stream in conformity with the standards. Therefore, the bit rate of an audio stream may fall outside the specified range. For recording media such as a video CD and DVD, the bit rate of an audio stream is set in conformity with the MPEG audio part.
Japanese Unexamined Patent Publication No. 7-307674 discloses, in the second column, lines 40 to 46, a decoder which raises the transfer rate (bit rate) of input data and increases the data processing speed in order to decode data instantaneously. This publication further teaches, from the eighth column, line 29 to the ninth column, line 11, that data to be supplied to the decoder can be thinned out by controlling data writing into the FIFO memory.