Proposed systems for providing digital audio broadcasting are expected to provide near compact disk (CD)-quality audio, data services and more robust coverage than existing analog FM transmissions. Digital audio broadcasting systems compress an audio signal using a digital audio encoder, such as a perceptual audio coder (PAC). Perceptual audio coders reduce the amount of information needed to represent an audio signal by exploiting human perception and minimizing the perceived distortion for a given bit rate. Perceptual audio coders are described, for example, in D. Sinha et al., “The Perceptual Audio Coder,” Digital Audio, Section 42, 42-1 to 42-18, (CRC Press, 1998), incorporated by reference herein. Generally, the amount of information needed to represent an audio signal is reduced using two well-known techniques, namely, irrelevancy reduction and redundancy removal. Irrelevancy reduction techniques attempt to remove those portions of the audio signal that would be, when decoded, perceptually irrelevant to a listener. This general concept is described, for example, in U.S. Pat. No. 5,341,457, entitled “Perceptual Coding of Audio Signals,” by J. L. Hall and J. D. Johnston, issued on Aug. 23, 1994, incorporated by reference herein.
FIG. 1 illustrates a conventional audio communication system 100. As shown in FIG. 1, the communication system 100 employs a radio transmission link 130 that is typically of a fixed bit rate. The bit rate of the audio encoder 110, on the other hand, is typically variable, depending on the complexity of the current audio signal and the audio quality requirements. On average, the bit rate of the audio encoder 110 is equal to or less than the capacity of the transmission link 130, but at any given instant the bit rate of the audio encoder 110 may be higher. If data from the audio encoder 110 were applied directly to the transmission link 130, data would be lost each time the instantaneous bit rate of the encoder 110 exceeded the capacity of the transmission link 130. In order to prevent such a loss of data, the output of the encoder 110 is buffered into a first-in-first-out (FIFO) buffer 120 before being applied to the transmission link 130. If the instantaneous bit rate of the encoder 110 is higher than the bit rate of the transmission link 130, the amount of data in the FIFO buffer 120 increases. Similarly, if the instantaneous bit rate of the encoder 110 is lower than the bit rate of the transmission link 130, the amount of data in the FIFO buffer 120 decreases.
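By way of illustration only, the behavior of the FIFO buffer 120 described above can be sketched in Python. The function name, the per-frame time step, and the order of operations (the encoder writes a frame, then the link drains a fixed number of bits) are assumptions made for this sketch and are not taken from the system of FIG. 1.

```python
def simulate_fifo_levels(frame_bits, link_bits_per_frame, capacity_bits, initial_level=0):
    """Track the FIFO level frame by frame (hypothetical model).

    Returns (levels, lost_bits, idle_bits), where levels[i] is the buffer
    level after frame i has been written and the link has drained once.
    """
    level = initial_level
    levels = []
    lost_bits = 0   # bits dropped because the buffer was full (overflow)
    idle_bits = 0   # link capacity left unused because the buffer ran empty (underflow)
    for bits in frame_bits:
        level += bits                      # encoder writes the coded frame
        if level > capacity_bits:          # overflow: excess bits are lost
            lost_bits += level - capacity_bits
            level = capacity_bits
        drained = min(level, link_bits_per_frame)
        idle_bits += link_bits_per_frame - drained   # underflow wastes link capacity
        level -= drained                   # fixed-rate link removes data
        levels.append(level)
    return levels, lost_bits, idle_bits
```

In this sketch, a frame that is larger than the link can carry in one frame interval raises the level, and a smaller frame lowers it, which mirrors the increase and decrease described for the FIFO buffer 120.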
As shown in FIG. 1, the encoder 110 includes a buffer control logic element 115 that modifies the bit rate of the encoder 110 and prevents the encoder 110 from overflowing or underflowing the FIFO buffer 120. An overflow causes a loss of bits, while an underflow wastes some of the capacity of the transmission link 130. The buffer control logic element 115 determines for each frame the number of bits, Md[k], that the audio encoder 110 can use to encode the frame, based on the current level, l[k], of the buffer 120. The encoder 110 iteratively encodes the frame until the number of bits used is close to the number of allocated bits, Md[k].
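The rule by which the buffer control logic element 115 maps the buffer level l[k] to the allocation Md[k] is not specified above. The following Python sketch therefore assumes a simple linear rule for illustration: the allocation is the per-frame link capacity plus a correction that steers the buffer toward a target fill level, clamped so that writing the frame can neither overflow nor underflow the buffer. The names allocate_bits, target_fill and gain are hypothetical.

```python
def allocate_bits(level, capacity, link_bits_per_frame, target_fill=0.5, gain=0.25):
    """Return an allowed bit count Md[k] for the next frame (assumed linear rule).

    level               -- current buffer level l[k], in bits
    capacity            -- buffer size, in bits (assumed >= link_bits_per_frame)
    link_bits_per_frame -- bits the fixed-rate link removes per frame interval
    """
    # Assumed control law: spend more when the buffer is emptier than the
    # target, less when it is fuller.
    desired = link_bits_per_frame + gain * (target_fill * capacity - level)
    max_bits = capacity - level                        # more would overflow
    min_bits = max(0, link_bits_per_frame - level)     # fewer would underflow
    return int(min(max(desired, min_bits), max_bits))
```

With an empty buffer the sketch grants more than the link rate, and with a full buffer it grants nothing, which is one possible way to keep the buffer 120 within its limits.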
As a result of this scheme, the transmission delay is also variable. The delay between the time when an audio packet is first written into the FIFO buffer 120 and the time when the packet is actually received by the receiver 150 depends, among other factors, on the amount of data that is currently stored in the FIFO buffer 120. However, the audio decoder 170 at the receiver 150 needs to get audio packets at a fixed rate (of packets per second) in order to play continuously. Therefore, it is necessary to buffer the audio data at the decoder 170 by using a buffer 160. The decoder input-buffer 160 has to have enough capacity so that even in the worst case of minimal delay and largest packet size, the buffer 160 will not overflow. In addition, the initialization period has to be sufficiently long to accumulate enough packets in the buffer 160 so that the buffer does not become empty due to transmission delays.
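As a rough illustration of the variable delay described above, the following Python sketch computes the time a packet waits in the FIFO buffer 120 from the amount of data ahead of it, and derives a start-up delay for the decoder input-buffer 160 from the spread between the smallest and largest waiting times. The function names and the assumption that the waiting time ranges from an empty to a full encoder buffer are illustrative only; other delay factors mentioned above are ignored.

```python
def queueing_delay(level_bits, link_bps):
    """Seconds a newly written packet waits behind level_bits of queued data."""
    return level_bits / link_bps


def startup_delay(encoder_fifo_capacity_bits, link_bps):
    """Assumed initialization period: the worst-case spread in queueing delay.

    A packet may find the encoder FIFO empty (no wait) or full (longest wait),
    so the decoder is assumed to wait for the difference before playback.
    """
    longest = queueing_delay(encoder_fifo_capacity_bits, link_bps)
    shortest = queueing_delay(0, link_bps)
    return longest - shortest
```

For example, under these assumptions a 64,000-bit encoder buffer on a 128,000 bit-per-second link gives a start-up delay of half a second.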
Due to the nature of audio signals and the effects of the psychoacoustic model employed by the perceptual audio coder 110, the bit rate (i.e., the number of bits requested by the quantizer to code the given frame) typically varies over a large range from frame to frame. Thus, the encoder 110 employs a bit allocation scheme that ensures that the average bit rate remains relatively close to the desired bit rate and that the buffer 120 does not overflow (when the buffer is full) or underflow (when the buffer runs empty). Given the bit demand of the initially encoded frame and the state of the buffer 120, the bit allocation scheme decides how many bits are actually given to the quantizer (not shown) to code the frame. Specifically, the quantizer step sizes are modified in an attempt to match the allowed number of bits, Md[k], and the frame is then re-coded with the modified step sizes, after which the bit allocator again makes a determination of the number of bits to actually be given to the quantizer. This process iterates until the frame is quantized and coded with a number of bits sufficiently close to the number actually granted by the buffer control logic element 115.
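The iterative re-coding described above can be sketched as a simple rate loop. The Python below is a minimal sketch under stated assumptions: a single uniform step size stands in for the quantizer step sizes, the bit cost of a coefficient is a toy estimate (one flag bit plus the magnitude bits of the quantized value), and the step size is only ever made coarser. None of these details is taken from the perceptual audio coder 110, and the names count_bits and rate_loop are hypothetical.

```python
def count_bits(coeffs, step):
    """Toy bit-cost model: 1 bit plus magnitude bits per quantized coefficient."""
    total = 0
    for c in coeffs:
        q = int(round(abs(c) / step))      # uniform quantization of the magnitude
        total += 1 + q.bit_length()
    return total


def rate_loop(coeffs, allowed_bits, step=1.0, growth=1.25, max_iters=64):
    """Coarsen the step size until the frame fits within allowed_bits (Md[k]).

    Returns (step, used_bits). Raises ValueError if the budget cannot be met
    within max_iters iterations.
    """
    for _ in range(max_iters):
        used = count_bits(coeffs, step)    # re-code the frame with the current step
        if used <= allowed_bits:
            return step, used
        step *= growth                     # coarser quantization lowers the bit demand
    raise ValueError("could not meet the bit budget")
```

A practical loop would also refine the step size downward when the frame uses far fewer bits than granted; that refinement is omitted here for brevity.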
Perceptual audio coders quantize the spectral components of an audio signal such that the quantization noise follows a noise threshold determined by the perceptual model. With this approach, the bit demand which results in an appropriate range of average bit demand can be well below the bit rate that would be necessary to achieve transparency. Therefore, one disadvantage of having to use different noise thresholds for different target bit rates is the necessity of manually tuning the psychoacoustic model of the perceptual audio coder 110 for each specific target bit rate, in order to achieve a reasonable level of efficiency and performance. However, since different types of audio signals result in significantly different bit demands, even providing for such a manual tuning process may not result in a perceptual audio coder 110 that works well for all types of audio signals, or even one that works well for a single audio signal having characteristics that vary over time. The typical result is that the perceptual audio coder 110 provides a quality level that varies significantly over time, because the buffer control logic element 115 fails to allocate bits to consecutive frames in a manner that ensures that they are coded with a relatively consistent quality level.
U.S. patent application Ser. No. 09/477,314, filed Jan. 4, 2000, entitled “Perceptual Audio Coder Bit Allocation Scheme Providing Improved Perceptual Quality Consistency,” discloses a bit rate control technique that partitions an audio signal into successive frames, and estimates a bit rate for each of a plurality of preselected distortion levels. Generally, the estimated bit rate that is closest to the desired bit rate, Md[k], and provides an acceptable level of distortion is selected. Thus, the disclosed buffer control technique employs a bit allocation scheme that considers the characteristics of a plurality of frames and analyzes the bit requirements of coding each of these frames at various levels of perceptual quality. The disclosed buffer control technique provides a relatively consistent perceptual quality from one frame to the next, with an acceptable bit rate for the communication system.
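The selection step summarized above can be illustrated with a short Python sketch. It assumes that the estimated bit counts for the preselected distortion levels are already available as a mapping, and that "acceptable" means the distortion level does not exceed a given ceiling; both the data layout and the names select_distortion_level and max_distortion are hypothetical and are not taken from the referenced application.

```python
def select_distortion_level(estimates, desired_bits, max_distortion):
    """Pick the acceptable distortion level whose estimated bits are closest to Md[k].

    estimates      -- dict mapping distortion level -> estimated bits for the frame
    desired_bits   -- desired bit count Md[k]
    max_distortion -- assumed ceiling on acceptable distortion
    Returns (distortion_level, estimated_bits).
    """
    acceptable = {d: b for d, b in estimates.items() if d <= max_distortion}
    if not acceptable:
        raise ValueError("no preselected distortion level is acceptable")
    # Closest estimate wins; ties are broken in favor of the lower distortion.
    return min(acceptable.items(),
               key=lambda item: (abs(item[1] - desired_bits), item[0]))
```

For instance, with estimates of 2000, 1500, 1100 and 800 bits at distortion levels 0.5, 1.0, 1.5 and 2.0, a desired count of 1200 bits and a ceiling of 1.5, the sketch selects level 1.5 at 1100 bits.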
For broadcasting applications, the desired end-to-end delay is limited by the cost of the decoder and the tune-in time, i.e., the time it takes between a request for playback and the time when the audio actually plays back. Therefore, a need exists for an improved buffer control technique that minimizes the variation in the distortion for a given limited buffer size.