The present invention relates generally to the field of perceptual audio coding (PAC) techniques and more particularly to a bit allocation scheme which achieves relatively consistent perceptual quality across consecutively coded frames.
In present state of the art audio coders for use in coding signals representative of, for example, speech and music, for purposes of storage or transmission, perceptual models based on the characteristics of the human auditory system are typically employed to reduce the number of bits required to code a given signal. In particular, by taking such characteristics into account, "transparent" coding (i.e., coding having no perceptible loss of quality) can be achieved with significantly fewer bits than would otherwise be necessary. In such coders, typically known as perceptual audio coders, the signal to be coded is first partitioned into individual frames, with each frame comprising a small time slice of the signal, such as, for example, a time slice of approximately twenty milliseconds. Then, the signal for the given frame is transformed into the frequency domain, typically with use of a filter bank. The resulting spectral coefficients may then be quantized and coded. In particular, the quantizer which is used in a perceptual audio coder to quantize the spectral coefficients is advantageously controlled by a psychoacoustic model (i.e., a model based on the performance of the human auditory system), and by the specific number of bits that are available to code the given frame. An illustrative Perceptual Audio Coder (PAC) is described, for example, in U.S. Pat. No. 5,040,217, issued on Aug. 13, 1991 to K. Brandenburg et al., and assigned to the assignee of the present invention. U.S. Pat. No. 5,040,217 is hereby incorporated by reference as if fully set forth herein.
Due to the nature of audio signals and the effects of the psychoacoustic model, the bit demand (i.e., the number of bits requested by the quantizer to code the given frame) typically varies over a large range from frame to frame. Therefore, it is invariably necessary to provide for a bit allocation scheme, which, inter alia, makes sure that the average bit rate remains relatively close to the desired bit rate (e.g., the bit rate of the channel over which the coded signal is ultimately to be transmitted, or the amount of available storage per frame if the coded signal is simply to be stored). In addition, the bit allocation scheme must ensure that the coder's output "bit buffer" or "bit reservoir" (which provides the coder with the bits which are available) never runs empty (which is referred to as an underflow condition) or full (which is referred to as an overflow condition). (The use of a bit buffer or reservoir in audio coders is fully familiar to those of ordinary skill in the art.)
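By way of illustration only, the underflow and overflow conditions described above may be sketched as follows. The model is an assumption made for this sketch and is not taken from the cited references: the buffer gains the bits used to code each frame and loses a fixed number of bits per frame to the channel. The names `BitBuffer`, `capacity`, `drain_per_frame` and `push_frame` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BitBuffer:
    """Toy model of a coder output buffer (all fields are assumptions)."""
    capacity: int          # hypothetical buffer size, in bits
    drain_per_frame: int   # bits taken by the channel during one frame
    fullness: int = 0      # bits currently held in the buffer

    def push_frame(self, bits_used: int) -> str:
        """Add one coded frame and drain one frame's worth of channel bits."""
        level = self.fullness + bits_used - self.drain_per_frame
        if level < 0:
            # The channel wanted more bits than were available: underflow.
            self.fullness = 0
            return "underflow"
        if level > self.capacity:
            # The coder produced more bits than the buffer can hold: overflow.
            self.fullness = self.capacity
            return "overflow"
        self.fullness = level
        return "ok"
```

In such a model, a bit allocation scheme would limit `bits_used` for each frame so that `push_frame` never reports either condition.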
A typical prior art bit allocation scheme is described, for example, in U.S. Pat. No. 5,627,938, issued on May 6, 1997 to J. Johnston, and assigned to the assignee of the present invention. U.S. Pat. No. 5,627,938 is hereby incorporated by reference as if fully set forth herein. Specifically, this prior art bit allocation scheme operates as follows. Each frame of the signal to be coded is initially coded with quantizer step sizes that are determined by a masked threshold which is computed by the psychoacoustic model. The masked threshold corresponds to a transparent coding quality. That is, setting the quantizer step sizes based on the masked threshold will, in general, provide for a coding which when reconstructed will sound (to the human ear) identical to the original signal.
Given the bit demand of the initially coded frame and the state of the bit buffer (i.e., the degree of "emptiness" or "fullness" thereof), the bit allocation scheme decides how many bits are actually given to the quantizer to code the frame. That is, the bit allocator can be viewed as a controller which controls the number of bits allowed, given both the initial bit demand and the buffer state. Specifically, the quantizer step sizes are then modified in an attempt to match the allowed number of bits, and the frame is then re-coded with the modified step sizes, after which the bit allocator again makes a determination of the number of bits to actually be given to the quantizer. This process iterates until the frame is quantized and coded with a number of bits close to the number actually granted by the bit allocator. (This iterative process is referred to in the audio coding art as the "rate loop," and the processor which performs it is referred to as the "rate loop processor.")
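The iteration described above may be sketched, under simplifying assumptions, as follows. This is a toy illustration and not the rate loop of the cited patent: a single step size is used for all coefficients, the bit cost function `count_bits` is an invented stand-in for real quantization and entropy coding, the allowed number of bits is held fixed rather than being re-determined by the bit allocator on each pass, and the names `rate_loop`, `growth` and `max_iters` are hypothetical.

```python
def count_bits(coeffs, step):
    """Toy bit cost (assumption): magnitude bits plus one sign/flag bit each."""
    total = 0
    for c in coeffs:
        q = int(round(abs(c) / step))   # uniform quantization of the magnitude
        total += q.bit_length() + 1     # bit_length() is 0 when q == 0
    return total


def rate_loop(coeffs, initial_step, allowed_bits, growth=1.25, max_iters=64):
    """Coarsen the step size until the frame fits within allowed_bits.

    Returns the final step size and the bit count obtained with it.
    """
    step = initial_step
    for _ in range(max_iters):
        bits = count_bits(coeffs, step)
        if bits <= allowed_bits:
            return step, bits
        step *= growth                  # larger step -> coarser, cheaper coding
    return step, count_bits(coeffs, step)


# Example: a frame whose initial demand exceeds the allowed number of bits.
frame = [100.0, -50.0, 25.0, 3.0]
final_step, final_bits = rate_loop(frame, initial_step=1.0, allowed_bits=16)
```

Because each pass only enlarges the step size, the sketch trades quantization accuracy for bits without regard to perceptual impact, which is the limitation discussed in the text that follows.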
Note that when the average bit demand of successive initially coded frames is either significantly higher or significantly lower than the average overall bit rate of the coder, the performance of this rate loop process is limited by the fact that the bit buffer necessarily has a substantial influence on the bit allocation. As such, the process fails to adequately account for the perceptual impact of the resulting bit allocation. In other words, the bit buffer becomes essentially the sole factor in the decision of how much the allocated number of bits diverges from the actual number of initially demanded bits.
To partially address this problem, prior art audio coders such as PAC employ what is known as a noise threshold, which exceeds the masked threshold by a predetermined amount. Typically, this results in an average bit demand which is closer to the desired bit rate. In this manner, the bit buffer state remains relatively well behaved (i.e., having a low risk of suddenly running empty or of overflowing), and the control task of the bit allocator becomes relatively straightforward.
Clearly, the bit demand of the noise threshold which results in an appropriate given range of average bit demand can be well below the bit rate which would be necessary to achieve transparency. Therefore, one disadvantage of having to use different noise thresholds for different target bit rates is the necessity of manually tuning the psychoacoustic model of the coder for each specific target bit rate, in order to achieve a reasonable level of efficiency and performance. However, since different types of audio signals result in significantly different bit demands, even providing for such a manual tuning process may not result in a coder that works well for all types of audio signals, or even one that works well for a single audio signal having characteristics which change over time. The typical result is that the coder provides a quality level which often varies significantly over time, due to a failure of the bit allocator to allocate bits to consecutive frames in a manner which ensures that they are coded with a relatively consistent quality level. In fact, this inconsistent behavior becomes more severe with increasing divergence between the target bit rate and the bit demand of the initially coded frames.
It has been realized that a more consistent perceptual quality over time provides for a far more pleasing auditory experience to the listener. In other words, significant variations in perceptual quality of a reconstructed audio signal are typically even more disconcerting to a listener than a reduced but nonetheless consistent level of quality would be. It has also been realized that to provide a consistent perceptual quality over time, it is not sufficient to allow the bit allocation process to be controlled by merely the frame's initial bit demand and the state of the bit buffer. Rather, in accordance with the principles of the present invention, the bit allocation process is further controlled by taking into account the characteristics of a plurality of frames and by analyzing the bit requirements of coding each of these frames at various levels of perceptual quality.
More specifically, the present invention provides a method (and apparatus) for coding an audio signal, the method comprising the steps of partitioning the audio signal into a sequence of successive frames; calculating one or more noise thresholds for each of a plurality of frames in the sequence, each noise threshold for a particular one of the frames corresponding to a different perceptual coding quality for the particular frame; estimating a bit demand for each of a corresponding one or more perceptual coding qualities for each frame, wherein each estimated bit demand comprises a number of bits which would be used to code a given frame at the corresponding perceptual coding quality; selecting one of the perceptual coding qualities for the coding of a particular frame based upon the estimated bit demand for the perceptual coding quality for the particular frame, and further based on one or more bit demands estimated for one or more other frames; and coding the particular frame based on the noise threshold corresponding to the selected perceptual coding quality for the particular frame. In particular, and in accordance with one illustrative embodiment of the present invention, the average bit demand for coding each of a plurality of frames at each of a plurality of different perceptual qualities is advantageously estimated, and based on these estimates, each frame is coded so as to maintain a relatively consistent perceptual quality from one frame to the next.
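For purposes of illustration only, the selection step summarized above may be sketched as follows. The selection rule shown is an assumption chosen for simplicity and is not asserted to be the rule of any embodiment: given a table of estimated bit demands for a group of frames at each of several perceptual quality levels, the sketch picks the single highest quality level whose average estimated demand over the group does not exceed a target number of bits per frame, so that all frames in the group are coded at the same quality. The names `select_quality`, `demands` and `target_bits_per_frame` are hypothetical, and the figures in the example are made up.

```python
def select_quality(demands, target_bits_per_frame):
    """Pick one quality level for a group of frames.

    demands[f][q] is the estimated number of bits needed to code frame f
    at quality level q, where a higher q is assumed to mean higher quality.
    Returns the highest q whose average demand over all frames fits the
    target, or 0 (the lowest level) if none fits.
    """
    num_frames = len(demands)
    num_levels = len(demands[0])
    best = 0
    for q in range(num_levels):
        average = sum(frame[q] for frame in demands) / num_frames
        if average <= target_bits_per_frame:
            best = q
    return best


# Illustrative (invented) estimates for three frames at three quality levels.
estimated = [
    [300, 500, 900],
    [200, 400, 700],
    [400, 600, 1100],
]
chosen = select_quality(estimated, target_bits_per_frame=550)
```

With these invented figures, level 1 is chosen: its average demand of 500 bits fits the 550-bit target, while level 2 averages 900 bits. Individual frames may still demand more or fewer bits than the target (here 400 to 600 at level 1), with the difference absorbed by the bit buffer.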