Conventionally, the goal of rate control in audio encoding, such as Advanced Audio Coding (AAC), has been to quantize a prescribed number of data samples obtained from audio signals (hereinafter referred to as “audio samples”), for example, frequency spectra obtained by time-frequency transform using the Modified Discrete Cosine Transform (MDCT), so that the quantization noise energy does not exceed the mask energy obtained from a psychoacoustic model. At the same time, the amount of encoding needs to be controlled so that it does not exceed a fixed level, for example the average bit rate. AAC, by means of a scheme called a bit reservoir, permits control that maintains a fixed bit rate over the long term by varying the bit rate in the short term, while maintaining a fixed level of quality to the maximum extent possible.
An issue in rate control in audio encoding is how to balance two conflicting goals: ensuring that the quantization noise energy does not exceed the mask energy required by the psychoacoustic model, and keeping the amount of encoding below a fixed level. No standardized “optimal” rate control method exists. As an example, we explain the conventionally employed double-loop method described in the informative part of the AAC standard. In the explanation that follows, the audio codec is assumed to be AAC.
The quantization in AAC is performed according to the following procedure. Before band-by-band quantization, the frequency spectrum is transformed non-linearly so that the noise is shaped according to the amplitude. The non-linearly transformed frequency spectrum is divided into scale factor bands, which approximate the frequency ranges over which the masking effect applies, and the quantization is controlled on a band-by-band basis. The parameter that sets the quantization step size of a scale factor band is referred to as a scale factor. Each increment of the scale factor changes the quantization step by approximately 1.5 dB. The scale factors themselves are DPCM (Differential Pulse Code Modulation) encoded. The quantized values of each band are limited to a fixed range ([−8191, +8191]) and are entropy-encoded. According to the statistical characteristics of the distribution of quantized values, an optimal table can be selected from predetermined entropy-encoding tables. For a band in which all quantized values are 0, the entropy coding of the scale factor and the quantized values can be omitted, thus saving bits.
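The quantization procedure described above can be illustrated with a minimal sketch. It assumes the power-law exponent of 3/4 and the step ratio of 2^(1/4) per scale factor increment used by AAC; the rounding offset 0.4054 follows the informative encoder description, and the function names are hypothetical. The scale factor offset and the entropy coding are omitted for brevity.

```python
import math

MAX_QUANT = 8191  # quantized values are limited to [-8191, +8191]


def step_size(scale_factor: int) -> float:
    """Quantizer step for a band; each increment multiplies it by 2**(1/4), about 1.5 dB."""
    return 2.0 ** (scale_factor / 4.0)


def quantize(coef: float, scale_factor: int) -> int:
    """Non-linear (power-law) quantization of one spectral coefficient."""
    # Scale by the band's step, then compress the amplitude with exponent 3/4.
    mag = (abs(coef) / step_size(scale_factor)) ** 0.75
    # 0.4054 is the rounding offset assumed here; clamp to the permitted range.
    q = min(int(mag + 0.4054), MAX_QUANT)
    return q if coef >= 0 else -q


def dequantize(q: int, scale_factor: int) -> float:
    """Inverse of quantize(): expand with exponent 4/3 and rescale."""
    mag = abs(q) ** (4.0 / 3.0) * step_size(scale_factor)
    return mag if q >= 0 else -mag


# One scale factor increment expressed in dB (amplitude ratio).
step_db = 20.0 * math.log10(step_size(1) / step_size(0))
```

Raising the scale factor enlarges the step, so the same coefficient maps to a smaller quantized magnitude and costs fewer bits at the price of more noise.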
In the conventional method, a double loop consisting of an inner loop and an outer loop is employed to determine the scale factors so that the amount of encoding will be less than the average bit rate. FIG. 16 is a flowchart depicting the inner loop (rate control processing) according to the conventional method; FIG. 17 is a flowchart depicting the outer loop (distortion control processing) according to the conventional method.
We now turn to the inner loop according to the conventional method, with reference to FIG. 16. First, the amount of encoding is calculated using the scale factor given for each band (S101). Next, a determination is made as to whether the amount of encoding is less than the average bit rate (S102). If it is determined that the amount of encoding is greater than the average bit rate, the scale factors for all bands are increased, which makes the quantization coarser (S103), and the processing returns to S101. If it is determined that the amount of encoding is less than the average bit rate, the processing ends.
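The inner loop of FIG. 16 can be sketched as follows. This is only an illustration under stated assumptions: `estimate_bits` is a hypothetical stand-in for the entropy coder's bit count (it is not the AAC Huffman coding), `bit_budget` stands for the amount of encoding allowed by the average bit rate, and the `max_sf` guard is added here so that the loop always terminates.

```python
def quantize(coef, scale_factor):
    """Power-law quantizer, assuming exponent 3/4 and a 2**(1/4) step ratio."""
    mag = (abs(coef) / 2.0 ** (scale_factor / 4.0)) ** 0.75
    q = min(int(mag + 0.4054), 8191)
    return q if coef >= 0 else -q


def estimate_bits(bands, scale_factors):
    """Hypothetical bit-count model standing in for the entropy coder."""
    total = 0
    for band, sf in zip(bands, scale_factors):
        quantized = [quantize(c, sf) for c in band]
        if any(quantized):
            # Rough cost: one bit plus the magnitude's bit length per value.
            total += sum(1 + abs(q).bit_length() for q in quantized)
        # A band whose quantized values are all 0 is assumed to cost nothing.
    return total


def inner_loop(bands, scale_factors, bit_budget, max_sf=255):
    """Rate control: coarsen all bands together until the budget is met."""
    sfs = list(scale_factors)
    while True:
        bits = estimate_bits(bands, sfs)              # S101
        if bits <= bit_budget:                        # S102
            return sfs, bits
        if all(sf >= max_sf for sf in sfs):           # guard (assumption)
            return sfs, bits
        sfs = [min(sf + 1, max_sf) for sf in sfs]     # S103


bands = [[100.0, -50.0, 10.0], [5.0, 2.0, -1.0]]
fitted_sfs, fitted_bits = inner_loop(bands, [0, 0], bit_budget=12)
```

Because every band is raised by the same amount, the inner loop preserves the relative differences between scale factors that the outer loop establishes.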
We now explain the outer loop according to the conventional method, with reference to FIG. 17. First, the scale factors are initialized (S111). For example, each scale factor is initialized to its minimum value, that is, to the finest quantization. Next, the inner loop is called (S112), and the noise energy is calculated for each band (S113). Specifically, an inverse-quantized spectrum is determined, and the noise energy is calculated for each band from the difference between it and the original spectrum. The method of determining noise by inverse quantization is referred to as Analysis by Synthesis (AbS). Further, for a band whose noise energy is greater than the mask energy determined by psychoacoustic analysis, the scale factor is reduced so that the quantization becomes finer (S114). If the ratio of noise energy to mask energy is designated as NMR (Noise-to-Mask Ratio), the condition under which the scale factor is reduced is NMR>1.
Next, a determination is made as to whether the scale factors for all bands have been changed (S115). If it is determined that the scale factors have not been changed for all bands, a determination is made as to whether no band has had its scale factor changed (S116). If it is determined in Step S116 that there is a band for which the scale factor has been changed, the processing returns to Step S112. If it is determined in Step S115 that the scale factors were changed for all bands, or if it is determined in Step S116 that no band has had its scale factor changed, the scale factors are restored (S117) and the processing ends.
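The outer loop of FIG. 17 can likewise be sketched as follows. The sketch rests on several assumptions that are not stated in the description above: the bit-count model is a hypothetical stand-in for the entropy coder, "restoring" the scale factors in S117 is interpreted as returning to the values held immediately after the last inner-loop call, and the `max_iter` limit is added as a safeguard against endless iteration.

```python
def step(sf):
    return 2.0 ** (sf / 4.0)  # about 1.5 dB per scale factor increment


def quantize(c, sf):
    q = min(int((abs(c) / step(sf)) ** 0.75 + 0.4054), 8191)
    return q if c >= 0 else -q


def dequantize(q, sf):
    mag = abs(q) ** (4.0 / 3.0) * step(sf)
    return mag if q >= 0 else -mag


def estimate_bits(bands, sfs):
    """Hypothetical bit-count model; all-zero bands are assumed to cost nothing."""
    total = 0
    for band, sf in zip(bands, sfs):
        qs = [quantize(c, sf) for c in band]
        if any(qs):
            total += sum(1 + abs(q).bit_length() for q in qs)
    return total


def inner_loop(bands, sfs, budget, max_sf=255):
    """Rate control (FIG. 16): raise all scale factors until the budget is met."""
    sfs = list(sfs)
    while estimate_bits(bands, sfs) > budget and min(sfs) < max_sf:
        sfs = [min(sf + 1, max_sf) for sf in sfs]
    return sfs


def band_noise(band, sf):
    """Analysis by Synthesis: noise energy from the inverse-quantized spectrum."""
    return sum((c - dequantize(quantize(c, sf), sf)) ** 2 for c in band)


def outer_loop(bands, masks, budget, min_sf=0, max_iter=100):
    """Distortion control (FIG. 17), with S117 interpreted as described above."""
    sfs = [min_sf] * len(bands)                                   # S111
    saved = list(sfs)
    for _ in range(max_iter):
        sfs = inner_loop(bands, sfs, budget)                      # S112
        saved = list(sfs)
        noise = [band_noise(b, sf) for b, sf in zip(bands, sfs)]  # S113
        noisy = [n > m for n, m in zip(noise, masks)]             # NMR > 1
        sfs = [max(sf - 1, min_sf) if flag else sf                # S114
               for sf, flag in zip(sfs, noisy)]
        if all(noisy) or not any(noisy):                          # S115, S116
            return saved                                          # S117
    return saved  # iteration guard reached (assumption)


bands = [[100.0, -50.0, 10.0], [5.0, 2.0, -1.0]]
result = outer_loop(bands, masks=[50.0, 0.5], budget=16)
```

When every band exceeds its mask, no further redistribution of bits between bands can help, which is why the loop stops in that case as well as when no band exceeds its mask.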