In recent years, high-quality, high-efficiency audio signal encoding techniques have come into wide use in DVD-Video audio tracks, portable audio players, music delivery, music storage on a home server in a home LAN, and the like, and they have gained significant importance.
Most such audio signal encoding techniques perform a time-frequency transform using transform coding. For example, MPEG-2 AAC, Dolby Digital (AC-3), and the like form a filter bank from an orthogonal transform alone, such as the MDCT (Modified Discrete Cosine Transform). MPEG-1 Audio Layer III (MP3) and ATRAC (the encoding scheme used in the MD (MiniDisc)) instead form a filter bank from a cascade of a subband filter, such as a QMF (Quadrature Mirror Filter), and an orthogonal transform.
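To make the MDCT mentioned above concrete, here is a minimal direct-form sketch in Python. It assumes blocks of 2N samples that overlap by N and yield N coefficients each (MPEG-2 AAC uses N = 1024 for long blocks), together with a sine window. The O(N²) loops and the function names are illustrative only; a production encoder would use a fast algorithm.

```python
import math


def mdct(block):
    """Direct-form MDCT: 2N input samples -> N coefficients."""
    n_out = len(block) // 2
    n0 = 0.5 + n_out / 2  # phase offset of the MDCT basis functions
    return [
        sum(block[n] * math.cos(math.pi / n_out * (n + n0) * (k + 0.5))
            for n in range(2 * n_out))
        for k in range(n_out)
    ]


def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples.

    The 2/N scale assumes the same Princen-Bradley window is applied at
    analysis and synthesis, so that overlap-add reconstructs the input.
    """
    n_out = len(coeffs)
    n0 = 0.5 + n_out / 2
    return [
        (2.0 / n_out) * sum(coeffs[k] * math.cos(math.pi / n_out * (n + n0) * (k + 0.5))
                            for k in range(n_out))
        for n in range(2 * n_out)
    ]


def sine_window(length):
    """Sine window; satisfies w[n]^2 + w[n + N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def analyze_and_resynthesize(signal, n):
    """Window, MDCT, IMDCT, window again, and overlap-add blocks hopping by n."""
    w = sine_window(2 * n)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n + 1, n):
        block = [w[i] * signal[start + i] for i in range(2 * n)]
        rec = imdct(mdct(block))
        for i in range(2 * n):
            out[start + i] += w[i] * rec[i]
    return out
```

With 50% overlap and the window applied at both analysis and synthesis, the time-domain aliasing of adjacent blocks cancels, so every sample covered by two blocks is reconstructed up to rounding error; the first and last N samples, covered by only one block, are not.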
These transform coding techniques perform masking analysis that exploits the perceptual properties of human hearing. By removing spectral components judged to be masked, or by permitting quantization errors that will be masked, they reduce the amount of information needed to represent the spectrum and thus improve compression efficiency.
These transform coding techniques also reduce the amount of spectral information by quantizing spectral components nonlinearly. For example, MP3 and AAC raise the magnitude of each spectral component to the power of 0.75 before quantization.
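A small sketch of the 0.75-power compression described above; the function names are illustrative. Compressing magnitudes narrows the range of values the quantizer must represent, and the decoder restores them with the inverse 4/3 power.

```python
import math


def compress(x):
    """Sign-preserving 3/4-power compression of a spectral value."""
    return math.copysign(abs(x) ** 0.75, x)


def expand(y):
    """Inverse of compress(): sign-preserving 4/3 power."""
    return math.copysign(abs(y) ** (4.0 / 3.0), y)


magnitudes = [1.0, 100.0, 10000.0]
compressed = [compress(v) for v in magnitudes]
# A 10^4 : 1 amplitude range becomes 10^3 : 1 after compression.
print([round(c, 1) for c in compressed])  # [1.0, 31.6, 1000.0]
```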
These transform coding techniques group the frequency components produced by the filter bank into decomposed frequency bands whose widths are set according to the frequency resolution of human hearing. During quantization, a normalization coefficient is determined for each decomposed band based on the result of the auditory analysis, and the frequency components are expressed as combinations of normalization coefficients and quantized spectral values, which reduces the amount of information. In practice, the normalization coefficient is a variable that adjusts the quantization coarseness of each decomposed band: when it changes by 1, the quantization coarseness changes by one step. MPEG-2 AAC calls the decomposed frequency band a scale factor band (SFB) and the normalization coefficient a scale factor.
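Grouping into decomposed bands can be pictured as slicing the spectrum at a list of band-edge offsets. The sketch below assumes such a list; the edges are hypothetical and are not the scale factor band tables of MPEG-2 AAC. Every coefficient in a band shares that band's normalization coefficient.

```python
def split_into_bands(spectrum, offsets):
    """Band b covers spectrum[offsets[b]:offsets[b + 1]]."""
    return [spectrum[offsets[b]:offsets[b + 1]] for b in range(len(offsets) - 1)]


def band_energies(spectrum, offsets):
    """Energy (sum of squares) of each band."""
    return [sum(x * x for x in band) for band in split_into_bands(spectrum, offsets)]


# Hypothetical band edges: narrow at low frequencies, wider at high ones,
# loosely following the frequency resolution of hearing.
offsets = [0, 4, 8, 16, 32]
spectrum = [1.0] * 32
print([len(b) for b in split_into_bands(spectrum, offsets)])  # [4, 4, 8, 16]
print(band_energies(spectrum, offsets))                       # [4.0, 4.0, 8.0, 16.0]
```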
These transform coding schemes control the code amount by controlling the quantization coarseness of an entire frame, the frame being the unit of encoding. In many transform coding schemes, the quantization coarseness is controlled in steps, each step being a given base (radix) raised to an integer power, and this integer is called the quantization step. In the MPEG audio standards, the quantization step that sets the quantization coarseness of the entire frame is called the “global gain” or “common scale factor”. Expressing each scale factor as a value relative to the quantization step also reduces the amount of information needed to code these variables.
For example, in MP3 and AAC, a change of 1 in either of these variables changes the actual quantization coarseness by a factor of 2 raised to the power of 3/16.
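The factor of 2^(3/16) can be checked numerically. This sketch assumes that the quantizer multiplies each spectral value by 2^(-(global_gain - scalefac)/4) before the 3/4-power compression, with the scale factor relative to the global gain; the function names are illustrative. One unit of the combined exponent scales the input by 2^(-1/4), and the compression turns that into 2^(-3/16) on the quantizer's scale.

```python
def effective_exponent(global_gain, scalefac):
    """Combined exponent; the scale factor is expressed relative to the global gain."""
    return global_gain - scalefac


def companded_gain(global_gain, scalefac):
    """Overall gain seen by the quantizer: 2^(-exp/4), then raised to the 3/4 power."""
    return (2.0 ** (-effective_exponent(global_gain, scalefac) / 4.0)) ** 0.75


ratio = companded_gain(11, 0) / companded_gain(10, 0)
print(round(ratio, 4), round(2.0 ** (-3.0 / 16.0), 4))  # 0.8781 0.8781
```

In the amplitude domain, the same unit step corresponds to a factor of 2^(1/4), about 1.5 dB.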
In the quantization processing of a transform coding scheme, the scale factors are adjusted to control quantization distortion so that quantization errors are masked in accordance with the results of the auditory analysis. At the same time, the quantization step must be adjusted to set the quantization coarseness of the entire frame so that the code amount of the frame is kept under control. Because these two numerical values both determine the quantization coarseness and strongly affect encoding quality, the two control processes must be carried out simultaneously, carefully, accurately, and efficiently.
The written standards of MPEG-1 Audio Layer III (MP3) (ISO/IEC 11172-3) and of MPEG-2 AAC (ISO/IEC 13818-7) describe, as a method of controlling the scale factors and the global gain during quantization, iterative processing with double loops: a distortion control loop (outer loop) and a code amount control loop (inner loop). This method is described below with reference to the drawings. For convenience, the following description takes MPEG-2 AAC as an example.
FIG. 19 is a simplified flowchart of the quantization processing described in the ISO/IEC written standards.
In step S501, the global gain and the scale factors of all SFBs are initialized to zero, and the process enters the distortion control loop (outer loop).
In the distortion control loop, a code amount control loop (inner loop) is executed first.
In the code amount control loop, in step S502, the 1024 spectral components of one frame are quantized according to the following quantization equation:
$$X_q = \mathrm{Int}\left[\left(|x_i| \cdot 2^{-\frac{1}{4}\left(\mathrm{global\_gain} - \mathrm{scalefac}\right)}\right)^{3/4} + 0.4054\right] \tag{1}$$
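where X_q is the quantized spectrum, x_i is the i-th spectral component (MDCT coefficient) before quantization (its sign is coded separately), global_gain is the global gain, scalefac is the scale factor of the SFB that contains this spectral component, and Int[·] denotes conversion to an integer by discarding the fractional part.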
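A direct transcription of equation (1) in Python, as a sketch. It assumes the exponent is -(global_gain - scalefac)/4, that Int[·] truncates, and that the sign of each coefficient is restored after quantizing its magnitude. To keep the example short, `scalefacs` (a hypothetical parameter) holds one value per coefficient, namely the scale factor of the SFB containing it.

```python
import math

MAGIC_ROUNDING = 0.4054  # rounding offset from equation (1)


def quantize(spectrum, global_gain, scalefacs):
    """Quantize each spectral value with equation (1), keeping its sign."""
    out = []
    for x, sf in zip(spectrum, scalefacs):
        scaled = abs(x) * 2.0 ** (-(global_gain - sf) / 4.0)
        q = int(scaled ** 0.75 + MAGIC_ROUNDING)
        out.append(int(math.copysign(q, x)) if q else 0)
    return out


print(quantize([1.0, 10.0, -10.0], 0, [0, 0, 0]))  # [1, 6, -6]
print(quantize([10.0], 4, [0]))                    # [3]
```

Raising global_gain by 4 halves the input before compression, which is why 10.0 drops from 6 to 3 in the second call.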
Next, in step S503, the number of bits needed to Huffman-encode the quantized spectrum of the frame is calculated, and in step S504 it is compared with the number of bits assigned to the frame. If the number of bits used exceeds the number of assigned bits, the global gain is incremented by 1 in step S505 to make the quantization coarser, and the process returns to the spectrum quantization in step S502. This is repeated until the number of bits required after quantization no longer exceeds the number of assigned bits; the global gain at that point is adopted, and the code amount control loop ends.
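The code amount control loop of steps S502 to S505 can be sketched as follows. It assumes equation (1) with the exponent -(global_gain - scalefac)/4. Real AAC bit counting uses the Huffman tables of the standard; here `estimate_bits` is a hypothetical stand-in cost model (1 bit per zero, 2·bit_length bits otherwise) so that the loop runs on its own. The upper limit of 255 assumes an 8-bit global_gain field and also guarantees termination.

```python
import math


def quantize(spectrum, global_gain, scalefacs):
    """Equation (1), with the sign of each coefficient kept separately."""
    out = []
    for x, sf in zip(spectrum, scalefacs):
        q = int((abs(x) * 2.0 ** (-(global_gain - sf) / 4.0)) ** 0.75 + 0.4054)
        out.append(int(math.copysign(q, x)) if q else 0)
    return out


def estimate_bits(quantized):
    """Hypothetical cost model standing in for Huffman bit counting."""
    return sum(1 if v == 0 else 2 * abs(v).bit_length() for v in quantized)


def code_amount_loop(spectrum, scalefacs, assigned_bits, global_gain=0, max_gain=255):
    """Steps S502-S505: raise global_gain until the frame fits its bit budget."""
    while True:
        quantized = quantize(spectrum, global_gain, scalefacs)      # S502
        used_bits = estimate_bits(quantized)                         # S503
        if used_bits <= assigned_bits or global_gain >= max_gain:    # S504
            return global_gain, quantized, used_bits
        global_gain += 1                                             # S505


spectrum = [50.0 * math.sin(i) for i in range(64)]
gain, quantized, bits = code_amount_loop(spectrum, [0] * 64, assigned_bits=150)
print(gain, bits)
```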
In step S506, the spectrum quantized in the code amount control loop is dequantized, and the difference between the dequantized spectrum and the spectrum before quantization is calculated to obtain the quantization errors. The quantization errors are accumulated for each SFB.
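The dequantization and error accumulation of step S506 might look as follows. The sketch assumes the inverse of equation (1), namely |X_q|^(4/3) · 2^((global_gain - scalefac)/4) with the sign restored, and measures each SFB's error as the sum of squared differences. The function names and SFB offsets are hypothetical.

```python
import math


def dequantize(quantized, global_gain, scalefacs):
    """Assumed inverse of equation (1): sign(q) * |q|^(4/3) * 2^((global_gain - sf)/4)."""
    return [
        math.copysign(abs(q) ** (4.0 / 3.0), q) * 2.0 ** ((global_gain - sf) / 4.0)
        for q, sf in zip(quantized, scalefacs)
    ]


def error_energy_per_sfb(spectrum, reconstructed, offsets):
    """Sum of squared quantization errors in each SFB; offsets delimit the SFBs."""
    return [
        sum((spectrum[i] - reconstructed[i]) ** 2 for i in range(offsets[b], offsets[b + 1]))
        for b in range(len(offsets) - 1)
    ]


rec = dequantize([1, -8, 0], 4, [0, 0, 0])                      # approximately [2.0, -32.0, 0.0]
print(error_energy_per_sfb([2.5, -32.0, 1.0], rec, [0, 2, 3]))  # approximately [0.25, 1.0]
```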
In step S507, it is checked whether the scale factors of all SFBs are greater than 0 or whether the quantization errors fall within the allowable error range. If an SFB that does not meet these conditions is found, the process advances to step S508, where the scale factor of each SFB whose quantization errors exceed the allowable error range is incremented by 1, and the distortion control loop is repeated. Note that the allowable error of each SFB is calculated by auditory analysis before the quantization processing.
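Steps S501 to S508 can be put together into the following sketch of the double loops. Several assumptions apply. Equation (1) is taken with the exponent -(global_gain - scalefac)/4, and its inverse with the sign restored is used for dequantization. `estimate_bits` is a hypothetical stand-in for Huffman bit counting. The step S507 test is read as "stop when every SFB has a positive scale factor or no SFB exceeds its allowable error". The inner loop continues from the current global gain rather than restarting from zero. The `max_outer` and `max_gain` limits are safety bounds added for the sketch.

```python
import math


def quantize(spectrum, global_gain, scalefacs):
    """Equation (1), with the sign of each coefficient kept separately."""
    out = []
    for x, sf in zip(spectrum, scalefacs):
        q = int((abs(x) * 2.0 ** (-(global_gain - sf) / 4.0)) ** 0.75 + 0.4054)
        out.append(int(math.copysign(q, x)) if q else 0)
    return out


def dequantize(quantized, global_gain, scalefacs):
    """Assumed inverse of equation (1)."""
    return [math.copysign(abs(q) ** (4.0 / 3.0), q) * 2.0 ** ((global_gain - sf) / 4.0)
            for q, sf in zip(quantized, scalefacs)]


def estimate_bits(quantized):
    """Hypothetical stand-in for Huffman bit counting."""
    return sum(1 if v == 0 else 2 * abs(v).bit_length() for v in quantized)


def double_loop(spectrum, offsets, allowed, assigned_bits, max_outer=60, max_gain=255):
    """Distortion control (outer) loop around the code amount control (inner) loop."""
    n_sfb = len(offsets) - 1
    sfb_scalefac = [0] * n_sfb   # S501: all scale factors start at zero
    global_gain = 0              # S501: global gain starts at zero
    for _ in range(max_outer):
        used = list(sfb_scalefac)
        per_coef = [used[b] for b in range(n_sfb) for i in range(offsets[b], offsets[b + 1])]
        # Inner loop, S502-S505.
        while True:
            q = quantize(spectrum, global_gain, per_coef)
            if estimate_bits(q) <= assigned_bits or global_gain >= max_gain:
                break
            global_gain += 1
        # S506: dequantize and accumulate the error energy of each SFB.
        rec = dequantize(q, global_gain, per_coef)
        err = [sum((spectrum[i] - rec[i]) ** 2 for i in range(offsets[b], offsets[b + 1]))
               for b in range(n_sfb)]
        too_noisy = [b for b in range(n_sfb) if err[b] > allowed[b]]
        # S507: stop if every scale factor is positive or no SFB exceeds its allowance.
        if all(sf > 0 for sf in used) or not too_noisy:
            break
        for b in too_noisy:      # S508: refine only the SFBs that are too noisy
            sfb_scalefac[b] += 1
    return global_gain, used, q


spectrum = [40.0 * math.sin(0.7 * i) for i in range(32)]
gain, scalefacs, quantized = double_loop(spectrum, [0, 8, 16, 32], [5.0, 5.0, 10.0], 120)
print(gain, scalefacs)
```

Every pass through the outer loop re-runs the full quantization and bit count, which is the source of the computational cost discussed next.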
As described above, the quantization processing method described in the ISO written standards consists of double loops in which the global gain and scale factors are adjusted only in steps of 1. For this reason, spectrum quantization and bit calculation are repeated many times before the processing converges.
In MPEG-2 AAC, for example, each pass of spectrum quantization evaluates equation (1) 1024 times. In addition, since there are 11 different Huffman code tables to be searched during bit calculation, a full search of these tables inevitably makes the bit calculation computationally expensive.
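The cost of a full table search can be illustrated as follows: for one section of quantized values, every candidate code table is evaluated over the whole section and the cheapest one is kept. The tables below are hypothetical cost models, each with a maximum representable magnitude. They are not the 11 spectral Huffman codebooks of MPEG-2 AAC; the sketch only shows that the work grows with the number of tables searched.

```python
def section_cost(values, table):
    """Bits to code `values` with `table`, or None if a magnitude is out of range."""
    max_abs, bits_per_magnitude = table
    if any(abs(v) > max_abs for v in values):
        return None
    # Magnitude code plus one sign bit for each nonzero value.
    return sum(bits_per_magnitude(abs(v)) + (1 if v else 0) for v in values)


def best_table(values, tables):
    """Full search: evaluate every table and return (index, bits) of the cheapest."""
    best = None
    for index, table in enumerate(tables):
        cost = section_cost(values, table)
        if cost is not None and (best is None or cost < best[1]):
            best = (index, cost)
    return best


# Hypothetical tables: short codes for a small range, longer codes for a wider range.
tables = [
    (1, lambda m: 1 + m),                        # magnitudes 0..1 only
    (7, lambda m: 3),                            # fixed 3 bits for 0..7
    (1000, lambda m: 2 * (m + 1).bit_length()),  # wide range, grows with magnitude
]
print(best_table([0, 1, 0, -1], tables))  # (0, 8)
print(best_table([0, 5, 2, -3], tables))  # (1, 15)
```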
Furthermore, in the distortion control loop, the quantization errors are calculated after inverse quantization, which also requires considerable computation. As a result, a very large amount of computation is needed before the double loops converge.
To solve this problem, various attempts have been made to reduce the computational complexity by reducing the number of iterations of the double loops.
For example, Japanese Patent Laid-Open No. 2003-271199 discloses a technique that controls the common scale factor and the scale factors not with a step width of 1 but with a step width of 2 or more, determined according to the characteristics of the Huffman code tables. This reduces the number of iterations of the double loops and thus the computational complexity.
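Why a larger step width saves iterations can be illustrated with a generic coarse-to-fine search for the smallest global gain that meets a bit budget. This is not the method of the cited publication, whose step widths depend on the Huffman code tables. Here `step` is a fixed, hypothetical parameter, and `bits_at` is any bit-count function assumed to be non-increasing in the gain. Coarse steps locate the interval containing the answer, and unit steps then pin it down, so the result matches a step-1 search.

```python
def search_gain(bits_at, budget, step=1, start=0, max_gain=255):
    """Return (smallest gain g >= start with bits_at(g) <= budget, evaluation count).

    Assumes bits_at is non-increasing in g (coarser quantization never costs more bits).
    """
    evaluations = 0
    g = start
    # Coarse phase: advance by `step` until the budget is met or the limit is hit.
    while True:
        evaluations += 1
        if bits_at(g) <= budget or g >= max_gain:
            break
        g = min(g + step, max_gain)
    # Fine phase: if the budget was met at g, the answer lies in (g - step, g].
    for h in range(max(start, g - step + 1), g):
        evaluations += 1
        if bits_at(h) <= budget:
            return h, evaluations
    return g, evaluations


def bits_at(g):
    return 1000 - 7 * g  # hypothetical, strictly decreasing bit count


print(search_gain(bits_at, 500, step=1))  # (72, 73)
print(search_gain(bits_at, 500, step=8))  # (72, 17)
```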
Japanese Patent Laid-Open No. 2001-184091 discloses a method in which an estimate of the quantization step is calculated first, a normal inner loop is then executed, and the scale factors are subsequently calculated according to the MNR (mask-to-noise ratio).
Also, A. D. Duenes, R. Perez, B. Rivas, et al., “A robust and efficient implementation of MPEG-2/4 AAC Natural Audio Coders”, AES 112th Convention Paper (2002), discloses a technique that calculates the scale factors before spectrum quantization, using an equation obtained by modifying equation (1) together with the allowable error energy of each SFB obtained by auditory analysis. This removes the outer distortion control loop of the double loops and reduces the computational load.
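How a scale factor can be computed before quantization can be illustrated with a high-resolution approximation. This is a generic derivation offered for illustration, not the equation of the cited paper. Assume the quantizer multiplies |x| by 2^(-s/4), with s = global_gain - scalefac, before the 3/4-power compression. Model the rounding error on the compressed value as uniform with variance 1/12, ignoring the 0.4054 offset and coefficients that quantize to zero. Propagating that error back through the 4/3-power expansion gives an SFB error energy of about (4/27) · 2^(3s/8) · Σ√|x_i|. The largest integer s keeping this within the allowable error energy A is then floor((8/3) · log2(27A / (4 Σ√|x_i|))). The function names are hypothetical.

```python
import math


def estimated_noise(band, s):
    """High-resolution estimate of the error energy for exponent s = global_gain - scalefac."""
    return (4.0 / 27.0) * 2.0 ** (3.0 * s / 8.0) * sum(math.sqrt(abs(x)) for x in band)


def largest_exponent(band, allowed_energy):
    """Largest integer s whose estimated error energy stays within allowed_energy.

    Assumes the band contains at least one nonzero coefficient.
    """
    total = sum(math.sqrt(abs(x)) for x in band)
    return math.floor((8.0 / 3.0) * math.log2(27.0 * allowed_energy / (4.0 * total)))


band = [10.0, -4.0, 2.5, 0.5]
s = largest_exponent(band, 0.3)
print(s)  # -6
```

Under this convention, once the global gain has been fixed by the inner loop, the corresponding scale factor would be global_gain - s.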
These conventional techniques can accelerate the convergence of the double loops in the quantization processing and reduce its computational complexity to some extent.