(1) Field of the Invention
The present invention relates to an audio coding and quantization method which is suitable for various applications, including audio signal storage, communication, and broadcasting.
(2) Description of the Related Art
Digital representations of analog waveforms introduce some degree of distortion. A basic problem in the design of source coders is to achieve a given acceptable level of distortion at the smallest possible encoding bit rate. To reach this goal, the encoding algorithm must be adapted both to the changing statistics of the source signal and to auditory perception. Auditory perception is based on critical-band analysis in the human ear. The power spectra are represented not on a linear frequency scale but on frequency bands, called critical bands, with bandwidths on the order of 100 Hz below 500 Hz and with increasing bandwidths (up to 5000 Hz) at high signal frequencies. Within a critical band, the intensities of individual tones are summed by the ear. For a bandwidth of up to 20,000 Hz, 26 critical bands have to be taken into account. Audio coders that exploit auditory perception must be based on critical-band structured signal processing.
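The widening of the critical bands with frequency can be illustrated with a short sketch. The two closed-form approximations used below (a critical-band rate in Bark and a critical bandwidth in Hz, both commonly attributed to Zwicker and Terhardt) are assumptions introduced here for illustration and are not taken from the present description; the function names are hypothetical, and the exact number of bands obtained over a given bandwidth depends on the approximation chosen.

```python
import math


def hz_to_bark(f_hz: float) -> float:
    """Approximate critical-band rate (Bark) for a frequency in Hz.

    Assumed formula (Zwicker/Terhardt style approximation), not from the source.
    """
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def critical_bandwidth_hz(f_hz: float) -> float:
    """Approximate critical bandwidth (Hz) centred on f_hz (assumed formula)."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69


if __name__ == "__main__":
    # Print the approximate band position and width at a few frequencies.
    for f in (100, 500, 1000, 4000, 10000, 16000):
        print(f, round(hz_to_bark(f), 2), round(critical_bandwidth_hz(f)))
```

Under these assumed formulas the bandwidth stays near 100 Hz at low frequencies and grows steadily toward the upper end of the audio range.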
Auditory masking describes the effect that a low-level audio signal (called the maskee) can become inaudible when a louder signal (called the masker) occurs simultaneously. The effect of simultaneous masking and temporal masking can be exploited in audio coding by transmitting only those details of the signal which are perceptible by ear. Such coders provide high coding quality without providing high signal-to-noise ratios.
Hereinafter, the sound pressure level below which a signal is made inaudible by the masker is called a masking threshold. It is also known as a threshold of just noticeable distortion in the context of source coding.
Generally, audio signals in the vicinity of 4 kHz are readily perceptible to the human ear regardless of whether a masker is present. Hereinafter, the lower limit of the sound pressure level that is audible to the human ear is called an absolute hearing threshold. It is also known as a threshold in quiet.
FIG. 6 shows the relationship between the absolute hearing threshold and the masking threshold in the spectral distribution of an audio signal.
Without a masker, an audio signal (A) (indicated by the solid line in FIG. 6) is inaudible if its sound pressure level is below the absolute hearing threshold (C) (indicated by the two-dot chain line in FIG. 6), which depends on frequency. A sound pressure level of 0 dB corresponds to a sound pressure of 0.02 mN/m² (20 µPa). In the presence of a masker, a masking threshold (B) (indicated by the dotted line in FIG. 6) can be measured, below which any signal is inaudible. The masking threshold depends on the sound pressure level and the frequency of the masker, and on the characteristics of the masker and the maskee.
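The 0 dB reference and the frequency dependence of the absolute hearing threshold can be sketched as follows. The reference pressure of 0.02 mN/m² is taken from the description; the closed-form threshold curve is a widely quoted approximation attributed to Terhardt and is an assumption introduced here, as are the function names. It is not the curve (C) of FIG. 6.

```python
import math

P_REF_PA = 20e-6  # 0 dB reference: 0.02 mN/m^2, i.e. 20 micropascals


def spl_db(pressure_pa: float) -> float:
    """Sound pressure level in dB relative to the 0 dB reference pressure."""
    return 20.0 * math.log10(pressure_pa / P_REF_PA)


def threshold_in_quiet_db(f_hz: float) -> float:
    """Approximate absolute hearing threshold in dB (assumed Terhardt-style fit)."""
    k = f_hz / 1000.0  # frequency in kHz
    return (3.64 * k ** -0.8
            - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2)
            + 1e-3 * k ** 4)


def audible_without_masker(level_db: float, f_hz: float) -> bool:
    """True if a tone at level_db lies above the assumed threshold in quiet."""
    return level_db > threshold_in_quiet_db(f_hz)
```

With this assumed fit the threshold is lowest in the region of a few kilohertz and rises toward both low and high frequencies, which agrees with the statement that signals near 4 kHz are the most readily perceived.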
In addition to simultaneous masking of one sound by another one occurring at the same time, temporal masking occurs when two sounds appear within a small interval of time; the stronger one masks the weaker one, regardless of whether the latter one occurs before or after it. Temporal masking can be used to mask pre-echoes caused by the spreading of a sudden large quantization error over the actual coding block.
The effect of simultaneous masking and temporal masking can be exploited in audio coding by transmitting only those details of the signal which are perceptible to the ear. This is equivalent to a bit allocation in which the bits needed to encode the bitstream are allocated only to the portions of the audio signal (A) which are above the masking threshold (B) and the absolute hearing threshold (C). In audio coding, the audio signal is divided into a number of spectral subband components (D) (indicated by the one-dot chain lines in FIG. 6) and each component is quantized, whereby the number of quantizer levels for each component is obtained from the bit allocation.
The width of each subband component (D) is equivalent to the bandwidth of the audio signal. In each subband, a signal component whose intensity is below a certain lower limit is not audible. As long as the difference in intensity between the source signal and the decoded signal is below this lower limit, the decoded signal is indistinguishable from the source signal. Hereinafter, this lower limit of the sound pressure level for each subband is called an allowed distortion level. In the context of audio coding, if the level of the quantization error produced by quantizing an audio signal is below the allowed distortion level, the audio coding can provide high coding quality without providing high signal-to-noise ratios. The bit allocation for each subband component (D), as shown in FIG. 6, is equivalent to controlling the quantization of the audio signal such that the quantization error level for each subband is exactly equal to the allowed distortion level.
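The relationship between the allowed distortion level and the bit allocation can be illustrated with a minimal sketch. It assumes the textbook rule of thumb that each additional bit of a uniform quantizer lowers the quantization noise by about 6.02 dB; this rule, the function names, and the sample levels are assumptions made for illustration and do not describe the allocation procedure of any particular coder.

```python
import math

DB_PER_BIT = 6.02  # assumed noise reduction per bit for a uniform quantizer


def bits_for_subband(signal_db: float, allowed_distortion_db: float) -> int:
    """Smallest bit count that pushes quantization noise down to the allowed level.

    A subband whose signal level does not exceed the allowed distortion level
    is inaudible in this model and receives no bits.
    """
    margin_db = signal_db - allowed_distortion_db
    if margin_db <= 0:
        return 0
    return math.ceil(margin_db / DB_PER_BIT)


def allocate(signal_db, allowed_db):
    """Per-subband bit allocation for parallel lists of levels in dB."""
    return [bits_for_subband(s, a) for s, a in zip(signal_db, allowed_db)]


if __name__ == "__main__":
    # Hypothetical levels for three subbands.
    print(allocate([60.0, 30.0, 10.0], [30.0, 35.0, 10.0]))  # [5, 0, 0]
```

In this model the first subband lies 30 dB above its allowed distortion level and therefore needs five bits, while the other two subbands lie at or below their allowed levels and need none.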
An audio coding and quantization algorithm for digital audio signals is known from Japanese Laid-Open Patent Application No. 7-154266. In the audio coding method of this publication, a digital audio signal is converted into blocks of spectral data, and each block is divided into units of normalized coefficients. An upper limit on the number of bits allocated per block is fixed, and the bit allocation is controlled by using this fixed upper limit. For blocks whose number of needed bits exceeds the upper limit, the normalized coefficients of the related unit are forcibly corrected so that the numbers of needed bits for all the blocks are below the upper limit.
International Standard ISO/IEC 13818-7 provides a generic audio coding and quantization algorithm for digital audio signals. In the audio coding and quantization method of this standard, it is difficult to speedily carry out an iterative process that converges when the total bit count is within some interval surrounding the allocated bit count, while preventing the degradation of coding quality due to nonconvergence. If the bit rate requirement and the masking requirement are not both ultimately met, the coding quality is likely to degrade. Further, in the above-described method of International Standard ISO/IEC 13818-7, when the masking requirement is checked, the quantization error levels of the subbands are not always all less than the allowed distortion levels. Even if both the bit rate requirement and the masking requirement are ultimately met, a relatively long computing time is required before convergence is reached. As long as the masking requirement is not met, the bit allocation control must be repeated many times, and the repeated bit allocation control includes some redundant processes.
In the conventional method of the above publication (Japanese Laid-Open Patent Application No. 7-154266), the same problem remains unresolved. It is difficult to speedily carry out the iterative process that converges when the total bit count is within some interval surrounding the allocated bit count, while preventing the degradation of coding quality due to nonconvergence.
An object of the present invention is to provide an improved audio coding and quantization method in which the above-described problems are eliminated.
Another object of the present invention is to provide an audio coding and quantization method which is effective in speedily carrying out an iterative process that converges when the total bit count is within some interval surrounding the allocated bit count, while preventing the degradation of coding quality due to nonconvergence.
Still another object of the present invention is to provide an audio coding and quantization method which is effective in providing high coding quality without providing high signal-to-noise ratios.
The above-mentioned objects of the present invention are achieved by an audio coding and quantization method which includes the steps of: converting each of blocks of an input audio signal into a number of spectral subband components, the blocks being produced from the signal along a time axis; converting a related one of the blocks into an input vector of frequency domain values; quantizing each subband component whereby the number of quantizer levels for a related one of spectral subbands is obtained from a bit allocation; controlling the bit allocation for each subband by using a psychoacoustic model which generates an allowed distortion level of a related one of scalefactor bands corresponding to the subbands; calculating, during the controlling step, a quantization of the frequency domain values of the related block through a first control loop, the first control loop being repeated until a bit rate requirement is met that a count of bits needed to encode a bitstream is less than a predetermined count of bits available to encode the bitstream; and calculating, through a second control loop, a quantization noise for each subband, produced by the quantization of the frequency domain values within the first control loop, the second control loop being repeated until a masking requirement is met that a quantization error level of the frequency domain values with scalefactors applied to the values within the scalefactor bands is less than the allowed distortion level, wherein the first control loop and the second control loop are alternately performed for the related block such that both the bit rate requirement and the masking requirement are met, and, after both the requirements are met, an output vector of quantized frequency domain values is finally produced.
According to the audio coding and quantization method of the present invention, when controlling the bit allocation for each subband, the first control loop and the second control loop are alternately performed for each block such that both the bit rate requirement and the masking requirement are met. After both the requirements are met, an output vector of quantized frequency domain values is finally produced. The total number of executions of the loop processes needed to optimize the bit allocation is remarkably reduced, and convergence can be reached speedily. Therefore, the audio coding and quantization method of the present invention is effective in providing high coding quality without providing high signal-to-noise ratios. It is possible to speedily carry out the iterative process that converges when the total bit count is within some interval surrounding the allocated bit count, while preventing the degradation of coding quality due to nonconvergence.
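The interplay of the two control loops can be sketched in simplified form. The sketch below is only an illustration under stated assumptions and is not the claimed method: the uniform quantizer, the toy bit-cost model, the step growth factor of 1.19, the band gain factor of 2^0.25, the iteration cap, and all function names are hypothetical, and the scalefactor bands are assumed to be contiguous index ranges that cover the whole input vector in order.

```python
def quantize(values, step):
    """Uniform quantizer (an assumption; practical coders may be nonlinear)."""
    return [round(v / step) for v in values]


def bit_count(q):
    """Toy bit cost: magnitude bits plus one extra bit per coefficient."""
    return sum(abs(x).bit_length() + 1 for x in q)


def rate_loop(scaled, start_step, available_bits):
    """First control loop: coarsen the step until the bit rate requirement holds."""
    step = start_step
    while True:
        q = quantize(scaled, step)
        if bit_count(q) < available_bits:
            return q, step
        step *= 1.19  # hypothetical growth factor


def band_noise(values, q, step, gains, bands):
    """Mean squared quantization error in each scalefactor band."""
    noise = []
    for b, (lo, hi) in enumerate(bands):
        err = sum((values[i] - q[i] * step / gains[b]) ** 2 for i in range(lo, hi))
        noise.append(err / (hi - lo))
    return noise


def encode_block(values, bands, allowed, available_bits,
                 start_step=0.01, max_outer=32):
    """Run the rate loop and the noise check in turn for one block.

    Returns (quantized values, step, band gains, converged flag).
    """
    if available_bits <= len(values):
        raise ValueError("bit budget too small for the toy cost model")
    gains = [1.0] * len(bands)  # per-band scalefactor gains
    for _ in range(max_outer):
        scaled = [values[i] * gains[b]
                  for b, (lo, hi) in enumerate(bands) for i in range(lo, hi)]
        q, step = rate_loop(scaled, start_step, available_bits)
        noise = band_noise(values, q, step, gains, bands)
        # Second control loop: find bands that violate the masking requirement.
        violating = [b for b, n in enumerate(noise) if n >= allowed[b]]
        if not violating:
            return q, step, gains, True
        for b in violating:
            gains[b] *= 2 ** 0.25  # amplify the band so it is quantized more finely
    return q, step, gains, False  # bit rate requirement met, masking requirement not
```

For example, `encode_block([10.0, -8.0, 0.5, 0.2], [(0, 2), (2, 4)], [1.0, 1.0], 64)` meets both requirements on the first pass in this toy model, whereas a very small bit budget exhausts the iteration cap and returns with the converged flag cleared. The iteration cap stands in for whatever nonconvergence handling a practical encoder would use.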