The present invention relates to an encoding method and apparatus, a decoding method and apparatus, a program, and a recording medium. More particularly, it relates to an encoding method and apparatus for encoding digital data of acoustic signals or sound signals with high efficiency and transmitting the encoded data or recording it on a recording medium; to a decoding method and apparatus for receiving or reproducing the encoded data and decoding it; to a program that causes a computer to carry out the encoding processing and the decoding processing; and to a computer-readable recording medium on which the program is recorded.
This application claims priority of Japanese Patent Application No. 2002-132188, filed on May 7, 2002, the entirety of which is incorporated by reference herein.
Conventionally, known methods for encoding audio signals such as sound signals with high efficiency include non-blocking frequency band division systems, such as band division encoding (subband coding), and blocking frequency band division systems, such as transform encoding.
In the non-blocking frequency band division systems, an audio signal on the time base is divided into a plurality of frequency bands without being blocked, and the divided signals are encoded. In the blocking frequency band division systems, on the other hand, a signal on the time base is converted to a signal on the frequency base (spectrum conversion), and the converted signal is divided into a plurality of frequency bands: the coefficients obtained through the spectrum conversion are grouped into predetermined frequency bands, and the signal is then encoded band by band.
Furthermore, as a method of improving encoding efficiency, there has been suggested a high-efficiency encoding method that combines the non-blocking and the blocking frequency band division systems. In this method, the signal is first divided into bands by band division encoding, the signal of each band is then converted to a signal on the frequency base through spectrum conversion, and the converted signal is encoded band by band.
For frequency band division, the QMF (Quadrature Mirror Filter) is often used, because signals can be processed simply and aliasing distortion can be removed. Details of frequency band division by the QMF are given in R. E. Crochiere, “Digital coding of speech in subbands,” Bell Syst. Tech. J., Vol. 55, No. 8, 1976.
Another known band division method is the PQF (Polyphase Quadrature Filter), a filter bank that divides a signal into bands of equal bandwidth. Details of the PQF are given in Joseph H. Rothweiler, “Polyphase Quadrature Filters—A new subband coding technique,” ICASSP 83, Boston.
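As an illustration of two-band QMF analysis and synthesis, the following Python sketch uses the shortest possible QMF pair, the two-tap Haar filters h0 = [1, 1]/sqrt(2) and h1[n] = (−1)^n h0[n]. This filter choice is an assumption made for brevity; practical codecs use much longer QMF or PQF prototypes. Filtering followed by decimation by two is written directly on sample pairs, and the synthesis stage shows the aliasing of the two branches cancelling so that the input is recovered.

```python
import math

S = 1.0 / math.sqrt(2.0)  # tap value of the assumed two-tap (Haar) QMF pair

def qmf_analysis(x):
    """Split x (even length) into low and high subbands at half the sample rate.

    Equivalent to filtering with h0 = [S, S] and h1 = [S, -S] and keeping
    every second (odd-indexed) output sample."""
    if len(x) % 2:
        raise ValueError("input length must be even")
    low = [S * (x[2 * k] + x[2 * k + 1]) for k in range(len(x) // 2)]
    high = [S * (x[2 * k + 1] - x[2 * k]) for k in range(len(x) // 2)]
    return low, high

def qmf_synthesis(low, high):
    """Block form of upsampling, filtering and adding the two branches;
    the aliasing introduced by decimation cancels between them."""
    y = []
    for lo, hi in zip(low, high):
        y.append(S * (lo - hi))  # recovers x[2k]
        y.append(S * (lo + hi))  # recovers x[2k+1]
    return y
```

For a constant input the high band is zero, as expected of a low-pass/high-pass split.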
As for the spectrum conversion mentioned above, an input audio signal is, for example, blocked into frames of a predetermined unit time, and the signal on the time base in each block is converted to a signal on the frequency base by the DFT (Discrete Fourier Transform), the DCT (Discrete Cosine Transform), or the MDCT (Modified Discrete Cosine Transform).
Details of the MDCT are given in J. P. Princen and A. B. Bradley, “Subband/Transform Coding Using Filter Bank Designs Based on Time Domain Aliasing Cancellation,” ICASSP 1987, Univ. of Surrey and Royal Melbourne Inst. of Tech.
By quantizing the band-divided signal obtained through the filters or the spectrum conversion, the bands in which quantization noise arises can be controlled, which enables encoding that is highly efficient from an auditory standpoint by exploiting properties such as the masking effect. Furthermore, if the signal components of each band are normalized before quantization by the maximum absolute value of the signal components in that band, even more efficient encoding becomes possible.
The bandwidths of the frequency bands used in band division are determined with human auditory properties in mind. That is, an audio signal is generally divided into a plurality of bands (for example, 32 bands) according to critical bands, whose bandwidths become broader toward higher frequencies.
When the data of each band are encoded, bit allocation is performed to allocate a predetermined or an adaptive number of bits to each band. For example, when coefficient data obtained through MDCT processing are encoded with bit allocation, the numbers of bits are adaptively allocated to the MDCT coefficient data of the respective bands, which are obtained by applying the MDCT to the signal in each block.
Known bit allocation methods include a method that allocates bits based on the signal amount of each band (hereinafter referred to as the first bit allocation method), and a method that allocates bits in a fixed manner, obtaining the signal-to-noise ratio required for each band by utilizing auditory masking (hereinafter referred to as the second bit allocation method).
Details of the first bit allocation method are given in R. Zelinski and P. Noll, “Adaptive Transform Coding of Speech Signals,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-25, No. 4, August 1977.
Details of the second bit allocation method are given in M. A. Krasner (MIT), “The critical band coder digital encoding of the perceptual requirements of the auditory system,” ICASSP 1980.
The first bit allocation method flattens the quantization noise spectrum and minimizes the noise energy. However, since it does not exploit auditory masking, the actually perceived noise level is not optimized. The second bit allocation method, on the other hand, allocates bits in a fixed manner, so when energy is concentrated at a specific frequency, for example when a sinusoidal wave is input, a satisfactory characteristic value cannot be obtained.
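The statement that the first method flattens the quantization noise spectrum can be checked with the textbook high-rate model, which is an assumption here and not taken from the cited paper: band k has signal variance σ_k^2, receives b_k bits, and its quantization noise power behaves as D_k = c σ_k^2 2^(−2 b_k). Distributing a total of B bits over K bands as follows (ignoring the negative allocations that very weak bands would receive) makes every D_k equal:

```latex
\begin{aligned}
b_k &= \frac{B}{K} + \frac{1}{2}\log_2\frac{\sigma_k^2}{G},
  \qquad G = \Bigl(\prod_{j=1}^{K}\sigma_j^2\Bigr)^{1/K},\\
D_k &= c\,\sigma_k^2\,2^{-2b_k}
     = c\,\sigma_k^2\,2^{-2B/K}\,\frac{G}{\sigma_k^2}
     = c\,G\,2^{-2B/K}.
\end{aligned}
```

Since the b_k sum to B and D_k no longer depends on k, the noise spectrum is flat; nothing in the rule accounts for masking, which is the shortcoming noted above.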
Accordingly, there has been suggested a high-efficiency encoding apparatus that divides the total number of bits available for bit allocation into bits for fixed bit allocation patterns predetermined for the respective small blocks and bits for allocation that depends on the signal amount of each block, and makes the division ratio depend on a signal related to the input signal. For example, the smoother the spectrum of the signal, the larger the proportion assigned to the fixed bit allocation patterns.
With this method, when energy is concentrated on a specific spectral component, as when a sinusoidal wave is input, many bits are allocated to the block containing that component, which can significantly improve the overall signal-to-noise ratio. Since human hearing is in general extremely sensitive to signals having steep spectral components, this improvement in signal-to-noise ratio not only improves measured values but also effectively improves the perceived sound quality.
Many other bit allocation methods have also been suggested, and auditory models are becoming increasingly refined. Improvements in the processing capability of encoding apparatus make it possible to encode with high efficiency from an auditory point of view.
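A minimal Python sketch of such a split follows, under assumptions not stated in the text: the "signal related to the input signal" is taken to be the spectral flatness of the band energies (geometric mean over arithmetic mean, 1.0 for a flat spectrum), the fixed pattern is a hypothetical list of weights, and the signal-dependent share is distributed in proportion to band energy. A flat spectrum then sends almost every bit through the fixed pattern, while a single sinusoid concentrates the adaptive share in the band containing it.

```python
import math

def hybrid_allocation(band_energies, fixed_pattern, total_bits):
    """Split total_bits between a fixed pattern and an energy-dependent share.

    Assumption: the split ratio is the spectral flatness of the band energies,
    so a flat spectrum favours the fixed pattern and a peaky one favours the
    signal-dependent share. Returns (flatness, integer bits per band)."""
    n = len(band_energies)
    eps = 1e-12  # keeps log() defined for empty bands
    geo = math.exp(sum(math.log(e + eps) for e in band_energies) / n)
    arith = sum(band_energies) / n + eps
    flatness = min(1.0, geo / arith)       # 1.0 flat ... ~0.0 single peak
    fixed_bits = total_bits * flatness
    adaptive_bits = total_bits - fixed_bits
    pattern_sum = sum(fixed_pattern)
    energy_sum = sum(band_energies) + eps
    shares = [fixed_bits * p / pattern_sum + adaptive_bits * e / energy_sum
              for p, e in zip(fixed_pattern, band_energies)]
    bits = [math.floor(s) for s in shares]
    # Give the bits lost by rounding down to the bands with the largest remainders.
    leftover = total_bits - sum(bits)
    order = sorted(range(n), key=lambda i: shares[i] - bits[i], reverse=True)
    for i in order[:leftover]:
        bits[i] += 1
    return flatness, bits
```

With band energies [1, 1, 1, 1] the result follows the pattern; with [0, 0, 100, 0] nearly all bits go to the third band.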
When the DFT or the DCT is used to convert a waveform signal to spectra, transforming a time block of M samples yields M independent real-valued data. In general, to reduce connection distortion between time blocks (frames), each block overlaps each of its neighbouring blocks by a predetermined number M1 of samples. Thus, an encoding method using the DFT or the DCT quantizes and encodes M real-valued data for every (M−M1) samples on average.
When the MDCT is used to convert a signal on the time base to spectra, M independent real-valued data are obtained from 2M samples, each block overlapping each of its neighbouring blocks by M samples. In this case, therefore, M real-valued data are quantized and encoded for every M samples on average. A decoding apparatus regenerates the waveform signal from codes obtained with the MDCT by adding the waveform components obtained from the respective blocks through the inverse transform, so that the overlapping components interfere with each other and their time-domain aliasing cancels.
In general, making the time blocks (frames) for conversion longer enhances the frequency resolution of the spectra and concentrates energy on specific spectral components. The MDCT transforms long blocks, each overlapping its neighbours by half, while the number of resulting spectral coefficients is no larger than the number of original time samples; it therefore enables more efficient encoding than the DFT or the DCT. Furthermore, giving adjacent blocks sufficiently long overlaps reduces inter-block distortion of the waveform signal.
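The time-domain aliasing cancellation described above can be checked numerically with a direct O(N^2) MDCT and IMDCT. The sine window, the 2/N scaling of the inverse transform, and the zero padding at the signal edges are choices made for this sketch; each block of 2N samples yields N coefficients, consecutive blocks advance by N samples, and the aliasing left in each inverse-transformed block is cancelled by the overlap-add.

```python
import math

def sine_window(N):
    """Sine window of length 2N; satisfies w[n]**2 + w[n+N]**2 == 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def mdct(block):
    """Direct MDCT: 2N time samples -> N real coefficients."""
    N = len(block) // 2
    return [sum(block[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]

def imdct(coeffs):
    """Direct IMDCT: N coefficients -> 2N samples still containing time-domain aliasing."""
    N = len(coeffs)
    return [2.0 / N * sum(coeffs[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                          for k in range(N))
            for n in range(2 * N)]

def mdct_roundtrip(x, N):
    """Encode x with 50%-overlapped blocks of 2N samples (hop N), decode by overlap-add."""
    w = sine_window(N)
    padded = [0.0] * N + list(x) + [0.0] * N  # every real sample lies in two blocks
    padded += [0.0] * (-len(padded) % N)        # round length up to a multiple of N
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * N + 1, N):
        block = [padded[start + n] * w[n] for n in range(2 * N)]  # analysis window
        y = imdct(mdct(block))                                    # N coefficients in between
        for n in range(2 * N):
            out[start + n] += y[n] * w[n]                         # synthesis window + overlap-add
    return out[N:N + len(x)]
```

A single block on its own does not reproduce its input; only the sum of the overlapping halves of neighbouring blocks does.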
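The coefficient counts compared above reduce to simple arithmetic: the number of real values to encode per new input sample is the number of coefficients per block divided by the hop between consecutive blocks. The values M = 256 and M1 = 32 below are arbitrary examples.

```python
def coefficients_per_sample(coeffs_per_block, block_len, overlap):
    """Average number of real coefficients encoded per new input sample when
    consecutive blocks of block_len samples overlap by `overlap` samples."""
    hop = block_len - overlap
    if hop <= 0:
        raise ValueError("overlap must be shorter than the block")
    return coeffs_per_block / hop

M, M1 = 256, 32  # arbitrary example sizes
# DFT/DCT: M coefficients per M-sample block, blocks overlapping by M1 samples
dct_rate = coefficients_per_sample(M, M, M1)      # 256 / 224, more than 1
# MDCT: M coefficients per 2M-sample block, blocks overlapping by M samples
mdct_rate = coefficients_per_sample(M, 2 * M, M)  # exactly 1
```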
When an actual code sequence is generated, quantization accuracy information indicating the quantization step used for quantization and normalization coefficient information indicating the coefficient used to normalize each signal component are first encoded with a predetermined number of bits for each band in which normalization and quantization are performed. The normalized and quantized spectra are then encoded.
ISO/IEC 11172-3:1993(E), 1993, describes a high-efficiency encoding method in which the number of bits representing the quantization accuracy information differs from band to band; it prescribes that higher bands use fewer bits for the quantization accuracy information.
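A sketch of how band-dependent field widths reduce the cost of this secondary information follows. The widths used (4 bits for the lowest bands down to 2 bits for the highest) are hypothetical and are not the allocation tables of the standard; the point is only that each band's quantization accuracy index is written with its own fixed number of bits, and a decoder that knows the same widths can read the fields back.

```python
def pack_accuracy(indices, widths):
    """Write each band's quantization-accuracy index with that band's field width."""
    if len(indices) != len(widths):
        raise ValueError("one index per band is required")
    fields = []
    for band, (idx, width) in enumerate(zip(indices, widths)):
        if not 0 <= idx < (1 << width):
            raise ValueError(f"band {band}: index {idx} does not fit in {width} bits")
        fields.append(format(idx, f"0{width}b"))
    return "".join(fields)

def unpack_accuracy(bits, widths):
    """Inverse of pack_accuracy: read the fields back using the same widths."""
    indices, pos = [], 0
    for width in widths:
        indices.append(int(bits[pos:pos + width], 2))
        pos += width
    return indices

WIDTHS = [4, 4, 3, 3, 2, 2]  # hypothetical: fewer bits for higher bands
```

For six bands this costs 18 bits per frame instead of 24 with a uniform 4-bit field.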
FIG. 1 is a block diagram of a conventional encoding apparatus 100 that encodes audio signals and the like through frequency band division. A band division unit 101 receives an audio signal to be encoded and divides it into, for example, four frequency bands using filters such as the QMF or the PQF. The widths of the bands produced by the band division unit 101 (hereinafter referred to as encoding units) may be equal to one another, or may be unequal in accordance with critical bands. Although the audio signal is divided into four encoding units in this example, the number of encoding units is not restricted to four. The band division unit 101 then sends the signals of the four encoding units (hereinafter referred to as the first to fourth encoding units) to gain control units 1021 to 1024, one per encoding unit, for each predetermined time block (frame).
The gain control units 1021 to 1024 generate gain control information according to amplitudes of respective signals in respective blocks, and control gains of the signals in the respective blocks based on the gain control information. Then, the gain control units 1021 to 1024 send signals of the first to fourth encoding units obtained through the gain control to spectrum conversion units 1031 to 1034, while sending the gain control information to a multiplexer 107.
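The text does not specify how the gain control information is derived, so the following Python sketch uses a hypothetical scheme for illustration only: the block is split into sub-blocks, each sub-block preceding the loudest one is amplified by a power of two according to how much quieter it is, and the list of exponents serves as the gain control information. The decoder-side compensation performed later by the gain control units 1241 to 1244 simply undoes the scaling.

```python
import math

def gain_control(block, n_sub=4, max_exp=4):
    """Hypothetical encoder-side gain control for one block.

    Every sub-block preceding the loudest one is amplified by 2**e, where e is
    the whole number of octaves (capped at max_exp) by which its peak lies
    below that of the loudest sub-block. Returns (controlled block, exponents)."""
    size = len(block) // n_sub
    if size * n_sub != len(block):
        raise ValueError("block length must be a multiple of n_sub")
    peaks = [max(abs(v) for v in block[i * size:(i + 1) * size]) for i in range(n_sub)]
    attack = peaks.index(max(peaks))
    exps = []
    for i, p in enumerate(peaks):
        if i < attack and p > 0.0:
            exps.append(min(max_exp, int(math.floor(math.log2(peaks[attack] / p)))))
        else:
            exps.append(0)
    controlled = [v * 2.0 ** exps[n // size] for n, v in enumerate(block)]
    return controlled, exps

def gain_compensate(block, exps):
    """Decoder-side compensation: undo the per-sub-block scaling."""
    size = len(block) // len(exps)
    return [v / 2.0 ** exps[n // size] for n, v in enumerate(block)]
```

Powers of two are used so that the scaling and its compensation are exact in floating point.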
The spectrum conversion units 1031 to 1034 perform spectrum conversion such as the MDCT for the gain-controlled signals on time base of the respective encoding units to generate signals on frequency base, and send thus generated signals on frequency base to normalization units 1041 to 1044 respectively as well as to a quantization accuracy decision unit 105.
The normalization units 1041 to 1044 extract, from the signal components constituting the signals of the first to fourth encoding units, the component with the maximum absolute value, and set coefficients corresponding to the extracted components as the normalization coefficients of the first to fourth encoding units. The normalization units 1041 to 1044 then normalize, that is, divide, the signal components of the first to fourth encoding units by values corresponding to the respective normalization coefficients, so that the normalized data range from −1.0 to 1.0. The normalization units 1041 to 1044 send the normalized data of the first to fourth encoding units to quantization units 1061 to 1064 respectively, and send the normalization coefficients of the first to fourth encoding units to the multiplexer 107.
The quantization accuracy decision unit 105 decides the quantization steps used to quantize the normalized data of the first to fourth encoding units, based on the signals of the first to fourth encoding units supplied from the spectrum conversion units 1031 to 1034. The quantization accuracy decision unit 105 then sends the quantization accuracy information of the first to fourth encoding units, which corresponds to the quantization steps, to the quantization units 1061 to 1064 and to the multiplexer 107.
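The normalization of one encoding unit can be sketched as follows. The table of normalization coefficients, scale(i) = 2^(i / steps_per_octave), is a hypothetical stand-in for "values corresponding to the normalization coefficients"; choosing the smallest tabulated value that is not below the peak keeps every normalized value within −1.0 to 1.0, and the index i is what would be sent to the multiplexer.

```python
import math

def normalize_unit(coeffs, steps_per_octave=2):
    """Normalize one encoding unit by a tabulated upper bound on its peak magnitude.

    Hypothetical coefficient table: scale(i) = 2 ** (i / steps_per_octave).
    Returns (index, normalized values); index is None for an all-zero unit."""
    peak = max(abs(c) for c in coeffs)
    if peak == 0.0:
        return None, [0.0] * len(coeffs)
    idx = math.ceil(math.log2(peak) * steps_per_octave)
    while 2.0 ** (idx / steps_per_octave) < peak:  # guard against rounding in log2
        idx += 1
    scale = 2.0 ** (idx / steps_per_octave)
    return idx, [c / scale for c in coeffs]
```

For the unit [0.5, −3.0, 1.2] the peak 3.0 maps to index 4 (scale 4.0), giving normalized values 0.125, −0.75, and 0.3.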
The quantization units 1061 to 1064 encode the normalized data of the first to fourth encoding units by quantizing the data using the quantization steps corresponding to the quantization accuracy information of the first to fourth encoding units, and send thus obtained quantization coefficients of the first to fourth encoding units to the multiplexer 107.
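A minimal sketch of this quantization step follows, assuming (hypothetically) that the quantization accuracy information is a word length of b bits and that a uniform mid-tread quantizer with step 1 / (2^(b−1) − 1) is applied to the normalized data in [−1.0, 1.0].

```python
def quantize(normalized, word_length):
    """Uniform mid-tread quantization of normalized values in [-1.0, 1.0].

    Hypothetical reading of the quantization accuracy information: a word
    length of b bits gives integer codes in [-(2**(b-1) - 1), 2**(b-1) - 1],
    i.e. a quantization step of 1 / (2**(b-1) - 1)."""
    if word_length < 2:
        raise ValueError("word_length must be at least 2 bits")
    levels = 2 ** (word_length - 1) - 1
    # round() to the nearest level, then clamp in case |v| slightly exceeds 1.0
    return [max(-levels, min(levels, round(v * levels))) for v in normalized]
```

With a 4-bit word length (7 positive levels), the values 0.75, −0.3, 1.0, and 0.0 become the codes 5, −2, 7, and 0; each code differs from the original value by at most half a step.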
The multiplexer 107 encodes the quantization coefficients, quantization accuracy information, normalization coefficients, and gain control information of the first to fourth encoding units as necessary, and multiplexes these data. The multiplexer 107 then transmits the multiplexed encoded data via a transmission line, or records it on a recording medium (not shown).
Instead of deciding the quantization steps based on the band-divided signals, the quantization accuracy decision unit 105 may decide them based on the normalized data, or in consideration of auditory phenomena such as the masking effect.
FIG. 2 shows a block diagram of a conventional decoding apparatus 120 for decoding encoded data output from the encoding apparatus 100. In the decoding apparatus 120 shown in FIG. 2, a demultiplexer 121 decodes and demultiplexes input encoded data into the quantization coefficients, quantization accuracy information, normalization coefficients, and gain control information of the first to fourth encoding units. Then, the demultiplexer 121 sends the quantization coefficients, quantization accuracy information, and normalization coefficients of the first to fourth encoding units to signal component construction units 1221 to 1224 corresponding to the respective encoding units, while sending the gain control information of the first to fourth encoding units to gain control units 1241 to 1244 corresponding to the respective encoding units.
The signal component construction unit 1221 dequantizes the quantization coefficient of the first encoding unit using the quantization step corresponding to the quantization accuracy information of the first encoding unit to generate normalized data of the first encoding unit. Furthermore, the signal component construction unit 1221 decodes the normalized data of the first encoding unit by multiplying the data by a value corresponding to the normalization coefficient of the first encoding unit, and sends thus obtained signal of the first encoding unit to a spectrum inverse-conversion unit 1231.
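The two decoder steps just described, dequantization followed by multiplication by the normalization coefficient, can be sketched as follows. The mapping from word length to quantization step and from normalization index to scale value is hypothetical; in any real system it must simply mirror whatever the encoder used.

```python
def reconstruct_unit(codes, word_length, scale_index, steps_per_octave=2):
    """Rebuild one encoding unit's spectrum from its transmitted data.

    Assumed (hypothetical) conventions, mirroring the encoder:
      - word length b -> quantization step 1 / (2**(b-1) - 1)
      - normalization index i -> scale 2 ** (i / steps_per_octave)
      - index None marks an all-zero unit"""
    if scale_index is None:
        return [0.0] * len(codes)
    levels = 2 ** (word_length - 1) - 1
    scale = 2.0 ** (scale_index / steps_per_octave)
    return [(q / levels) * scale for q in codes]  # dequantize, then de-normalize
```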
The signal component construction units 1222 to 1224 perform similar decode processing to generate signals of the second to fourth encoding units, and send thus obtained signals of the second to fourth encoding units to spectrum inverse-conversion units 1232 to 1234 respectively.
The spectrum inverse-conversion units 1231 to 1234 perform spectrum inverse-conversion such as the IMDCT for the decoded signals on frequency base to generate signals on time base, and send thus generated signals on time base to gain control units 1241 to 1244.
The gain control units 1241 to 1244 perform gain control compensation processing based on gain control information sent from the demultiplexer 121, and send thus obtained signals of the first to fourth encoding units to a band composition unit 125.
The band composition unit 125 performs band composition, combining the signals of the first to fourth encoding units supplied from the gain control units 1241 to 1244 to restore the original audio signal.
Since the encoded data supplied or transmitted from the encoding apparatus 100 of FIG. 1 to the decoding apparatus 120 of FIG. 2 include the quantization accuracy information, the auditory model used to determine quantization accuracy can be set arbitrarily without affecting the decoding apparatus 120. That is, the quantization steps of the respective encoding units can be freely set in the encoding apparatus 100, so that, as the processing capability of the encoding apparatus 100 improves and auditory models are refined, sound quality and compression ratio can be improved without replacing or upgrading the decoding apparatus 120.
In this case, however, the number of bits needed to encode the quantization accuracy information itself becomes undesirably large, which makes it difficult to raise the overall encoding efficiency beyond a certain level.
There is also a method in which, for example, the decoding apparatus derives the quantization accuracy information from the normalization coefficients instead of the quantization accuracy information being encoded directly. With this method, however, the relation between normalization coefficients and quantization accuracy information is fixed when the standard is established, which makes it difficult to introduce quantization accuracy control based on more advanced auditory models in the future. Moreover, when the actual compression ratio varies over a range, a relation between normalization coefficients and quantization accuracy information has to be defined for each value of the compression ratio.
Thus, to raise the compression ratio beyond a certain level, it is necessary to improve not only the encoding efficiency of the main information, that is, the direct subject of encoding such as the audio signal shown in FIG. 1, but also that of secondary information that is not a direct subject of encoding, such as quantization accuracy information and normalization coefficients.
The inventor of the present invention has suggested methods of improving the encoding efficiency of secondary information in the specifications and drawings of Japanese Patent Application No. 2000-390598 and Japanese Patent Application No. 2001-182383, and a method of improving the encoding efficiency of gain information in encoding systems that perform gain control in the specification and drawings of Japanese Patent Application No. 2001-182093. With these techniques, the encoding efficiency of secondary information can be improved by variable-length coding that exploits various correlations.
However, when a very high compression ratio is required, the number of bits available to the encoding apparatus may not be enough to maintain a quantization accuracy at which quantization noise is imperceptible. In that case, the encoding apparatus often reduces the bits allocated to the main information: specifically, normalized data (spectra) are replaced with “0” or small values, or the bandwidth over which quantization is performed is narrowed.
As a result, the decoded and restored sound contains abnormal sounds and noise caused by temporal variation of the bandwidth, and lacks power because spectra have been replaced with “0” or small values. Especially when the compression ratio is raised significantly, these phenomena become noticeably audible, which is a problem from an auditory standpoint.
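The bandwidth-narrowing fallback can be illustrated with a small sketch. It assumes that each band has a known bit demand and energy and that bands are dropped from the top until the remaining demand fits the budget; the numbers in the usage line are arbitrary example values. The returned energy ratio quantifies the lack of power described in the next paragraph.

```python
def narrow_bandwidth(band_bits, band_energy, budget):
    """Drop bands from the highest frequency downward until the remaining
    bit demand fits the budget. Returns the number of bands kept and the
    fraction of the original signal energy that survives."""
    kept = len(band_bits)
    while kept > 0 and sum(band_bits[:kept]) > budget:
        kept -= 1
    total = sum(band_energy)
    retained = sum(band_energy[:kept]) / total if total > 0 else 1.0
    return kept, retained

# arbitrary example: five bands, lowest first, with a 90-bit budget
kept, retained = narrow_bandwidth([40, 30, 20, 20, 10], [50.0, 25.0, 12.0, 8.0, 5.0], 90)
```

Here only the lowest three bands fit, and 13% of the signal energy is lost.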