The present invention relates to an encoding method and an encoding apparatus, a decoding method and a decoding apparatus, a transmission method and a transmission apparatus, and a recording medium, and more particularly to an encoding method and apparatus, a decoding method and apparatus, a transmission method and apparatus, and a recording medium which are suitable for use in efficiently encoding digital data such as an acoustic signal or an audio (speech) signal for transmission thereof, or for recording it onto a recording medium, and in receiving or reproducing such a signal at the decoding side to decode it.
Hitherto, as techniques for efficiently encoding an audio signal such as speech (sound), etc., there are known, e.g., a non-blocking frequency band division system represented by band division coding (sub-band coding), etc. and a blocking frequency band division system represented by transform coding, etc.
In the non-blocking frequency band division system, the audio signal on the time axis is divided into signal components of plural frequency bands without carrying out blocking, and encoding is carried out for the respective bands. Moreover, in the blocking frequency band division system, the signal on the time axis is transformed (spectrum-transformed) into a signal on the frequency axis and divided into signal components of plural frequency bands, i.e., the coefficients obtained by the spectrum transformation are collected for every predetermined band, and encoding is carried out for the respective bands.
Further, as a technique for further improving the encoding efficiency, there is also proposed a high efficiency encoding technique in which the non-blocking frequency band division system and the blocking frequency band division system described above are combined. In accordance with this technique, e.g., band division is first carried out by the band division coding, the signal of each band is thereafter spectrum-transformed into a signal on the frequency axis, and encoding is carried out for each band of the spectrum-transformed signal.
Here, in carrying out the frequency band division, there are many instances where, e.g., a QMF (Quadrature Mirror Filter) is used, since its processing is simple and aliasing distortion is cancelled. It is to be noted that the detail of the frequency band division by the QMF is described in R. E. Crochiere, "Digital coding of speech in subbands", Bell Syst. Tech. J., Vol. 55, No. 8, 1976, etc.
Further, as a technique for carrying out band division, there is, in addition to the above, e.g., the PQF (Polyphase Quadrature Filter), which is an equal-bandwidth filter division technique. The detail of the PQF is described in Joseph H. Rothweiler, "Polyphase Quadrature Filters—A New Subband Coding Technique", ICASSP 83, Boston, etc.
On the other hand, as the above-described spectrum transformation, there is, e.g., a spectrum transformation in which an input audio signal is blocked by frames of a predetermined unit time, and Discrete Fourier Transform (DFT), Discrete Cosine Transform (DCT), or Modified Discrete Cosine Transform (MDCT), etc. is carried out for each block, to thereby transform the time-axis signal into a frequency-axis signal.
It is to be noted that the detail of the MDCT is described in J. P. Princen and A. B. Bradley (Univ. of Surrey, Royal Melbourne Inst. of Tech.), "Subband/Transform Coding Using Filter Bank Designs Based on Time Domain Aliasing Cancellation", ICASSP 1987, etc.
As described above, since the signal of each band obtained by the filter or the spectrum transformation is quantized, the band in which quantization noise is generated can be controlled, so that encoding of higher efficiency from an auditory point of view can be carried out by making use of the masking effect, etc. Moreover, if the signal components of the respective bands are normalized by, e.g., the maximum absolute value of the signal components of the corresponding band before quantization is carried out, encoding of still higher efficiency can be carried out.
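By way of illustration only (the function names and the choice of a uniform quantizer are assumptions of this sketch, not elements of the system described above), the normalization by the maximum absolute value and the subsequent quantization can be sketched as follows:

```python
def normalize_and_quantize(band, n_bits):
    # Normalization coefficient: the maximum absolute value in the band.
    norm_coef = max(abs(s) for s in band) or 1.0
    # Uniform quantization step for normalized values within -1.0 to 1.0.
    step = 1.0 / (2 ** (n_bits - 1))
    codes = [round(s / norm_coef / step) for s in band]
    return norm_coef, codes

def dequantize(norm_coef, codes, n_bits):
    step = 1.0 / (2 ** (n_bits - 1))
    return [c * step * norm_coef for c in codes]
```

Since every normalized value lies within −1.0˜1.0, the quantization error per sample is bounded by half a quantization step multiplied by the normalization coefficient of the band.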
The widths of the respective frequency bands when band division is carried out are determined by taking, e.g., the auditory characteristics of the human being into consideration. Namely, in general, there are instances where the audio signal is divided into signal components of plural (e.g., 32, etc.) bands with band widths which become broader as the frequency band shifts to a higher frequency band, in accordance with what are called, e.g., the critical bands.
Further, in encoding the data of the respective bands, a predetermined bit allocation or an adaptive bit allocation is carried out for each band. Namely, e.g., in encoding coefficient data obtained by MDCT processing with bit allocation, the numbers of bits are adaptively allocated to the MDCT coefficient data of the respective bands obtained by MDCT-processing the signals of the respective blocks, and encoding is carried out with the allocated numbers of bits.
As the bit allocation technique, there are known, e.g., a technique of carrying out bit allocation on the basis of the magnitudes of the signals of the respective bands (hereinafter referred to as the first bit allocation technique as occasion demands), and a technique of obtaining necessary signal-to-noise ratios for the respective bands by making use of auditory masking to carry out fixed bit allocation (hereinafter referred to as the second bit allocation technique as occasion demands), etc.
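The behavior of the first bit allocation technique can be approximated by a simple greedy sketch (an illustrative assumption, not the procedure of the reference cited below): each additional bit reduces the quantization noise power of a band by a factor of about 4 (6 dB), so bits are repeatedly given to the band whose noise is currently largest, which tends to flatten the noise spectrum.

```python
def allocate_bits(band_energies, total_bits):
    # One bit at a time, to the band with the largest remaining noise.
    bits = [0] * len(band_energies)
    for _ in range(total_bits):
        noise = [e / (4 ** b) for e, b in zip(band_energies, bits)]
        bits[noise.index(max(noise))] += 1
    return bits
```

For band energies of 16, 4, 1 and 1 and a pool of 8 bits, the allocation comes out as 4, 2, 1 and 1 bits, leaving roughly equal noise in every band.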
It is to be noted that the detail of the first bit allocation technique is described in, e.g., R. Zelinski and P. Noll, "Adaptive Transform Coding of Speech Signals", IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. ASSP-25, No. 4, Aug. 1977, etc.
Moreover, the detail of the second bit allocation technique is described in, e.g., M. A. Krasner (MIT), "The critical band coder digital encoding of the perceptual requirements of the auditory system", ICASSP 1980, etc.
In accordance with the first bit allocation technique, the quantization noise spectrum is flattened so that the noise energy becomes minimum. However, since the masking effect is not utilized, the noise feeling from the actual auditory point of view does not become optimum. Moreover, in the second bit allocation technique, since the bit allocation is fixed, in the case where energy is concentrated on a certain frequency, as when, e.g., a sine wave is inputted, the characteristic value does not become a very good value.
In view of the above, there is proposed a high efficiency encoding apparatus in which all bits that can be used for bit allocation are divided into bits for a fixed bit allocation pattern determined in advance for each small block and bits for carrying out bit allocation dependent upon the magnitudes of the signals of the respective blocks, with the division ratio made dependent upon the input signal, i.e., the smoother the spectrum of the signal, the larger the division ratio given to the bits for the fixed bit allocation pattern.
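One possible measure of the smoothness of the spectrum on which such a division ratio could be based is the spectral flatness (geometric mean over arithmetic mean of the band energies); this measure and the proportional split are assumptions of the following sketch, not elements of the proposed apparatus.

```python
import math

def split_bit_pool(band_energies, total_bits):
    # Spectral flatness: close to 1.0 for a smooth spectrum, close to 0.0
    # for a spectrum concentrated on few components (energies assumed > 0).
    n = len(band_energies)
    geometric = math.exp(sum(math.log(e) for e in band_energies) / n)
    arithmetic = sum(band_energies) / n
    flatness = geometric / arithmetic
    fixed_bits = round(total_bits * flatness)   # share for the fixed pattern
    return fixed_bits, total_bits - fixed_bits  # rest: magnitude-dependent
```

A flat spectrum (equal band energies) sends the whole pool to the fixed pattern, while a spectrum concentrated like a sine wave sends nearly the whole pool to the magnitude-dependent allocation.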
In accordance with this method, in the case where energy is concentrated on a specific spectrum as with a sine wave input, many bits are allocated to the block including that spectrum. Thus, the entire signal-to-noise characteristic can be dramatically improved. In general, since the auditory sense of the human being is extremely sensitive to a signal having a sharp spectrum component, improving the signal-to-noise characteristic in this manner not only improves the measured numeric value, but is also effective in improving the sound quality from the viewpoint of the auditory sense.
As methods of bit allocation, a large number of methods are proposed in addition to the above. If the model relating to the auditory sense becomes finer and the ability of the encoding apparatus is improved, encoding of still higher efficiency from the viewpoint of the auditory sense can be made.
In the case where the DFT or the DCT is used as a method of transforming the waveform signal into a spectrum, when the transformation is carried out with time blocks each consisting of M samples, M independent real-number data are obtained. However, since one block is ordinarily constituted in a state of overlapping with both adjacent blocks by a predetermined number (M1) of samples in order to reduce connection distortion between time blocks (frames), M real-number data are, on the average, quantized and encoded with respect to (M−M1) samples in the encoding method utilizing the DFT or the DCT.
Moreover, in the case where the MDCT is used as a method of transforming the signal on the time axis into a spectrum, M independent data are obtained from 2M samples overlapping with both adjacent blocks by M samples. Accordingly, in this case, M real-number data are, on the average, quantized and encoded with respect to M samples. In this case, at the decoding apparatus, waveform elements obtained by carrying out inverse transformation at the respective blocks from the codes obtained by using the MDCT as described above are added while allowing them to interfere with each other, so that the waveform signal is reconstructed.
In general, by elongating the time block (frame) for the transformation, the frequency resolution of the spectrum is enhanced so that energy is concentrated on a specific spectrum component. Accordingly, in the case where there is used the MDCT, in which the transformation is carried out with a long block length while overlapping with both adjacent blocks by halves and the number of spectrum signals obtained is not increased with respect to the number of original time samples, encoding of higher efficiency can be carried out as compared to the case where the DFT or the DCT is used. In addition, by causing adjacent blocks to have a sufficiently long overlap, distortion between blocks of the waveform signal can also be reduced.
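The property described above, i.e., that the MDCT yields M independent coefficients from 2M samples overlapping with the adjacent blocks by M samples, and that the decoder reconstructs the waveform by adding the inversely transformed blocks so that the time-domain aliasing cancels, can be checked with a direct (non-fast) sketch. The sine window and the 2/M normalization used here are one common choice and are assumptions of this example:

```python
import math
import random

def mdct(block, window):
    # 2M windowed time samples -> M spectral coefficients.
    M = len(block) // 2
    return [sum(window[n] * block[n]
                * math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
                for n in range(2 * M))
            for k in range(M)]

def imdct(coefs, window):
    # M coefficients -> 2M windowed samples; adjacent blocks must be
    # overlap-added by M samples to cancel the time-domain aliasing.
    M = len(coefs)
    return [window[n] * (2.0 / M) * sum(
                coefs[k] * math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
                for k in range(M))
            for n in range(2 * M)]

M = 8
# Sine window: satisfies window[n]**2 + window[n + M]**2 == 1.
window = [math.sin(math.pi / (2 * M) * (n + 0.5)) for n in range(2 * M)]
signal = [random.uniform(-1.0, 1.0) for _ in range(4 * M)]
padded = [0.0] * M + signal + [0.0] * M          # half-blocks of zeros at edges
output = [0.0] * len(padded)
for start in range(0, len(padded) - 2 * M + 1, M):   # 50% overlapping blocks
    block = imdct(mdct(padded[start:start + 2 * M], window), window)
    for n, v in enumerate(block):
        output[start + n] += v
error = max(abs(s - o) for s, o in zip(signal, output[M:M + len(signal)]))
```

Here `error` remains at the level of floating point rounding: although 2M samples enter each transformation, only M coefficients per M new samples need to be quantized and encoded, as stated above.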
In constituting an actual code train, first, for every band where normalization and quantization are carried out, quantization accuracy information, which is information indicating the quantization step used when quantization is carried out, and normalization information, which is information indicating the coefficients used for normalizing the respective signal components, are encoded with predetermined numbers of bits, and the normalized and quantized spectrum signal is then encoded.
Here, e.g., in the "ISO/IEC 11172-3: 1993 (E), 1993", there is described a high efficiency encoding system set so that the number of bits indicating the quantization accuracy information differs in dependency upon the band. In accordance with this system, setting is made so that as the frequency band shifts to a higher frequency band, the number of bits indicating the quantization accuracy information becomes smaller.
An example of the configuration of a conventional encoding apparatus adapted for carrying out, e.g., frequency band division of an audio signal to encode it is shown in FIG. 1. The audio signal to be encoded is inputted to a band division unit 101, at which it is divided into, e.g., signals of four frequency bands.
Here, at the band division unit 101, a filter such as the above-described QMF or PQF, etc. may be used to carry out the band division. Alternatively, spectrum transformation such as MDCT, etc. may be carried out, and the spectrum signals obtained as the result thereof may be grouped for every band, thereby carrying out the band division.
It is to be noted that the widths of the respective bands when band division of the audio signal is carried out at the band division unit 101 (such bands will hereinafter be referred to as encoding units as occasion demands) may be uniform, or may be non-uniform in correspondence with the critical bands, etc. Moreover, while the audio signal in FIG. 1 is divided into four encoding units, the number of encoding units is not limited to this.
The signals decomposed into the four encoding units (the four encoding units will be respectively referred to as the first˜fourth encoding units hereinafter) are delivered to a quantization accuracy determination unit 103 for every predetermined time block (frame). Further, the signals of the first˜fourth encoding units are also respectively delivered to normalization units 1021˜1024.
The normalization units 1021˜1024 extract, from the respective signal components constituting the signals of the inputted first˜fourth encoding units, the signal component whose absolute value is maximum, and take coefficients corresponding to these values as the normalization coefficients of the first˜fourth encoding units. Further, at the normalization units 1021˜1024, the respective signal components constituting the signals of the first˜fourth encoding units are respectively divided by the values corresponding to the normalization coefficients of the first˜fourth encoding units. Accordingly, in this case, the normalized data obtained by the normalization become values within the range of −1.0˜1.0.
The normalized data are respectively outputted from the normalization units 1021˜1024 to quantization units 1041˜1044. Moreover, the normalization coefficients of the first˜fourth encoding units are respectively outputted from the normalization units 1021˜1024 to a multiplexer 105.
To the quantization units 1041˜1044, the normalized data of the first˜fourth encoding units are delivered from the respective normalization units 1021˜1024, and quantization accuracy information indicating the quantization steps used when the normalized data of the first˜fourth encoding units are quantized is also delivered from the quantization accuracy determination unit 103.
Namely, the quantization accuracy determination unit 103 determines, on the basis of the signals of the first˜fourth encoding units from the band division unit 101, the quantization steps to be used in quantizing the respective normalized data of the first˜fourth encoding units. Further, quantization accuracy information of the first˜fourth encoding units corresponding to the quantization steps is respectively outputted to the quantization units 1041˜1044, and is also outputted to the multiplexer 105.
At the quantization units 1041˜1044, the normalized data of the first˜fourth encoding units are respectively quantized with the quantization steps corresponding to the quantization accuracy information of the first˜fourth encoding units, so that they are encoded. The quantization coefficients of the first˜fourth encoding units obtained as the result thereof are outputted to the multiplexer 105. At the multiplexer 105, the quantization coefficients, the quantization accuracy information and the normalization coefficients of the first˜fourth encoding units are encoded as occasion demands, and are then multiplexed. Further, the encoded data obtained as the result thereof is transmitted through a transmission path, or is recorded onto a recording medium 106.
It is to be noted that, at the quantization accuracy determination unit 103, the determination of the quantization steps is not only carried out on the basis of the signals obtained after the band division, but may also be carried out, e.g., on the basis of the normalized data, or by taking auditory phenomena such as the masking effect, etc. into consideration.
An example of the configuration of a decoding apparatus adapted for decoding the encoded data outputted from the encoding apparatus having such a configuration is shown in FIG. 2. In FIG. 2, the encoded data is inputted to a demultiplexer 121, at which it is decoded. The decoded data thus obtained is separated into the quantization coefficients, the quantization accuracy information and the normalization coefficients of the first˜fourth encoding units. The quantization coefficients, the quantization accuracy information and the normalization coefficients of the first˜fourth encoding units are delivered to signal component constituting units 1221˜1224 corresponding to the respective encoding units.
At the signal component constituting unit 1221, the quantization coefficient of the first encoding unit is inverse-quantized with the quantization step corresponding to the quantization accuracy information of the first encoding unit, whereby the normalized data of the first encoding unit is obtained. Further, at the signal component constituting unit 1221, the normalized data of the first encoding unit is multiplied by the value corresponding to the normalization coefficient of the first encoding unit. Thus, the signal of the first encoding unit is decoded, and is outputted to a band synthesis unit 123.
Also at the signal component constituting units 1222˜1224, similar processing is carried out. Thus, the signals of the second˜fourth encoding units are decoded, and are outputted to the band synthesis unit 123. At the band synthesis unit 123, the signals of the first˜fourth encoding units are band-synthesized. Thus, the original audio signal is restored (reconstructed).
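The flow of FIG. 1 and FIG. 2 can be summarized by a toy sketch (a simple grouping of samples stands in for the band division unit 101 and the band synthesis unit 123, and the fixed quantization accuracy shared by all encoding units is an assumption of this example):

```python
def encode(signal, n_units, n_bits):
    # Band division unit: here simply a grouping into n_units encoding units.
    size = len(signal) // n_units
    stream = []
    for i in range(n_units):
        unit = signal[i * size:(i + 1) * size]
        # Normalization unit: divide by the maximum absolute value.
        norm_coef = max(abs(s) for s in unit) or 1.0
        # Quantization unit: uniform step from the quantization accuracy info.
        step = 1.0 / (2 ** (n_bits - 1))
        codes = [round(s / norm_coef / step) for s in unit]
        # Multiplexer: accuracy info, normalization coef, quantization coefs.
        stream.append((n_bits, norm_coef, codes))
    return stream

def decode(stream):
    signal = []
    for n_bits, norm_coef, codes in stream:
        # Signal component constituting unit: inverse-quantize with the
        # step, then multiply by the value corresponding to the norm coef.
        step = 1.0 / (2 ** (n_bits - 1))
        signal.extend(c * step * norm_coef for c in codes)
    return signal      # band synthesis: here simply the concatenation
```

`decode(encode(x, 4, 8))` returns the input up to a per-sample error of at most half a quantization step multiplied by the normalization coefficient of the corresponding encoding unit.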
Meanwhile, since the quantization accuracy information is included in the encoded data delivered (transmitted) from the encoding apparatus of FIG. 1 to the decoding apparatus of FIG. 2, the auditory model used at the encoding apparatus can be arbitrarily set. Namely, at the encoding apparatus, it is possible to freely set the quantization steps with respect to the respective encoding units, and it is possible to realize improvement in the sound quality and/or the compression ratio, without changing the decoding apparatus, as the arithmetic (computing) ability improves and/or the auditory model is made finer.
However, in this case, the number of bits for encoding the quantization accuracy information itself becomes large. As a result, it is difficult to improve the entire encoding efficiency beyond a certain level.
In view of the above, in place of directly encoding the quantization accuracy information, there is, e.g., a method of determining the quantization accuracy information from the normalization coefficients at the decoding apparatus. However, in this method, since the relationship between the normalization coefficients and the quantization accuracy information is fixed at the time point when the standard is determined, there is the problem that it becomes difficult to introduce, in the future, control of the quantization accuracy based on a higher level auditory model. Moreover, in the case where there is a range in the compression ratio to be realized, the necessity arises of determining the relationship between the normalization coefficients and the quantization accuracy information for every compression ratio.
Accordingly, in order to further improve the compression ratio, it is necessary not only to enhance the encoding efficiency of the main information directly subjected to encoding, but also to enhance the encoding efficiency of the sub-information which is not directly subjected to encoding, such as the quantization accuracy information or the normalization coefficients, etc.
Since such quantization accuracy information and/or normalization coefficients, etc. have, in many cases, correlation between adjacent normalization units, between adjacent channels, or between adjacent times, there are many cases where the difference value between items of information of high correlation is determined, and that difference value is encoded by using a variable length code book (table). With this technique, the encoding efficiency can be enhanced as compared to the case where the information is encoded as it is without using the difference, but there is the problem that the size of the code book (table) becomes large on the contrary.
Consideration is made in connection with the case where, e.g., the distribution range of the quantization accuracy information is 0˜7, and encoding is carried out with 3 bits. In the case where the quantization accuracy information is encoded as it is, the code book (table) size is 8. To the contrary, in the case where the difference value is encoded, the distribution range is broadened to about double, i.e., −7˜7, and the size of the code book (table) becomes about double, from 8 to 15, as shown in FIG. 3. Further, in the case where the difference value of the difference value is encoded, the distribution range is broadened to about four times, i.e., −14˜14, and the size of the code book (table) becomes about four times, from 8 to 29, as shown in FIG. 4.
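The growth of the distribution range described above can be checked with a short sketch (plain arithmetic; the variable names are arbitrary):

```python
levels = list(range(8))                  # quantization accuracy info: 0 to 7
first_diffs = sorted({a - b for a in levels for b in levels})
second_diffs = sorted({a - b for a in first_diffs for b in first_diffs})

# Encoded as it is:        8 code book entries (0 to 7).
# First difference:       15 entries (-7 to 7), about double.
# Difference of difference: 29 entries (-14 to 14), about four times.
sizes = (len(levels), len(first_diffs), len(second_diffs))
```

Here `sizes` is (8, 15, 29), corresponding to the growth shown in FIG. 3 and FIG. 4.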
In addition, although the variable length code is prepared in accordance with the probability distribution, a long code is allocated to a value of low appearance probability. As a result, there is the problem that, when such values occur, the number of bits used in encoding is increased to a considerable extent as compared to the case where the variable length code book (table) is not used.