This invention relates to an encoding apparatus, an encoding method, a decoding apparatus, a decoding method, an encoding program and a decoding program. More particularly, it relates to an encoding apparatus, an encoding method, a decoding apparatus, a decoding method, an encoding program and a decoding program, in which digital data, such as digital audio signals, are encoded with high efficiency encoding and transmitted or recorded on a recording medium, and in which the digital data are received or reproduced for decoding on the decoder side.
A variety of techniques exist for high efficiency encoding of digital audio signals or speech signals. Examples of these techniques include a non-blocking frequency spectrum splitting system, exemplified by spectrum-splitting encoding (sub-band coding) and a blocking frequency spectrum splitting system, exemplified by transform coding.
In the non-blocking frequency spectrum splitting system, audio signals on the time axis are split into plural frequency bands and encoded without blocking. In the blocking frequency spectrum splitting system, the signals on the time axis are transformed into signals on the frequency axis, by orthogonal transform, and the frequency domain signals are split into plural frequency bands, that is, the coefficients obtained on orthogonal transform are grouped from one preset frequency band to another, and the encoding is carried out from one such frequency band to another.
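The grouping step of the blocking system can be sketched as follows. This is only an illustration: the function name `group_into_bands`, the coefficient values and the band edges are hypothetical and are not taken from any particular codec.

```python
def group_into_bands(coeffs, band_edges):
    """Split transform coefficients into contiguous frequency bands.

    band_edges holds the start index of every band followed by one final
    end index, e.g. [0, 2, 4, 8] describes three bands.
    """
    return [coeffs[lo:hi] for lo, hi in zip(band_edges, band_edges[1:])]


# Eight hypothetical transform coefficients grouped into three bands
# holding 2, 2 and 4 coefficients respectively.
coeffs = [0.9, -0.4, 0.3, 0.1, -0.05, 0.02, 0.01, -0.01]
bands = group_into_bands(coeffs, [0, 2, 4, 8])
```

Each resulting band can then be encoded separately, as described above.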
For improving the encoding efficiency further, there is also proposed a technique of high efficiency encoding consisting in the combination of the aforementioned non-blocking frequency spectrum splitting system and the blocking frequency spectrum splitting system. With this technique, the time domain signals are split into plural frequency bands by means of the sub-band coding, and the signals of the respective bands are orthogonal-transformed into those on the frequency axis, and the frequency domain signals, resulting from the orthogonal transform, are encoded from one frequency band to another.
In splitting the time-domain signals into plural frequency bands, a quadrature mirror filter (QMF), for example, is preferably used, because it simplifies the processing and cancels out the aliasing distortion. Details of the frequency spectrum splitting by this QMF may be found in R. E. Crochiere, Digital coding of speech in subbands, Bell Syst. Tech. J. Vol. 55, No. 8, 1976.
As a technique for splitting the frequency spectrum, there is also a polyphase quadrature filter (PQF) which represents a technique of dividing the frequency spectrum into equal-width frequency ranges. Details of this PQF are discussed in Joseph H. Rothweiler, Polyphase Quadrature Filters—A new subband coding technique, ICASSP 83 BOSTON.
There are known techniques for orthogonal transform including the technique of dividing the digital input audio signals into blocks of a predetermined time duration, by way of blocking, and processing the resulting blocks using a Discrete Fourier Transform (DFT), discrete cosine transform (DCT) or modified DCT (MDCT) to convert the signals from the time axis to the frequency axis.
A discussion of the MDCT may be found in J. P. Princen and A. B. Bradley, Subband/Transform Coding Using Filter Bank Based on Time Domain Aliasing Cancellation, ICASSP, 1987, Univ. of Surrey Royal Melbourne Inst. of Tech.
By quantizing the signals, divided from band to band, using a filter or orthogonal transform, it is possible to control the band susceptible to quantization noise and, by exploiting such properties as masking effect, it is possible to achieve psychoacoustically more efficient encoding. If, prior to quantization, the signal components of the respective bands are normalized using the maximum absolute value of the signal components of each band, the encoding efficiency may be improved further.
In quantizing the frequency components, resulting from the division of the frequency spectrum, it is known to divide the frequency spectrum into widths which take characteristics of the human acoustic system into account. That is, audio signals are divided into plural bands, such as 32 bands, in accordance with band widths increasing with increasing frequency.
In encoding the band-based data, bits are allocated fixedly or adaptively from band to band. When applying adaptive bit allocation to coefficient data resulting from MDCT, the MDCT coefficient data of the respective bands, obtained by applying MDCT processing to the block-based signals, are encoded with an adaptively allocated number of bits.
As bit allocation techniques, there are currently known a technique of allocating the bits based on the band-based signal magnitude from one band to another, sometimes referred to below as a first bit allocation technique, and a technique of allocating the bits in a fixed manner, based on the required band-based signal-to-noise ratio, obtained by taking advantage of the psychoacoustic masking effect, sometimes referred to below as a second bit allocation technique.
Details of the first bit allocation technique may be found in R. Zelinsky and P. Noll, Adaptive Transform Coding of Speech Signals, IEEE Transactions of Acoustics, Speech and Signal Processing, vol. ASSP-25, No. 4, August 1977.
Details of the second bit allocation technique may be found in M. A. Krasner (MIT), The critical band coder digital encoding of the perceptual requirements of the auditory system, ICASSP 1980.
With the first bit allocation technique, the quantization noise spectrum becomes flatter, with the noise energy being minimized. However, the noise level perceived by the listener is not optimum because the psychoacoustic masking effect is not exploited. On the other hand, if, with the second bit allocation technique, the energy is concentrated in a certain frequency, as when a sine wave is input, characteristic values are not optimum because of the fixed bit allocation.
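The flattening of the noise spectrum under magnitude-based allocation can be illustrated with a simple greedy loop. This is a sketch under stated assumptions and is not claimed to be the procedure of the cited reference: the function name `allocate_bits_by_energy` is hypothetical, and each additional bit is assumed to reduce the noise power of a band by a factor of four (roughly 6 dB), which is a common rule of thumb for uniform quantizers.

```python
def allocate_bits_by_energy(band_energies, total_bits):
    """Greedy magnitude-based bit allocation.

    Every bit goes to the band whose estimated quantization noise is
    currently the largest. One extra bit is assumed to divide that band's
    noise power by four (about 6 dB).
    Returns (bits per band, remaining noise estimate per band).
    """
    bits = [0] * len(band_energies)
    noise = list(band_energies)
    for _ in range(total_bits):
        # On ties, the lowest band index wins.
        k = max(range(len(noise)), key=lambda i: noise[i])
        bits[k] += 1
        noise[k] /= 4.0
    return bits, noise


# Hypothetical band energies and a budget of four bits.
bits, noise = allocate_bits_by_energy([64.0, 4.0, 1.0], 4)
```

In this example the strongest band receives most of the bits and the remaining noise estimate ends up equal in all three bands, which corresponds to the flat noise spectrum mentioned above. No masking model is involved.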
With this in mind, there has been proposed a high efficiency encoding apparatus in which the total number of bits usable for bit allocation is divided into a predetermined number of bits allocated from one sub-block to another and a variable number of bits which depends on the magnitude of the signals of the respective blocks, with the division ratio depending on a signal relevant to the input signal, in such a manner that the proportion of the fixed bit allocation becomes higher the smoother the spectrum of the input signal.
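One way to picture such a division of the bit budget is sketched below. The text above only states that the ratio depends on a signal related to the input signal; the use of the spectral flatness (geometric mean divided by arithmetic mean of the band energies) as the smoothness measure, the linear mapping to a bit count, and the name `split_bit_budget` are all assumptions made for this illustration.

```python
import math


def split_bit_budget(band_energies, total_bits):
    """Return (fixed_bits, variable_bits) for one block.

    Smoothness is measured by the spectral flatness, which is an assumption
    of this sketch. It is close to 1.0 for a flat spectrum and close to 0.0
    when the energy sits in a single band.
    """
    eps = 1e-12  # keeps the logarithm finite for empty bands
    n = len(band_energies)
    log_mean = sum(math.log(e + eps) for e in band_energies) / n
    geometric = math.exp(log_mean)
    arithmetic = sum(band_energies) / n + eps
    flatness = min(1.0, geometric / arithmetic)
    # The smoother the spectrum, the larger the fixed share.
    fixed_bits = round(total_bits * flatness)
    return fixed_bits, total_bits - fixed_bits


flat_split = split_bit_budget([1.0, 1.0, 1.0, 1.0], 100)
tonal_split = split_bit_budget([1000.0, 0.0, 0.0, 0.0], 100)
```

With these hypothetical inputs, the flat spectrum assigns the whole budget to the fixed allocation, while the sine-like spectrum assigns it to the signal-dependent allocation.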
With this method, in case the signal energy is concentrated in a specified spectral component, as when the input signal is a sine wave, a large number of bits are allocated to a block of the spectral component, thereby appreciably improving overall signal-to-noise characteristics. The human auditory system is more sensitive to the signals having steep spectral components, so that, if the signal-to-noise characteristics are improved as described above, not only measured values but also the sound as perceived by the listener may be effectively improved.
There are also other methods proposed in connection with the bit allocation. If more elaborate models simulating the human auditory system are developed and the ability of the encoding apparatus is improved, it would be possible to achieve the encoding with psychoacoustically higher efficiency.
If, with the use of DFT or DCT as a method for transforming waveform signals into spectral signals, the transform is carried out using a time block consisting of M samples, M independent real-number data are obtained. However, since a given time block overlaps each of its neighboring blocks by a preset number M1 of samples, with a view to reducing the junction distortion between neighboring time blocks or frames, the encoding method exploiting DFT or DCT quantizes and encodes M real-number data for (M − M1) samples on average.
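The overhead caused by the overlap can be checked with simple arithmetic. The sketch below assumes that successive blocks advance by M − M1 samples; the function name and the numbers are hypothetical.

```python
def coefficients_per_sample(block_len, overlap):
    """Real-valued coefficients encoded per new input sample when each
    block of block_len samples overlaps its neighbor by `overlap` samples."""
    return block_len / (block_len - overlap)


# Hypothetical values: M = 512 samples per block, M1 = 32 overlapping samples.
dft_rate = coefficients_per_sample(512, 32)   # 512 / 480, slightly above 1
no_overlap_rate = coefficients_per_sample(512, 0)
```

A rate above 1 means that more coefficients than input samples have to be quantized and encoded.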
If MDCT is used as a method for transforming time-domain signals into spectral signals, M independent real-number data are obtained from 2M samples resulting from overlap with M samples from both neighboring blocks. Thus, in the present case, M real-number data are quantized and encoded for M samples on an average. In this case, the decoding apparatus re-constructs the waveform signals by summing waveform elements, obtained on inverse transform in the respective blocks of the codes obtained on MDCT as described above, as the waveform elements are caused to interfere with one another.
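The cancellation that occurs when the inverse-transformed blocks are summed can be demonstrated with a direct, unoptimized MDCT. This is a minimal sketch: no window is applied (practical codecs normally apply one), the 1/M scaling of the inverse transform is chosen to suit this unwindowed case, and all function names are hypothetical.

```python
import math


def mdct(block):
    """Forward MDCT: 2M time samples in, M coefficients out."""
    m = len(block) // 2
    return [
        sum(block[n] * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
            for n in range(2 * m))
        for k in range(m)
    ]


def imdct(coeffs):
    """Inverse MDCT: M coefficients in, 2M aliased time samples out."""
    m = len(coeffs)
    return [
        sum(coeffs[k] * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
            for k in range(m)) / m
        for n in range(2 * m)
    ]


def reconstruct(signal, m):
    """Transform 2M-sample blocks taken every M samples, then overlap-add
    the inverse transforms so that the aliasing terms cancel."""
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * m + 1, m):
        piece = imdct(mdct(signal[start:start + 2 * m]))
        for i, value in enumerate(piece):
            out[start + i] += value
    return out


M = 4
signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(4 * M)]
restored = reconstruct(signal, M)
# Samples M .. 3M-1 are covered by two blocks and are recovered. The first
# and last M samples are covered by one block only and keep their aliasing.
```

Each block of 2M samples yields only M coefficients, and a new block starts every M samples, so M values are encoded per M input samples.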
In general, if the time block (frame) for transform is lengthened, the frequency resolution of the spectrum is improved, such that the energy is concentrated in a specified spectral component. Thus, with the use of MDCT in which a block length used for transform is elongated by overlap with one half each of the neighboring blocks, with the number of the resulting spectral components not increasing as compared to the number of the original time samples, the encoding efficiency is higher than in case of using DFT or DCT. Moreover, the block-to-block distortion of the waveform signals may be reduced by providing a sufficiently long overlap between the neighboring blocks.
In constructing an actual codeword, the quantization step information, as the information representing quantization steps used in quantization, and the normalization information, as the information representing the coefficient used in normalizing the respective signal components, are encoded with preset numbers of bits, from one frequency band for normalization or quantization to another, and subsequently the normalized and quantized spectral signals are encoded.
It is noted that ISO/IEC 11172-3: 1993(E), 1993 describes a high efficiency encoding system in which different numbers of bits representing the quantization step information are used from one frequency band to another. Specifically, the number of bits representing the quantization step information is smaller for higher frequency bands.
FIG. 1 shows an example of a structure of a conventional encoding apparatus 100 for splitting the audio signals into plural frequency bands and encoding the resulting band-based signals. The audio signals for encoding are input to a spectrum splitting unit 101 so as to be split into for example signals of four frequency bands.
It is noted that frequency spectrum splitting in the spectrum splitting unit 101 may be by a filter, such as the aforementioned QMF or PQF, or by orthogonal transform, such as MDCT, with the resulting spectral signals being grouped from band to band by way of the frequency spectrum splitting.
Meanwhile, the widths of the bands, termed herein the encoding units, into which the spectrum of the audio signals is split in the spectrum splitting unit 101, may be uniform, or non-uniform in keeping with the critical bands. Although the audio signals are split into four encoding units, in FIG. 1, the number of the encoding units is not limited thereto.
The signals split into the four encoding units, referred to below as first to fourth encoding blocks, are routed to a quantization step determining unit 103 for each preset time block or frame. The signals of the first to fourth encoding blocks are also routed to normalization units 1021 to 1024.
The normalization units 1021 to 1024 extract the signal components of the maximum absolute value from the respective signal components forming the signals of the input first to fourth encoding blocks, and set the coefficients corresponding to the extracted signal components as normalization coefficients of the first to fourth encoding blocks. The respective signal components, making up the signals of the first to fourth encoding blocks, are normalized, that is divided, by the normalization units 1021 to 1024, with values corresponding to the normalization coefficients of the first to fourth encoding blocks, respectively. Thus, in the present case, the normalized data resulting from normalization range from −1.0 to 1.0.
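The normalization of one encoding block can be sketched as below. As a simplification, the peak magnitude itself is used as the normalization coefficient, whereas an actual apparatus would typically use a coded value corresponding to it. The function name `normalize_band` is hypothetical.

```python
def normalize_band(band):
    """Return (scale, normalized), where scale is the largest absolute
    value in the band and normalized lies in the range [-1.0, 1.0]."""
    scale = max(abs(x) for x in band)
    if scale == 0.0:
        # An all-zero band is passed through; a scale of 1.0 avoids
        # dividing by zero.
        return 1.0, list(band)
    return scale, [x / scale for x in band]


# Hypothetical signal components of one encoding block.
scale, normalized = normalize_band([0.5, -2.0, 1.0])
```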
The normalized data are output from the normalization units 1021 to 1024 to quantizing units 1041 to 1044, respectively. The normalization coefficients of the first to fourth encoding blocks are also output from the normalization units 1021 to 1024 to a multiplexer 105.
The quantizing units 1041 to 1044 are supplied not only with normalized data of the first to fourth encoding blocks, from the normalization units 1021 to 1024, but also with the quantization step information, which specifies the quantization step in quantizing the normalized data of the first to fourth encoding blocks, from the quantization step determining unit 103.
That is, the quantization step determining unit 103 determines the quantization step in quantizing the normalized data from the first to fourth encoding blocks, based on signals of the first to fourth encoding blocks from the spectrum splitting unit 101, and outputs the quantization step information of the first to fourth encoding blocks, corresponding to the so determined quantization step, to the quantizing units 1041 to 1044, while outputting the quantization step information to the multiplexer 105.
In the quantizing units 1041 to 1044, the normalized data of the first to fourth encoding blocks are quantized with the quantization step, corresponding to the quantization step information of the first to fourth encoding blocks, and encoded, and the resulting quantization coefficients of the first to fourth encoding blocks are output to the multiplexer 105. In the multiplexer 105, the quantization coefficients, quantization step information and the normalization information of the first to fourth encoding blocks are encoded as necessary and multiplexed to produce encoded data which are transmitted over a transmission channel or recorded on a recording medium, not shown.
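The quantization of normalized data can be sketched with a uniform quantizer. This is an illustration under assumptions: the quantization step information is taken to be a bit count per component, the quantizer is symmetric around zero, and the name `quantize` is hypothetical.

```python
def quantize(normalized, bits):
    """Map values in [-1.0, 1.0] to integer codes in the range
    [-(2**(bits - 1) - 1), 2**(bits - 1) - 1]."""
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits per component")
    max_code = 2 ** (bits - 1) - 1
    return [round(x * max_code) for x in normalized]


# Hypothetical normalized data quantized with 4 bits per component,
# so the codes lie between -7 and 7.
codes = quantize([1.0, -1.0, 0.0, 0.4], 4)
```

The integer codes, together with the bit count and the normalization coefficient, are what the multiplexer would combine into the encoded data.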
It is also possible for the quantization step determining unit 103 to determine the quantization step, based not only on the signals resulting from the frequency spectrum splitting, but also on normalized data, or taking into account the acoustic phenomena, such as masking effect.
An illustrative structure of the decoding apparatus 120 for decoding the encoded data output from the encoding apparatus 100 constructed as described above is shown in FIG. 2, in which encoded data are input to a demultiplexer 121 and decoded so as to be separated into the quantization coefficients, quantization step information and the normalization information of the first to fourth encoding blocks. The quantization coefficients, quantization step information and the normalization information of the first to fourth encoding blocks are supplied to signal component constructing units 1221 to 1224 associated with the respective encoding units.
In the signal component constructing unit 1221, the quantization coefficients of the first encoding unit are inverse quantized, with the quantization step corresponding to the quantization step information of the first encoding unit, to produce the normalized data of the first encoding unit. In the same signal component constructing unit 1221, the normalized data of the first encoding unit are multiplied by a value corresponding to the normalization information of the first encoding unit to decode the signals of the first encoding unit, which are then output to a band synthesizing unit 123.
Similar operations are executed in the signal component constructing units 1222 to 1224 to decode the signals of the second to fourth encoding blocks, which are output to the band synthesizing unit 123. In this band synthesizing unit 123, the signals of the first to fourth encoding blocks are synthesized to restore the original audio signals.
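The two steps performed in each signal component constructing unit can be sketched as follows. The sketch assumes a symmetric uniform quantizer whose step is given as a bit count, and it treats the normalization information as a plain scale factor; the name `decode_unit` and the received values are hypothetical.

```python
def decode_unit(codes, bits, scale):
    """Inverse quantize the integer codes to normalized data, then
    multiply by the scale to restore the signal components."""
    max_code = 2 ** (bits - 1) - 1
    normalized = [c / max_code for c in codes]
    return [x * scale for x in normalized]


# Hypothetical demultiplexed data for one encoding unit:
# quantization coefficients, a 6-bit quantization step, and a scale of 0.8.
decoded = decode_unit([31, -8, 4, 2], bits=6, scale=0.8)
```

The decoder needs no acoustic model for these steps, since everything it uses is read from the encoded data.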
Meanwhile, since the quantization step information is contained in the encoded data supplied (transmitted) from the encoding apparatus 100 of FIG. 1 to the decoding apparatus 120 of FIG. 2, the acoustic model used in the encoding apparatus can be set as desired. That is, the quantization step for each encoding unit can be set freely in the encoding apparatus, such that attempts may be made to improve the sound quality or the compression ratio, in keeping with the improved performance of the encoding apparatus or the elaborateness of the acoustic model, without the necessity of changing the decoding apparatus.
However, in this case, the number of bits for encoding the quantization step information itself is increased, so that difficulties are met in improving the overall encoding efficiency beyond a certain value.
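The weight of this side information can be estimated with simple arithmetic. The numbers below are hypothetical and serve only to show how the share grows as the frame budget shrinks.

```python
def side_info_fraction(num_units, step_bits, frame_bits):
    """Fraction of a frame spent on quantization step information when
    every encoding unit carries step_bits bits of it."""
    return num_units * step_bits / frame_bits


# Hypothetical frame: 32 encoding units, 4 bits of step information each.
at_1024_bits = side_info_fraction(32, 4, 1024)  # 128 of 1024 bits
at_512_bits = side_info_fraction(32, 4, 512)    # 128 of 512 bits
```

Because the cost of the step information stays constant while the frame shrinks, its share rises as the compression ratio is increased.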
Although it is possible for the decoding apparatus to determine the quantization step information from e.g., the normalization information, instead of directly encoding the quantization step information, the relation between the normalization information and the quantization step information is set at a time point of setting the standard or design parameters, with the consequence that difficulties may be encountered in introducing the control of the quantization step based on a further advanced acoustic model that may be developed in future. In addition, if there is a certain tolerance in the compression ratio to be implemented, it is necessary to set the relation between the normalization information and the quantization step information from one compression ratio to another.