This invention relates to a method and apparatus for high-efficiency encoding of input signals, a recording medium on which the high-efficiency encoded signals are recorded, and a method and apparatus for decoding encoded signals transmitted over a transmission channel or reproduced from a recording medium to produce playback signals.
There exist a variety of high-efficiency techniques for encoding audio or speech signals. Examples include transform coding, in which the digital signals representing the audio signal on the time axis are divided into blocks of pre-set time units, or frames, and the frame-based time-axis signals are converted by an orthogonal transform into blocks of spectral coefficients representing the audio signal on the frequency axis; and sub-band coding, in which the frequency band of the audio signal is divided by a filter bank into a plurality of sub-bands without first forming the signal into frames along the time axis. A combination of sub-band coding and transform coding is also known, in which the digital signals representing the audio signal are divided into a plurality of frequency ranges by sub-band coding and transform coding is applied to each of the frequency ranges.
Filters for dividing a frequency spectrum into a plurality of equal-width frequency ranges include the quadrature mirror filter (QMF) discussed in R. E. Crochiere, "Digital Coding of Speech in Sub-bands", 55 Bell Syst. Tech. J. No. 8 (1976). The QMF divides the frequency spectrum of the signal into two equal-width bands, and no aliasing is produced when the resulting bands are subsequently combined.
Joseph H. Rothweiler, "Polyphase Quadrature Filters: A New Subband Coding Technique", ICASSP 83, Boston, describes a technique for dividing the frequency spectrum of a signal into equal-width frequency bands. With this polyphase QMF, the frequency spectrum of the signal can be divided into a plurality of equal-width frequency bands in a single operation.
A further known technique is the orthogonal transform, in which the digital input audio signal is divided into frames of a predetermined time duration and each frame is processed by a discrete Fourier transform (DFT), a discrete cosine transform (DCT) or a modified DCT (MDCT) to convert the signal from the time axis to the frequency axis. The MDCT is discussed in J. P. Princen and A. B. Bradley, "Subband Transform Coding Using Filter Bank Based on Time Domain Aliasing Cancellation", ICASSP 1987.
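To make the time-to-frequency conversion and the aliasing cancellation concrete, the following is a direct (O(N²)) MDCT/IMDCT pair with a sine window and 50 % overlap. It is a sketch, not an implementation from the cited work: the 2/N scaling of the inverse is one common convention, chosen here so that a window satisfying the Princen-Bradley condition w[n]² + w[n+N]² = 1 gives exact reconstruction. The function names are illustrative, and practical coders use FFT-based fast algorithms instead.

```python
import math

def sine_window(length):
    """Sine window of length 2N; satisfies w[n]**2 + w[n + N]**2 == 1 (Princen-Bradley)."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(frame):
    """Direct O(N^2) MDCT: 2N (windowed) samples -> N coefficients."""
    n2 = len(frame)
    N = n2 // 2
    return [sum(frame[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(n2))
            for k in range(N)]

def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples (2/N scaling)."""
    N = len(coeffs)
    return [2.0 / N * sum(coeffs[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                          for k in range(N))
            for n in range(2 * N)]

def analyse_and_resynthesise(x, N):
    """Frame x with hop N, window, MDCT, IMDCT, window again and overlap-add.

    The time-domain aliasing left in each IMDCT output cancels between
    neighbouring frames, so the original samples are recovered.
    """
    w = sine_window(2 * N)
    padded = [0.0] * N + list(x) + [0.0] * (2 * N)   # every sample lies in two frames
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * N + 1, N):
        frame = [padded[start + n] * w[n] for n in range(2 * N)]
        y = imdct(mdct(frame))
        for n in range(2 * N):
            out[start + n] += y[n] * w[n]
    return out[N:N + len(x)]

x = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(20)]
y = analyse_and_resynthesise(x, N=4)
print(max(abs(a - b) for a, b in zip(x, y)) < 1e-9)  # True
```

Each frame of 2N samples yields only N coefficients while the frames advance by N samples, so the transform is critically sampled despite the overlap.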
Quantizing the signals divided into bands by a filter or an orthogonal transform makes it possible to control the bands in which quantization noise occurs, so that psychoacoustically more efficient coding can be performed by exploiting the so-called masking effect. Coding efficiency is improved further if the signal components in each band are normalized by the maximum absolute value of the signal components in that band.
When quantizing the spectral coefficients resulting from an orthogonal transform, it is known to use sub-bands that take advantage of the psychoacoustic characteristics of the human auditory system. That is, the spectral coefficients representing an audio signal on the frequency axis may be divided into a plurality of critical frequency bands, whose width increases with increasing frequency. Normally, about 25 critical bands are used to cover the audio frequency spectrum from 0 Hz to 20 kHz. In such a quantizing system, bits are adaptively allocated among the critical bands. For example, when adaptive bit allocation is applied to the spectral coefficient data resulting from an MDCT, the spectral coefficient data within each critical band is quantized using an adaptively allocated number of bits. The following two bit allocation techniques are currently known.
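The growth of the critical-band width with frequency can be illustrated with the Zwicker-Terhardt analytic approximation of the critical-band rate in Bark. This is an assumption: the text does not specify which critical-band model is used. Cutting the range from 0 Hz to 20 kHz at each whole Bark value gives 25 bands, consistent with the figure quoted above.

```python
import math

def bark(f_hz):
    """Zwicker-Terhardt approximation of the critical-band rate (Bark) at f_hz."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def critical_band_edges(f_max=20000.0):
    """Band edges (Hz) where the Bark value crosses 1, 2, 3, ..., found by bisection."""
    edges = [0.0]
    z = 1
    while bark(f_max) > z:
        lo, hi = edges[-1], f_max
        for _ in range(60):            # bisection works because bark() increases monotonically
            mid = (lo + hi) / 2.0
            if bark(mid) < z:
                lo = mid
            else:
                hi = mid
        edges.append(hi)
        z += 1
    edges.append(f_max)                # last, partial band up to f_max
    return edges

edges = critical_band_edges()
widths = [b - a for a, b in zip(edges, edges[1:])]
print(len(widths))                     # 25
```

The first band is only about 100 Hz wide, while the bands above roughly 10 kHz span several kilohertz each.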
For example, in IEEE Transactions on Acoustics, Speech and Signal Processing, vol. ASSP-25, No. 4, August 1977, bit allocation is carried out on the basis of the amplitude of the signal in each critical band. This technique produces a flat quantization noise spectrum and minimizes the noise energy, but the noise level perceived by the listener is not optimum because the technique does not effectively exploit the psychoacoustic masking effect.
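As a rough illustration of amplitude-based allocation, the sketch below uses the textbook log-variance rule. This is an assumption: it is not necessarily the exact procedure of the cited paper. Each band receives the average number of bits plus half the base-2 logarithm of its variance relative to the geometric mean of all band variances. Under the usual high-rate approximation, in which the noise of a band is proportional to variance × 2^(−2b), this gives every band the same noise power, that is, the flat noise spectrum described above.

```python
import math

def log_variance_allocation(variances, avg_bits):
    """b_i = avg_bits + 0.5 * log2(var_i / geometric_mean(var)).

    Real-valued; a practical allocator would also clip negative values and round.
    """
    n = len(variances)
    log_gm = sum(math.log2(v) for v in variances) / n
    return [avg_bits + 0.5 * (math.log2(v) - log_gm) for v in variances]

def noise_per_band(variances, bits):
    """High-rate approximation: quantization noise proportional to var * 2**(-2b)."""
    return [v * 2 ** (-2 * b) for v, b in zip(variances, bits)]

variances = [16.0, 4.0, 1.0, 0.25]    # hypothetical band variances
bits = log_variance_allocation(variances, avg_bits=4)
print(bits)                           # [5.5, 4.5, 3.5, 2.5]
print(noise_per_band(variances, bits))
```

Every band ends up with the same noise power, regardless of whether a louder neighbouring band would have masked it.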
In the bit allocation technique described in M. A. Krassner, "The Critical Band Encoder: Digital Encoding of the Perceptual Requirements of the Auditory System", ICASSP 1980, the psychoacoustic masking mechanism is used to determine a fixed bit allocation that produces the necessary signal-to-noise ratio for each critical band. However, if the signal-to-noise ratio of such a system is measured with a strongly tonal signal, for example a 1 kHz sine wave, the results are not optimum because the bits are allocated among the critical bands in a fixed manner.
To overcome these drawbacks, a high-efficiency encoding apparatus has been proposed in which the total number of bits available for allocation is divided between a fixed bit allocation pattern pre-set for each small block and a bit allocation that depends on the signal magnitude in each block. The division ratio is set according to a signal related to the input signal, such that the smoother the signal spectrum, the larger the share given to the fixed bit allocation pattern.
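A minimal sketch of such a divided allocation follows. The division rule (the fixed pattern's share equals the spectral flatness of the band energies), the log-energy weighting of the signal-dependent share and the pattern values are all hypothetical choices made for illustration. The proposal above only requires that a smoother spectrum give the fixed pattern a larger share.

```python
import math

def spectral_flatness(energies):
    """Geometric mean / arithmetic mean of (positive) band energies:
    near 1 for a flat spectrum, near 0 when energy sits in a few bands."""
    logs = [math.log(e) for e in energies]
    return math.exp(sum(logs) / len(logs)) / (sum(energies) / len(energies))

def split_allocation(energies, fixed_pattern, total_bits):
    """Share total_bits between a fixed pattern and an energy-dependent allocation.

    Hypothetical division rule: the fixed pattern receives the fraction r = flatness.
    Returns (r, bits per band) as real numbers; rounding to integers is left out.
    """
    r = spectral_flatness(energies)
    fixed_bits, dynamic_bits = total_bits * r, total_bits * (1.0 - r)
    fixed_w = [p / sum(fixed_pattern) for p in fixed_pattern]
    levels = [max(math.log2(e), 0.0) for e in energies]   # quiet bands get no dynamic share
    total_level = sum(levels)
    dyn_w = [lv / total_level for lv in levels] if total_level > 0 else fixed_w
    return r, [fixed_bits * f + dynamic_bits * d for f, d in zip(fixed_w, dyn_w)]

pattern = [4, 3, 2, 1]            # hypothetical fixed pattern favouring low bands
r_flat, flat = split_allocation([4.0, 4.0, 4.0, 4.0], pattern, 100)
r_tone, tone = split_allocation([1e-6, 1e-6, 1e4, 1e-6], pattern, 100)
```

With a flat spectrum almost all bits follow the fixed pattern (about 40, 30, 20 and 10); with a single strong tone nearly all bits go to the band that holds it.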
With this technique, if the energy is concentrated in a particular spectral component, as in the case of a sine wave input, a larger number of bits is allocated to the block containing that component, which significantly improves the overall signal-to-noise characteristics. Since the human auditory system is highly sensitive to signals with sharp spectral components, this technique improves not only the measured signal-to-noise ratio but also the sound quality as perceived by the ear.
Besides the above techniques, a variety of other techniques have been proposed, and models simulating the human auditory system have been refined, so that an encoding device with improved capabilities can encode more efficiently with respect to the human auditory system.
FIG. 1 shows a structural example of an encoding apparatus (encoder) for an acoustic waveform signal.
In this figure, a waveform signal I101 entering an input terminal 10 is converted by a transform circuit 11 into a signal frequency component I102, which is then normalized and quantized by a normalization/quantization circuit 13 using quantization step information I103 determined by a quantization step decision circuit 12.
The normalization/quantization circuit 13 outputs the normalization coefficient information I104 and the encoded signal frequency component I105 to a code string generating circuit 14. From the quantization step information I103, the normalization coefficient information I104 and the encoded signal frequency component I105, the code string generating circuit 14 generates a code string I106, which is output at an output terminal 16.
FIG. 2 shows an illustrative arrangement of the converting circuit 11 shown in FIG. 1.
Referring to FIG. 2, an input waveform signal I201, corresponding to the waveform signal I101 and supplied from the input terminal 10 via a terminal 20, is split by a first-stage spectrum splitting filter 21 into two frequency band signals I202 and I203. The bandwidth of each of the band signals I202 and I203 is one-half that of the input waveform signal I201; that is, each of them is sub-sampled to one-half the sampling rate of the input waveform signal I201. The signal I203 output by the spectrum splitting filter 21 is further split by a second spectrum splitting filter 22 into two band signals I204 and I205. The bandwidth of each of the band signals I204 and I205 is one-half that of the signal I203; that is, each of them is sub-sampled to one quarter of the sampling rate of the input waveform signal I201.
These signals I202, I204 and I205 are routed to respective associated forward spectrum transform circuits 23, 24 and 25, where they are processed by a forward orthogonal transform such as the MDCT. The spectral signal components I206, I207 and I208 output by the spectrum transform circuits 23, 24 and 25 are routed via respective associated terminals 26, 27 and 28 to downstream circuitry as the signal frequency component I102 output from the conversion circuit 11.
Of course, conversion circuits other than that shown in FIG. 2 may be employed for splitting the frequency of the input waveform signal to form spectral signals. For example, the input signal may be transformed directly into spectral signals by the MDCT, or by the DFT or DCT instead of the MDCT. If the DFT or DCT is employed, the signal may first be split into frequency band components by a spectrum splitting filter, as in the case of FIG. 2.
FIG. 3 shows an illustrative construction of a decoding device configured to reproduce acoustic signals from the code string information generated by the encoding device of FIG. 1 and to output the reproduced signals.
Referring to FIG. 3, a code string I301, corresponding to the code string I106 shown in FIG. 1, is supplied via an input terminal 30 to a code string separating circuit 31. The code string separating circuit 31 extracts from the code string I301 the information I302 corresponding to the normalization coefficient information I104, the information I303 corresponding to the encoded signal frequency component I105 and the information I304 corresponding to the quantization step information I103, and routes the extracted information to a signal component decoding circuit 32.
The signal component decoding circuit 32 restores a signal frequency component I305, corresponding to the signal frequency component I102, from the information I302, I303 and I304, and routes it to an inverse conversion circuit 33. The inverse conversion circuit 33 performs the inverse of the conversion performed by the conversion circuit 11 to generate an acoustic waveform signal I306, which is output at an output terminal 34.
The inverse-conversion circuit 33 has a configuration as shown for example in FIG. 4 which is a counterpart of the configuration shown in FIG. 2.
In FIG. 4, signal components I401, I402 and I403, corresponding respectively to the signal components I206, I207 and I208, are supplied to terminals 40, 41 and 42 and routed to respective associated inverse spectrum transform circuits 43, 44 and 45. These inverse spectrum transform circuits 43, 44 and 45 perform the inverse orthogonal transforms corresponding to the orthogonal transforms performed by the forward spectrum transform circuits 23, 24 and 25, and output band signals I404, I405 and I406 corresponding to the band signals I202, I204 and I205, respectively.
Of the inverse orthogonal transformed signals, the signals I405 and I406 are routed to a band synthesizing filter 46 to undergo band synthesis, which is the counterpart of the operation performed by the spectrum splitting filter 22. The resulting signal I407 and the signal I404 are then routed to a band synthesizing filter 47 to undergo band synthesis, which is the counterpart of the operation performed by the spectrum splitting filter 21. The band synthesizing filter 47 outputs, via a terminal 48, a signal I408, which represents the acoustic waveform signal I306 and is output at the output terminal 34.
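The band-splitting tree of FIG. 2 and its counterpart in FIG. 4 can be sketched with the simplest quadrature mirror filter pair, the 2-tap Haar pair. For this pair, the aliasing introduced by decimating each band cancels exactly on recombination. Several assumptions apply. The text does not state which output of filter 21 is split again, so the sketch splits the second output (I203). The spectrum transforms 23 to 25 and 43 to 45 are omitted, since each inverse undoes its forward transform. Real coders use much longer QMFs for better band separation.

```python
import math

R2 = math.sqrt(2.0)

def split(x):
    """Two-band split with the 2-tap Haar QMF pair; each output runs at half the input rate."""
    low = [(x[2 * k] + x[2 * k + 1]) / R2 for k in range(len(x) // 2)]
    high = [(x[2 * k] - x[2 * k + 1]) / R2 for k in range(len(x) // 2)]
    return low, high

def merge(low, high):
    """Inverse of split(): the aliasing of the two decimated bands cancels exactly."""
    out = []
    for l, h in zip(low, high):
        out += [(l + h) / R2, (l - h) / R2]
    return out

def analysis_tree(x):
    """FIG. 2 band tree (spectrum transforms 23-25 omitted); len(x) must be a multiple of 4."""
    i202, i203 = split(x)          # spectrum splitting filter 21: half rate
    i204, i205 = split(i203)       # spectrum splitting filter 22: quarter rate
    return i202, i204, i205

def synthesis_tree(i404, i405, i406):
    """FIG. 4 band tree (inverse transforms 43-45 omitted)."""
    i407 = merge(i405, i406)       # band synthesizing filter 46 undoes filter 22
    return merge(i404, i407)       # band synthesizing filter 47 undoes filter 21

x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0, 5.0, 8.0, 9.0, 7.0, 9.0, 3.0]
bands = analysis_tree(x)
print([len(b) for b in bands])     # [8, 4, 4]
y = synthesis_tree(*bands)
```

The band lengths reflect the sub-sampling described for FIG. 2: one band at half the input rate and two at a quarter.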
Referring to FIG. 5, the encoding method customarily employed in the encoder shown in FIG. 1 is explained.
In FIG. 5, the input acoustic waveform signal has been converted by the converting circuit 11 shown in FIGS. 1 and 2, for each pre-set time frame, into 64 spectral signal components ES. These 64 components are grouped into a pre-set number of bands, here five bands b1 to b5, each of which is normalized and quantized by the normalization/quantization circuit 13. These groups are referred to herein as encoding units. The bandwidths of the encoding units are selected to be narrower on the low-frequency side and broader on the high-frequency side, so that the generation of quantization noise can be controlled in a manner suited to the characteristics of human hearing. FIG. 5 shows the absolute values of the spectral signal components (frequency components) resulting from the MDCT, in dB, together with the normalization coefficients of the respective encoding units.
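The grouping into encoding units and the choice of one normalization coefficient per unit can be sketched as follows. The unit widths (4, 8, 12, 16 and 24 components) are hypothetical. They are chosen only to be narrow at low frequencies and wide at high ones while covering 64 components. The test spectrum is made up and simply decays with frequency.

```python
def group_into_units(spectrum, widths=(4, 8, 12, 16, 24)):
    """Split a frame's spectral components into consecutive encoding units.

    The widths are hypothetical: narrow at low and wide at high frequencies.
    """
    if sum(widths) != len(spectrum):
        raise ValueError("unit widths must cover the spectrum exactly")
    units, start = [], 0
    for w in widths:
        units.append(spectrum[start:start + w])
        start += w
    return units

def normalization_coefficients(units):
    """One normalization coefficient per unit: the largest absolute value it contains."""
    return [max(abs(c) for c in unit) for unit in units]

spectrum = [(64 - i) * (-1) ** i / 8 for i in range(64)]   # made-up, decaying with frequency
units = group_into_units(spectrum)
print([len(u) for u in units])             # [4, 8, 12, 16, 24]
print(normalization_coefficients(units))   # [8.0, 7.5, 6.5, 5.0, 3.0]
```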
FIG. 6 shows the manner in which the second encoding unit, for example, shown in FIG. 5, is normalized and quantized.
If, in FIG. 6, the seventh spectral signal component ES, which has the maximum value in the encoding unit, is used as the normalization coefficient and the components are quantized with, e.g., 3 bits, the codes shown in FIG. 7 are obtained for the respective spectral signal components. That is, the codes "010", "001", "010", "001", "001", "101", "111" and "110" are obtained as the 3-bit quantization results for the first through eighth spectral signal components, respectively. Since the actual spectral signals are positive or negative, one more bit, namely a sign bit, is required in addition to the three bits shown in FIG. 7; the sign bit is omitted here for clarity.
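The normalization and 3-bit quantization of one encoding unit can be sketched as below. Both the uniform mapping of |value| / coefficient onto the levels 0 to 7 and the eight input values are hypothetical. The values were made up so that the magnitude codes reproduce the sequence listed above, with the seventh component as the maximum. The sign is returned as a separate bit.

```python
def quantize_unit(values, bits=3):
    """Normalize by the unit's largest magnitude and quantize magnitudes uniformly.

    Returns (normalization coefficient, magnitude codes as bit strings, sign bits).
    """
    scale = max(abs(v) for v in values)
    levels = (1 << bits) - 1                       # 7 for 3 bits: codes 000..111
    codes, signs = [], []
    for v in values:
        q = round(abs(v) / scale * levels) if scale > 0 else 0
        codes.append(format(q, f"0{bits}b"))
        signs.append(1 if v < 0 else 0)            # the extra sign bit
    return scale, codes, signs

def dequantize_unit(scale, codes, signs, bits=3):
    """Map codes and sign bits back to approximate component values."""
    levels = (1 << bits) - 1
    return [(-1 if s else 1) * int(c, 2) / levels * scale for c, s in zip(codes, signs)]

unit = [2.0, 1.0, 2.1, -0.9, 1.2, 5.0, 7.0, -6.1]   # made-up values; the 7th is the peak
scale, codes, signs = quantize_unit(unit)
print(scale, codes)
# 7.0 ['010', '001', '010', '001', '001', '101', '111', '110']
```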
FIG. 8 shows an example of the code string I106 generated by the encoder shown in FIG. 1.
In this figure, the code string I106 is made up of information data for the five encoding units U1 to U5, each consisting of quantization step information, normalization coefficient information and normalized, quantized signal component information. The code string I106 is intended to be recorded on a recording medium such as a magneto-optical disc. If the information data of an encoding unit contains no quantization step information, as in the case of encoding unit U4, this indicates that no encoding is carried out in that encoding unit.
In the conventional method, the number of bits used for quantization is fixed from frame to frame.
Thus, if the spectral energy is concentrated in a broad high-range encoding unit containing a correspondingly large number of spectral components, or if a large number of isolated spectral components exist from the low range to the high range, a larger total number of quantization bits is required to secure sufficient sound quality, and the fixed number of bits per frame becomes insufficient; that is, a bit shortage arises. Conversely, if the input signal level is low, fewer bits are needed for quantization in the frame, so that quantization bits become redundant.
Consequently, the sound quality becomes insufficient when quantization bits fall short, while more sound quality than necessary is produced when bits are redundant, so that efficient encoding cannot be achieved.
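A small arithmetic illustration of this problem uses made-up per-frame requirements. The total requirement fits within the total budget, yet a fixed per-frame budget leaves some frames short of bits and others with unused bits.

```python
def frame_balance(required_bits, budget):
    """Spare (+) or missing (-) bits per frame under a fixed per-frame budget."""
    return [budget - r for r in required_bits]

required = [700, 1400, 900, 1200, 500]   # hypothetical bits needed for the required quality
budget = 1000                            # hypothetical fixed bits per frame
balance = frame_balance(required, budget)
print(balance)                                  # [300, -400, 100, -200, 500]
print(sum(required) <= budget * len(required))  # True
```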
It is therefore a principal object of the present invention to provide a method and apparatus for encoding, a method and apparatus for decoding, and a recording medium, in which changes in sound quality due to a surplus or shortage of quantization bits are eliminated, enabling efficient encoding and decoding.
In one aspect, the present invention provides a method for encoding the information of an input signal using a fixed number of bits for each unit time frame, wherein part of the encoded information of at least one second frame temporally consecutively or non-consecutively preceding or following a first frame is contained in the encoded information of the first frame.
This part of the encoded information includes information indicating the second frame. The part of the encoded information for at least one second frame temporally consecutively or non-consecutively preceding or following a first frame is surplus data, namely the data that would exceed the pre-set fixed number of bits for the second frame if the input signal of the second frame were encoded using the number of bits required to realize the required quality of the decoded signal obtained by decoding the encoded information of the second frame. In addition, the part of the encoded information is data without which the encoded information of the second frame can still be decoded, at least. The part of the encoded information may also be subdivided and contained in a plurality of first frames.
With the information encoding method of the present invention, the encoded information of plural frames, each encoded using the number of bits necessary for producing decoded signals of the required quality, is preserved. If encoding the input signal of a frame with the necessary number of bits produces surplus data exceeding the fixed number of bits per frame, the plural preserved frames are searched for a first frame in which the surplus data can be stored as the aforementioned part of the encoded information, and the surplus data is included in the encoded information of that first frame to form a code string. In addition, with the information encoding method of the present invention, the input signal of a frame may be encoded using the number of bits required for realizing the quality required of the decoded signal. If encoding the input signal of the frame with the required number of bits produces surplus data exceeding the fixed number of bits of the frame, the surplus data is preserved. If the required number of bits is less than the fixed number of bits of the frame, it is judged whether surplus data preserved in the past can be stored in the frame; if it can, it is included in the encoded information of the frame as the aforementioned part of the encoded information and formed into a code string.
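The second variant above (preserve the surplus, then write it into a later frame that has spare bits) can be sketched as follows, counting bits only. `TAG_BITS`, the cost of the subsidiary field that identifies the frame a surplus belongs to, is a hypothetical value. In this sketch, surplus items are placed whole and in first-in, first-out order.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Tuple

TAG_BITS = 8  # hypothetical size of the field naming the frame a surplus belongs to

@dataclass
class PackedFrame:
    index: int
    own_bits: int                                   # this frame's own data, capped at the budget
    carried: List[Tuple[int, int]] = field(default_factory=list)  # (source frame, surplus bits)

def pack_frames(required_bits, budget):
    """Preserve each frame's surplus and write it into the next frame with enough spare bits.

    Returns the packed frames and any surplus still pending after the last frame.
    """
    pending = deque()                               # surplus waiting for room, oldest first
    frames = []
    for i, need in enumerate(required_bits):
        own = min(need, budget)
        if need > budget:
            pending.append((i, need - budget))      # preserve the surplus; no spare bits here
        frame = PackedFrame(i, own)
        spare = budget - own
        while pending and pending[0][1] + TAG_BITS <= spare:
            src, size = pending.popleft()
            frame.carried.append((src, size))
            spare -= size + TAG_BITS
        frames.append(frame)
    return frames, list(pending)

frames, leftover = pack_frames([1300, 600, 1200, 700, 900], budget=1000)
for f in frames:
    print(f.index, f.own_bits, f.carried)
print(leftover)                                     # []
```

Frame 0's 300 surplus bits travel in frame 1, and frame 2's 200 surplus bits travel in frame 3. Each frame still respects the 1000-bit budget, including the tag.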
With the information decoding method of the present invention, a code string produced using a fixed number of bits for each unit time frame is decoded, namely a code string in which part of the encoded information of at least one second frame temporally consecutively or non-consecutively preceding or following a first frame is contained in the encoded information of the first frame.
Suppose that this part of the encoded information is surplus data exceeding the fixed number of bits of the second frame when the signal of the second frame is encoded using the number of bits required for obtaining the quality required of the signal decoded from the encoded information of the second frame. If a code string in which the surplus data of an arbitrary second frame is contained in the encoded information of a first frame temporally posterior to the second frame is to be decoded, the surplus data contained in the first frame is separated and preserved. If the surplus data of the second frame is among the surplus data held so far, it is decoded together with the encoded information of the second frame. When the part of the encoded information is preserved and the recording capacity for preserving it would be exceeded, the parts of the encoded information of frames that are older in the preserving sequence, or further from the current frame, are sequentially erased, and the part of the encoded information of the current frame is preserved. If, instead, a code string in which the surplus data of an arbitrary second frame is contained in the encoded information of a first frame temporally previous to the second frame is to be decoded, a code string of a pre-set number of frames is taken out, and if the surplus data of the second frame is contained in that code string, it is decoded as well.
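The bounded holding store described above might be sketched as follows. Frame indices as keys, byte payloads and the `capacity_bits` limit are assumptions made for illustration. The sketch implements the "erase the oldest first" policy. It also shows the fallback in which a frame whose surplus was erased, or never arrived, is decoded from its own data alone.

```python
from collections import OrderedDict

class SurplusStore:
    """Holds surplus data separated from received frames, keyed by the frame it belongs to.

    capacity_bits is a hypothetical memory limit; when preserving new data would
    exceed it, the entries preserved earliest are erased first.
    """

    def __init__(self, capacity_bits):
        self.capacity_bits = capacity_bits
        self.items = OrderedDict()   # source frame index -> surplus payload (bytes)
        self.used_bits = 0

    def preserve(self, src, payload):
        if src in self.items:                        # replace any earlier copy for this frame
            self.used_bits -= 8 * len(self.items.pop(src))
        size = 8 * len(payload)
        if size > self.capacity_bits:
            return False                             # can never fit; frame decodes without it
        while self.used_bits + size > self.capacity_bits:
            _, old = self.items.popitem(last=False)  # erase the oldest preserved entry
            self.used_bits -= 8 * len(old)
        self.items[src] = payload
        self.used_bits += size
        return True

    def take(self, index):
        """Remove and return the surplus held for frame `index`, or None."""
        payload = self.items.pop(index, None)
        if payload is not None:
            self.used_bits -= 8 * len(payload)
        return payload

def decode_frame(core, store, index):
    """Append held surplus to a frame's own data; the core data alone still decodes."""
    return core + (store.take(index) or b"")

store = SurplusStore(capacity_bits=32)
store.preserve(1, b"ab")
store.preserve(2, b"cd")
store.preserve(3, b"ef")                 # store full: the entry for frame 1 is erased
print(decode_frame(b"X", store, 1))      # b'X'
print(decode_frame(b"Y", store, 2))      # b'Ycd'
```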
In another aspect, the present invention provides an apparatus for encoding an input signal using a fixed number of bits for each unit time frame including means for separating part of the encoded information of at least one second frame temporally consecutively or non-consecutively preceding or following a first frame, and synthesizing means for incorporating such part of the encoded information separated by the separating means into the encoded information of the first frame.
The separating means incorporates information indicating the second frame in the aforementioned part of the encoded information. The part of the encoded information for at least one second frame temporally consecutively or non-consecutively preceding or following a first frame is surplus data, namely the data that would exceed the pre-set fixed number of bits for the second frame if the input signal of the second frame were encoded using the number of bits required to realize the required quality of the decoded signal obtained by decoding the encoded information of the second frame. In addition, this part of the encoded information is data without which at least the encoded information of the second frame can still be decoded.
The separating means may subdivide the part of the encoded information, in which case the synthesizing means incorporates the subdivided portions into a plurality of first frames. The synthesizing means includes means for preserving the encoded information of plural frames, each encoded using the number of bits necessary for producing decoded signals of the required quality, and means for identifying, among the plural preserved frames, a first frame in which surplus data can be stored as the aforementioned part of the encoded information, if encoding the input signal of a frame with the necessary number of bits produces surplus data exceeding the fixed number of bits per frame. The synthesizing means also includes means for generating a code string consisting of the encoded information of the first frame capable of storing the surplus data, together with the surplus data contained in that first frame. The information encoding apparatus also includes encoding means for encoding the input signal of a frame using the number of bits required for realizing the quality required of the decoded signal. The synthesizing means has preserving means for preserving surplus data exceeding the fixed number of bits of the frame, if such surplus data is produced when the input signal of the frame is encoded using the required number of bits, and means for judging whether surplus data preserved in the past can be stored in a frame if the required number of bits is less than the fixed number of bits of that frame. The synthesizing means further has means for incorporating the surplus data, as the aforementioned part of the encoded information, into a frame found capable of storing it, to form a code string.
The information decoding apparatus of the present invention decodes a code string produced using a fixed number of bits for each unit time frame, namely a code string in which part of the encoded information of at least one second frame temporally consecutively or non-consecutively preceding or following a first frame is contained in the encoded information of the first frame.
The information decoding apparatus includes separating means for separating the surplus data contained in the first frame when the aforementioned part of the encoded information is surplus data exceeding the fixed number of bits of the second frame, that is, the excess produced when the signal of the second frame is encoded using the number of bits required for obtaining the quality required of the signal decoded from the encoded information of the second frame, and when a code string in which the surplus data of an arbitrary second frame is contained in the encoded information of a first frame temporally posterior to the second frame is to be decoded. The apparatus also includes means for preserving the separated surplus data, synthesizing means for synthesizing the surplus data of the second frame, if any, present in the surplus data preserved so far, and decoding means for decoding the synthesized encoded information. The information decoding apparatus further includes holding control means whereby, if the recording capacity for holding the part of the encoded information would be exceeded when that part is preserved, the parts of the encoded information of frames older in the holding sequence, or further from the current frame, are sequentially erased and the part of the encoded information of the current frame is preserved. The information decoding apparatus also includes means for taking out a code string of a pre-set number of frames when the part of the encoded information is such surplus data and a code string in which the surplus data of an arbitrary second frame is contained in the encoded information of a first frame temporally previous to the second frame is to be decoded.
The information decoding apparatus also includes synthesizing means for synthesizing the surplus data of the second frame, if any, present in the code string of the pre-set number of frames, and decoding means for decoding the synthesized encoded information.
In still another aspect, the present invention provides a recording medium on which is recorded information encoded from an input signal using a fixed number of bits for each unit time frame, namely a code string in which part of the encoded information of at least one second frame temporally consecutively or non-consecutively preceding or following a first frame is contained in the encoded information of the first frame.
The part of the encoded information for at least one second frame temporally consecutively or non-consecutively preceding or following a first frame is surplus data, namely the data that would exceed the pre-set fixed number of bits for the second frame if the input signal of the second frame were encoded using the number of bits required to realize the required quality of the decoded signal obtained by decoding the encoded information of the second frame, wherein the surplus data of an arbitrary second frame is contained in the encoded information of a first frame temporally posterior or previous to the second frame.
That is, according to the present invention, data of a frame with an insufficient number of quantization bits is written into a frame with redundant quantization bits, and subsidiary information identifying the frame to which the data belongs is appended to the data to enable decoding.
Whether the data of a frame suffering from a shortage of quantization bits is written into a frame with redundant bits lying ahead of or behind the currently processed frame depends on the delay time allowed by the encoding system or the pre-read capability of the decoding system. This information can be written in the code string or specified by the system. If the data of the frame suffering from the bit shortage can be subdivided, it can be distributed efficiently over frames having redundant quantization bits.
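When surplus data can be subdivided, it can use the spare bits of several frames within the allowed window. In this sketch each chunk pays a hypothetical header of `TAG_BITS` to identify its source frame. Frames whose spare bits cannot even hold the header are skipped, and the `window` parameter stands in for the delay or pre-read limit mentioned above.

```python
TAG_BITS = 8  # hypothetical per-chunk header identifying the source frame

def subdivide(surplus_bits, spare_bits, window):
    """Spread surplus_bits over the spare bits of up to `window` candidate frames.

    spare_bits lists the spare capacity of the candidate frames in the order they
    are tried. Returns (candidate position, chunk bits) pairs; raises if the
    window is too small to hold all of the surplus.
    """
    chunks, remaining = [], surplus_bits
    for pos, spare in enumerate(spare_bits[:window]):
        if remaining == 0:
            break
        room = spare - TAG_BITS                     # each chunk pays its own header
        if room <= 0:
            continue
        take = min(room, remaining)
        chunks.append((pos, take))
        remaining -= take
    if remaining:
        raise ValueError(f"{remaining} surplus bits do not fit within the window")
    return chunks

print(subdivide(400, [100, 0, 250, 300], window=4))  # [(0, 92), (2, 242), (3, 66)]
```

Here the 400 surplus bits, which would not fit whole into the first candidate's 100 spare bits, are spread over three frames. With a window of three frames the same surplus cannot be placed.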
If the data remaining in the frame suffering from the bit shortage can be decoded by itself, then, when the system memory capacity is limited, it is unnecessary to keep the data written into the frame with redundant bits until that storing frame is processed, or to pre-read a prescribed amount of frame data. Sound quality comparable to that of the conventional system can thus be achieved without obstructing the decoding process.
Thus higher encoding efficiency can be achieved with the present invention than with the conventional method.
With the information encoding method and apparatus of the present invention, part of the encoded information of at least one second frame temporally consecutively or non-consecutively preceding or following a first frame is contained in the encoded information of the first frame for adjusting the surplus/deficit of the number of the quantization bits.
With the information decoding method and apparatus of the present invention, a code string in which part of the encoded information of at least one second frame temporally consecutively or non-consecutively preceding or following a first frame is contained in the encoded information of the first frame is decoded for adjusting the surplus/deficit of the number of the quantization bits.
With the recording medium according to the present invention, a code string in which part of the encoded information of at least one second frame temporally consecutively or non-consecutively preceding or following a first frame is contained in the encoded information of the first frame is recorded for adjusting the surplus/shortage in the number of quantization bits.
Thus it is seen that, with the information encoding method and apparatus, the information decoding method and apparatus and the recording medium according to the present invention, part of the encoded information of at least one second frame temporally consecutively or non-consecutively preceding or following a first frame is contained in the encoded information of the first frame to adjust for the surplus or shortage of quantization bits, so that the redundant bits of a frame can be used for the data of other frames, resulting in efficient encoding and decoding.
If the present invention is applied to the encoding of acoustic signals, data of a frame that would otherwise suffer from noise due to a shortage of encoding bits can be written into a frame having redundant bits. This reduces the noise in the decoded acoustic signals as perceived by the ear, thus enabling efficient encoding and decoding of information signals.