This invention relates to a method and apparatus for information encoding and decoding wherein input digital data is encoded by high efficiency encoding and recorded and subsequently reproduced and decoded to produce playback signals.
There have hitherto been proposed a variety of techniques for high efficiency encoding for audio or speech signals, for example, a sub-band coding (SBC) in which audio signals on the time axis are divided into plural frequency ranges without dividing the audio signals on the time axis into blocks at an interval of unit time, or a blocking and frequency dividing system, that is a so-called transform encoding system, in which signals on the time axis are blocked at a pre-set unit time and converted into signals on the frequency axis on the block basis, and the resulting spectral signals are divided into plural frequency bands and encoded on the band basis. There is also a technique of high efficiency encoding consisting in a combination of the sub-band coding and the transform coding, according to which audio signals are divided into plural frequency ranges by SBC and transform coding is independently applied to each of the frequency ranges.
Known filters for dividing a frequency spectrum into a plurality of frequency ranges include the quadrature mirror filter (QMF), as discussed in, for example, R. E. Crochiere, Digital Coding of Speech in Subbands, 55 Bell Syst. Tech. J., No. 8 (1976). The technique of dividing a frequency spectrum into equal-width frequency ranges is discussed in Joseph H. Rothweiler, Polyphase Quadrature Filter--A New Subband Coding Technique, ICASSP 83 Boston.
Known techniques for orthogonal transform include the technique of dividing the digital input audio signals into frames of a predetermined time duration, and processing the resulting frames using a fast Fourier transform (FFT), discrete cosine transform (DCT) or modified DCT (MDCT) to convert the signals from the time axis to the frequency axis. Discussion of the MDCT may be found in J. P. Princen and A. B. Bradley, Subband/Transform Coding Using Filter Bank Designs Based on Time Domain Aliasing Cancellation, ICASSP 1987.
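By way of illustration only, the time domain aliasing cancellation underlying the MDCT may be sketched as follows. The sine window, the block size and all function names are assumptions made for this sketch and are not taken from the cited reference. Each block of 2N windowed samples yields N coefficients, and overlap-adding the windowed inverse transforms of consecutive blocks, offset by N samples, cancels the aliasing.

```python
import math

def sine_window(two_n):
    # Sine window; satisfies w[t]^2 + w[t + N]^2 = 1, which TDAC requires.
    return [math.sin(math.pi * (t + 0.5) / two_n) for t in range(two_n)]

def mdct(block):
    # 2N time samples -> N spectral coefficients.
    two_n = len(block)
    n = two_n // 2
    return [sum(block[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                for t in range(two_n))
            for k in range(n)]

def imdct(coeffs):
    # N coefficients -> 2N time samples that still contain aliasing.
    n = len(coeffs)
    return [(2.0 / n) * sum(coeffs[k] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                            for k in range(n))
            for t in range(2 * n)]

def analyze_synthesize(signal, n):
    # Window, transform, inverse-transform, window again and overlap-add.
    w = sine_window(2 * n)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n + 1, n):
        block = [w[t] * signal[start + t] for t in range(2 * n)]
        rec = imdct(mdct(block))
        for t in range(2 * n):
            out[start + t] += w[t] * rec[t]
    return out
```

With N = 8 and a 64-sample input, samples 8 to 55, each of which is covered by two overlapping blocks, are recovered to within rounding error. The first and last N samples are covered by a single block only and are not recovered.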
With the above-described high efficiency coding, signals divided into plural frequency bands by a filter or by spectrum conversion may be quantized on the band basis, so that the band in which quantization noise is generated can be controlled. Higher coding efficiency may then be achieved by taking advantage of psychoacoustic characteristics of the human hearing mechanism, for example, masking effects. If the signals in each frequency band are normalized before quantization by the maximum of the absolute values of the signal components in that band, encoding may be achieved with still higher efficiency.
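The normalization described above may be sketched as follows. The function names, the symmetric uniform quantizer and the word length are assumptions made for illustration; the source does not specify the quantizer.

```python
def encode_band(coeffs, bits):
    # Normalize by the largest absolute value in the band, then quantize.
    # bits must be at least 2 for this symmetric quantizer.
    scale = max(abs(c) for c in coeffs) or 1.0   # avoid division by zero for a silent band
    levels = (1 << (bits - 1)) - 1               # e.g. bits=4 gives codes -7..7
    codes = [round(c / scale * levels) for c in coeffs]
    return scale, codes

def decode_band(scale, codes, bits):
    # Inverse quantization followed by inverse normalization.
    levels = (1 << (bits - 1)) - 1
    return [q / levels * scale for q in codes]
```

Because the scale factor follows the band's peak, a low-level band still uses the full range of quantizer codes, and the reconstruction error per component is bounded by scale / (2 * levels).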
The division of the frequency spectrum into plural bands for quantization may be made so as to take the human auditory characteristics into account. That is, audio signals may be divided into plural bands, for example, 25 bands, in accordance with the critical bands, whose bandwidths become broader towards higher frequencies. The data in each critical band is encoded by fixed or adaptive bit allocation. For example, when encoding coefficient data produced by MDCT operations, the MDCT coefficient data for each band, produced by the MDCT operations on the block basis, is encoded with an adaptively allocated number of bits. The following two bit allocation techniques are known.
Known adaptive bit allocation techniques include that described in IEEE TRANS. ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL.ASSP-25, No.4 (1977 August) in which bit allocation is carried out on the basis of the amplitude of the signal in each critical band. This technique produces a flat quantization noise spectrum and minimizes noise energy, but the noise level perceived by the listener is not optimum because the technique does not effectively exploit the psychoacoustic masking effect.
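As a rough illustration of signal-dependent allocation of this kind, the following sketch repeatedly gives one bit to the band whose estimated quantization noise is currently largest. This greedy rule and the 6 dB per bit noise model are assumptions chosen for the sketch; they are not the procedure of the cited reference. The result nevertheless tends towards the flat noise spectrum mentioned above.

```python
def allocate_bits(band_energies, total_bits):
    # Greedy allocation: always give the next bit to the noisiest band.
    bits = [0] * len(band_energies)
    noise = list(band_energies)   # with 0 bits, noise is taken to equal the band energy
    for _ in range(total_bits):
        worst = max(range(len(noise)), key=noise.__getitem__)
        bits[worst] += 1
        noise[worst] /= 4.0       # one extra bit lowers noise power by a factor of 4 (about 6 dB)
    return bits
```

For band energies of 1000, 10 and 1 and a budget of 8 bits, the sketch returns [5, 2, 1]. No account is taken of masking, which is the limitation noted above.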
In the bit allocation technique described in M. A. Krasner, The Critical Band Encoder-Digital Encoding of the Perceptual Requirements of the Auditory System, ICASSP 1980, the psychoacoustic hearing mechanism is used to determine a fixed bit allocation that produces the necessary signal-to-noise ratio for each critical band. However, if the signal-to-noise ratio of such a system is measured using a strongly tonal signal, such as a sine wave, non-optimum results are obtained because of fixed allocation of bits among the critical bands.
In order to solve these problems, there is proposed a high efficiency encoding apparatus in which the total number of bits available for bit allocation is divided into those for fixed bit allocation for each sub-block divided from a block and those for bit allocation dependent on the signal energy in each block. The division ratio is set in dependence upon a signal related to the input signal so that the smoother the signal spectrum, the larger becomes the division ratio for the fixed bit allocation.
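The division of the bit budget may be sketched as follows. The source states only that the share for fixed allocation grows as the spectrum becomes smoother; using the spectral flatness measure (geometric mean over arithmetic mean of the band energies) as the smoothness signal, and using it directly as the division ratio, are assumptions made for this sketch.

```python
import math

def spectral_flatness(energies):
    # 1.0 for a perfectly flat spectrum, close to 0 for a strongly tonal one.
    # All energies must be positive.
    geometric = math.exp(sum(math.log(e) for e in energies) / len(energies))
    arithmetic = sum(energies) / len(energies)
    return geometric / arithmetic

def split_bit_budget(energies, total_bits):
    # Returns (bits for fixed allocation, bits for energy-dependent allocation).
    fixed = round(total_bits * spectral_flatness(energies))
    return fixed, total_bits - fixed
```

A flat spectrum therefore sends the whole budget to the fixed allocation, whereas a sine-like spectrum sends nearly all of it to the energy-dependent allocation.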
With this method, if the energy is concentrated in a particular spectral component, as in the case of a sine wave, a larger number of bits may be allocated to a block containing the spectral component for improving the overall signal to noise characteristics. This is effective in improving not only measured values, but also the sound quality as perceived by the ear, inasmuch as the human auditory system is usually extremely sensitive to a signal having sharp spectral components.
A variety of other bit allocation methods are also known, and encoding with still higher efficiency may be expected if the model of the human auditory system is refined and an encoding device with a higher processing capacity is devised.
If the original signal contains acutely changing signal components, the quantization noise on the waveform signal is occasionally increased in portions of the original waveform that do not have large amplitudes, including portions not immediately adjacent to the large-amplitude portions. Such quantization noise is objectionable to the ear because it cannot be covered by concurrent masking by the acutely changing waveform portions. Above all, if the waveform signal is converted by spectrum conversion into a large number of frequency components, time resolution is lowered, so that large quantization noise is generated over an extended time. If the conversion length of the spectrum conversion is reduced, the period in which the quantization noise is generated is also reduced. However, the frequency resolution is then also lowered, and so is the encoding efficiency in the sub-stationary portions. Thus, if spectrum conversion followed by inverse spectrum conversion is utilized, quantization noise is produced which is not masked by concurrent masking by the acutely changing signal portions. Such quantization noise generated temporally before the acutely changing signal portion is termed pre-echo, which will be explained subsequently in detail.
For obviating such inconvenience, there is proposed a method of using a variable conversion length so that the conversion length is shortened only in the acutely changed signal waveform portions at the cost of the frequency resolution.
In FIGS. 1 and 2, a frequency dividing circuit and a frequency range synthesizing circuit in an encoding apparatus disclosed in EP Laid-Open Patent Publication No. 0537361 (Laying-Open Date, Apr. 21, 1993) are shown in a block circuit diagram. In the dividing and synthesizing circuits, the conversion length for spectrum conversion is designed to be variable.
Referring to FIG. 1, showing the frequency dividing circuit, the input audio signal supplied to an input terminal 300 is sent to a first stage frequency dividing filter 301 of a dual filter 301-302. The filter 301 divides the signal into two frequency band signals one of which is sent to the next stage frequency dividing filter 302. The filter 302 divides the signal from the filter 301 into two frequency band signals. Thus the input audio signal, supplied to the terminal 300, is divided by the filters 301 and 302 into three frequency bands.
The respective band signals from the frequency dividing filters 301, 302 are supplied to associated forward spectrum conversion circuits 321, 322 and 323 so as to be thereby converted to spectral signals. These forward spectrum conversion circuits may be implemented by the above-mentioned MDCT device.
The above arrangement is characterized by the variable conversion length of the forward spectrum conversion in each forward spectrum conversion circuit. Such conversion length is determined based upon the band signals by conversion length decision circuits 311, 312 and 313. By using the variable conversion lengths, both the sub-stationary waveform portions and the transient signal waveform portions can be encoded with psychoacoustically high encoding efficiency, as will be explained subsequently.
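The source does not state the criterion applied by the conversion length decision circuits. One conceivable criterion, given here purely as an assumption, is to compare the peak amplitudes of successive short segments of the band signal and to select the short conversion length when the peak rises abruptly. The lengths and the threshold ratio below are hypothetical values.

```python
def choose_conversion_length(block, long_len=64, short_len=16, ratio=4.0):
    # Peak amplitude of each short segment of the block.
    peaks = [max(abs(s) for s in block[i:i + short_len])
             for i in range(0, len(block), short_len)]
    # A sudden rise from one segment to the next marks a transient.
    for previous, current in zip(peaks, peaks[1:]):
        if current > ratio * max(previous, 1e-9):
            return short_len
    return long_len
```

A steady tone yields the long length, while a block in which near silence is followed by a burst yields the short length.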
The spectral signals from the forward spectrum conversion circuits 321, 322 and 323 are grouped among pre-set frequency bands, for example, critical frequency bands, and are outputted at terminals 304, 306 and 308, respectively. The conversion length information from each of the conversion length decision circuits 311 to 313 is outputted at terminals 303, 305 and 307, respectively. Outputs of the terminals 303 to 308 are processed by normalization and quantization circuits, not shown, and converted into a code string by a multiplexor, also not shown, so as to be transmitted or recorded on a recording medium.
FIG. 2 shows, in a block circuit diagram, an arrangement of a frequency range synthesizing circuit of a decoding device for decoding signals encoded by the encoding device having the frequency dividing circuit shown in FIG. 1.
Referring to FIG. 2, the code string from the encoding device is divided by a demultiplexor, not shown, provided in a pre-stage of the inverse spectrum conversion circuits 421, 422 and 423 of the frequency band synthesizing circuit, and is inverse-quantized, inverse-normalized and grouped among three bands associated with outputs of the frequency dividing filters shown in FIG. 1. The band-based conversion length data associated with outputs of the terminals 303, 305 and 307 of FIG. 1 are supplied to terminals 401, 402 and 403, respectively, while the spectral data associated with outputs of the terminals 304, 306 and 308 of FIG. 1 are supplied to terminals 411, 412 and 413, respectively.
The band-based data are supplied to associated inverse spectrum conversion circuits 421, 422 and 423 which calculate three-band signals based upon the input data. These three-band signals are fed to a dual frequency range synthesizing filter 424-431.
Outputs of the inverse spectrum conversion circuits 422 and 423 are synthesized by a frequency range synthesizing filter 424, while outputs of the inverse spectrum conversion circuit 421 and of the synthesizing filter 424 are synthesized by a frequency range synthesizing filter 431. The synthesizing filter 431 produces band-synthesized audio signals which are outputted at a terminal 430.
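The two-stage band division of FIG. 1 and the corresponding two-stage synthesis of FIG. 2 may be sketched as follows, with the spectrum conversion stages omitted. The two-tap Haar pair is used here only as the simplest perfectly reconstructing QMF pair; the source does not specify the filter coefficients, and the choice of the low band as the input to the second stage is likewise an assumption.

```python
import math

S = math.sqrt(0.5)

def split(x):
    # Two-band analysis with decimation by 2 (len(x) must be even).
    low = [(x[i] + x[i + 1]) * S for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) * S for i in range(0, len(x), 2)]
    return low, high

def merge(low, high):
    # Two-band synthesis; exactly undoes split().
    out = []
    for l, h in zip(low, high):
        out += [(l + h) * S, (l - h) * S]
    return out

def split_three(x):
    low, high = split(x)        # first stage, cf. filter 301
    lowest, mid = split(low)    # second stage, cf. filter 302 (low band assumed)
    return high, mid, lowest

def merge_three(high, mid, lowest):
    low = merge(lowest, mid)    # cf. synthesizing filter 424
    return merge(low, high)     # cf. synthesizing filter 431
```

For a 16-sample input, the three bands hold 8, 4 and 4 samples, and merge_three recovers the input to within rounding error.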
Referring to FIGS. 3(A) to 3(D), the effect of providing for the variable conversion length in the frequency dividing circuit and the frequency synthesizing circuit shown in the arrangements of FIGS. 1 and 2 is explained.
For a sub-stationary signal waveform in general, a longer conversion length, which may be achieved using a conversion window function having a long conversion length as shown in FIG. 3(A), gives a higher encoding efficiency, because the energy is thereby concentrated in particular spectral coefficients.
However, if the signal waveform is converted into spectral signals which are then inverse-converted into signals on the time axis, the quantization noise is substantially uniformly distributed within a conversion block (block employed during conversion into spectral signals). Consequently, if the long conversion length is used in the acutely changing signal waveform portion, a larger quantization noise QN is produced even in a small-amplitude waveform portion, as shown in FIG. 3(B). This noise QN is not psychoacoustically masked by the concurrent masking effect by the acutely changing signal waveform SW.
The masking effect comprises, in addition to concurrent masking, forward masking, in which a temporally preceding sound masks a temporally succeeding sound, and backward masking, in which a temporally succeeding sound masks a temporally preceding sound. Backward masking manifests its effect only for a brief time period as compared to forward masking. Thus the quantization noise temporally preceding a waveform portion in which the sound rapidly increases is extremely objectionable to the ear as pre-echo.
Consequently, spectrum conversion may be made with a short conversion length (short conversion window function) in one of the bands in which the signal becomes suddenly larger, as shown in FIG. 3(C), thereby reducing the period in which the quantization noise QN is produced, as shown in FIG. 3(D), so that the backward masking becomes effective.
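The effect of the conversion length on the period in which quantization noise appears may be illustrated numerically. In the following sketch, a non-overlapping orthonormal DCT stands in for the spectrum conversion, and the test signal, block lengths and quantizer step are hypothetical values chosen for the illustration.

```python
import math

def dct(x):
    # Orthonormal DCT-II.
    n = len(x)
    return [math.sqrt((1 if k == 0 else 2) / n)
            * sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
            for k in range(n)]

def idct(c):
    # Inverse of the orthonormal DCT-II above.
    n = len(c)
    return [sum(math.sqrt((1 if k == 0 else 2) / n) * c[k]
                * math.cos(math.pi * (t + 0.5) * k / n) for k in range(n))
            for t in range(n)]

def codec(signal, block_len, step):
    # Transform each block, quantize the coefficients uniformly, inverse-transform.
    out = []
    for start in range(0, len(signal), block_len):
        coeffs = dct(signal[start:start + block_len])
        out += idct([round(c / step) * step for c in coeffs])
    return out

def noise_energy(signal, decoded, start, stop):
    return sum((decoded[i] - signal[i]) ** 2 for i in range(start, stop))

# 40 silent samples followed by an abrupt attack.
signal = [0.0] * 40 + [math.cos(0.9 * t) for t in range(24)]
long_pre = noise_energy(signal, codec(signal, 64, 0.05), 0, 32)
short_pre = noise_energy(signal, codec(signal, 16, 0.05), 0, 32)
```

With the 64-sample block, quantization noise appears throughout the silent portion, so long_pre is greater than zero. With 16-sample blocks, the first two blocks contain only silence, so short_pre is exactly zero and any noise ahead of the attack is confined to the 8 samples immediately preceding it.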
Long and short conversion lengths may also be used in combination. FIGS. 4(A) and 4(B) show the state in which conversion is being made with the long and short conversion lengths. If there is an acutely changing signal, as shown in FIG. 4(B), the conversion window function is changed over from the long conversion window function to the short conversion window function at the acutely changing waveform portion, as shown in FIG. 4(A).
Although psychoacoustically efficient encoding may be achieved in this manner not only for the sub-stationary signal waveform portion but also for the transient waveform portion, the number of spectral components varies from one conversion block to another because of the variable conversion length, thereby complicating the encoding and decoding apparatus. That is, if the conversion length is variable, it is necessary to provide conversion means capable of coping with the variable conversion length in the encoding and decoding apparatus. In addition, since the number of spectral components is proportional to the conversion length, the frequency band associated with each spectral component varies with the conversion length, so that, if the spectral components are grouped among the critical bands for encoding, the number of spectral components comprised within each critical band also differs, thus complicating the encoding and decoding operations.
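The dependence of the per-band component count on the conversion length may be illustrated as follows. The sampling rate and the three band edges are hypothetical values chosen to give round numbers; they are not the critical band edges.

```python
def coefficients_per_band(num_coeffs, band_edges_hz, sample_rate):
    # Each of the num_coeffs spectral components covers an equal slice
    # of the range from 0 Hz to sample_rate / 2.
    bin_width = sample_rate / 2 / num_coeffs
    counts = [0] * (len(band_edges_hz) - 1)
    for k in range(num_coeffs):
        centre = (k + 0.5) * bin_width
        for b in range(len(counts)):
            if band_edges_hz[b] <= centre < band_edges_hz[b + 1]:
                counts[b] += 1
    return counts
```

At a sampling rate of 32000 Hz and band edges of 0, 1000, 4000 and 16000 Hz, a block yielding 128 components gives counts of [8, 24, 96], whereas a block yielding 32 components gives [2, 6, 24]. An encoder and decoder handling both lengths must therefore handle both groupings.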