1. Field of the Invention
This invention relates to apparatus for encoding digital signals.
2. Description of the Prior Art
A known digital signal encoding apparatus employs a bit allocation encoding technique, according to which an input digital signal, such as a speech or other audio signal, is divided into a plurality of channels on a time or frequency axis and the number of bits for each of the channels is adaptively allocated so as to encode the input digital signal efficiently. The apparatus may, for example, employ: sub-band coding (SBC), in which the audio or other signal on a time axis is divided into a plurality of frequency bands for encoding; so-called adaptive transform coding (ATC), in which the signal on a time axis is orthogonally transformed into a signal on a frequency axis and divided into a plurality of frequency bands so that the signal in each band is adaptively encoded; or so-called adaptive bit allocation coding (APC-AB), which combines sub-band coding (SBC) with so-called adaptive predictive coding (APC) by dividing a signal on a time axis into a plurality of bands, transforming each band signal to the base band (low frequency band), and thereafter performing linear predictive analysis of a plurality of orders so as to encode the signal predictively.
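By way of illustration only, adaptive bit allocation among channels may be sketched as follows. The function name `allocate_bits`, the greedy strategy, and the assumption that each additional bit lowers the quantization noise power of a band by a factor of four (about 6 dB) are illustrative assumptions and are not taken from the known apparatus described above.

```python
def allocate_bits(band_energies, total_bits):
    """Greedy sketch: give each bit to the band whose quantization noise is largest.

    Assumption (illustrative, not from the source): each extra bit lowers a
    band's noise power by a factor of 4, i.e. about 6 dB per bit.
    """
    bits = [0] * len(band_energies)
    # With zero bits, treat the whole band energy as noise.
    noise = list(band_energies)
    for _ in range(total_bits):
        # Pick the band with the largest remaining noise (first one on ties).
        worst = max(range(len(noise)), key=lambda i: noise[i])
        bits[worst] += 1
        noise[worst] /= 4.0
    return bits


print(allocate_bits([100.0, 1.0, 1.0], 6))  # prints [4, 1, 1]
```

In this sketch the band with the greatest energy receives most of the bits, and the remaining bits go to the weaker bands only once the noise in the strong band has fallen below their level.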
A specific process used, for example, for band dividing in these various efficient encoding techniques may include the steps of dividing an input audio signal into blocks of a given unit time, transforming each block from the time axis to the frequency axis (quadrature or orthogonal transformation) by performing a fast Fourier transform (FFT) to obtain FFT coefficient data for each block, and dividing the coefficient data into a plurality of frequency bands. In this case, encoding is performed by quantizing (requantizing) the FFT coefficient data. Division of an audio signal into bands may be performed in such a manner as to take account of, for example, characteristics of the human sense of hearing. That is, an audio signal may be divided into a plurality of bands (for example, 25 bands), generally referred to as critical bands, whose bandwidths become wider toward higher frequencies.
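The steps described above (dividing into blocks, transforming each block, and dividing the coefficient data into bands) may be sketched as follows. This is a minimal sketch under stated assumptions: a naive discrete Fourier transform stands in for the FFT, the block length of 64 samples is arbitrary, and the band edges in `BAND_EDGES` are hypothetical values chosen only so that the bands become wider toward higher frequencies; they are not critical band edges.

```python
import cmath
import math


def dft(block):
    """Naive O(N^2) discrete Fourier transform, used here in place of an FFT."""
    n = len(block)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(block))
            for k in range(n)]


def split_into_blocks(signal, block_len):
    """Divide the signal into consecutive unit-time blocks (a partial tail is dropped)."""
    return [signal[i:i + block_len]
            for i in range(0, len(signal) - block_len + 1, block_len)]


# Hypothetical band edges (coefficient indices) for a 64-sample block.
# The band widths 2, 4, 7 and 19 grow toward higher frequencies.
BAND_EDGES = [0, 2, 6, 13, 32]


def band_energies(block, edges=BAND_EDGES):
    """Transform one block and sum the coefficient energy falling in each band."""
    coeffs = dft(block)
    return [sum(abs(c) ** 2 for c in coeffs[lo:hi])
            for lo, hi in zip(edges, edges[1:])]


# Example: a tone centred on coefficient 8 lands in the third band (index 2).
signal = [math.cos(2 * math.pi * 8 * t / 64) for t in range(128)]
for block in split_into_blocks(signal, 64):
    energies = band_energies(block)
    print(energies.index(max(energies)))
```

Only the coefficients below half the sampling frequency are grouped into bands, since the remaining coefficients of a real-valued block mirror them.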
The human sense of hearing exhibits various sound masking effects, including a so-called temporal masking effect and a so-called simultaneous masking effect. The simultaneous masking effect results in a sound (or noise) of relatively low level that is generated simultaneously with a sound of relatively high level being masked by the sound of relatively high level so that the sound of relatively low level cannot be heard. The temporal masking effect occurs both after and before a sound of high level so as to provide so-called forward masking and backward masking effects, respectively. The forward masking effect lasts for a relatively long period of time (for example, about 100 milliseconds) after a high level sound transient, while the backward masking effect lasts for only a short period of time (for example, about 5 milliseconds) before the transient. The levels (amounts) of the forward and backward masking effects are about 20 dB and 30 dB, respectively.
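The asymmetry between the two temporal masking periods may be illustrated by the following sketch. It uses only the example durations given above (about 100 milliseconds and about 5 milliseconds), treats each masking period as a hard window, and ignores the masking levels entirely; the function name `is_temporally_masked` is hypothetical.

```python
FORWARD_MASKING_MS = 100.0   # example duration after the high level sound
BACKWARD_MASKING_MS = 5.0    # example duration before the high level sound


def is_temporally_masked(noise_time_ms, transient_time_ms):
    """Return True if the noise falls inside the forward or backward masking window.

    Simplification: masking is modelled as all-or-nothing within each window,
    with no account taken of the noise level.
    """
    offset = noise_time_ms - transient_time_ms
    if offset >= 0:
        return offset <= FORWARD_MASKING_MS
    return -offset <= BACKWARD_MASKING_MS


print(is_temporally_masked(150.0, 100.0))  # 50 ms after the transient: True
print(is_temporally_masked(97.0, 100.0))   # 3 ms before the transient: True
print(is_temporally_masked(80.0, 100.0))   # 20 ms before the transient: False
```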
If an audio signal in a given unit time block is subjected to fast Fourier transformation when the signal is encoded, an inverse fast Fourier transformation (IFFT) is performed when the signal is decoded. Noise generated by the FFT and IFFT will generally appear over the entirety of the block in the signal obtained by the encoding and decoding. Accordingly, if a transient sound level change occurs in a block B which is subjected to an FFT or IFFT, for example, if a signal portion C having an abruptly increasing level, like a signal generated by a percussion instrument such as castanets, arrives after a "no sound" or "no signal" portion U of the block, as shown in FIG. 1, noise components resulting from the FFT or IFFT processing of the high level signal portion C will occur in the no-signal portion U, as shown in FIG. 2. Therefore, when the signal is reproduced, the noise produced in the inherently no-signal portion is readily perceptible. The noise generated after the high level signal portion C by FFT or IFFT processing of a block having such a transient change is relatively less readily heard, since it is masked by the relatively long duration forward masking FM, as shown in FIG. 3. The noise generated before the high level signal portion C is, however, more readily heard, since the backward masking BM effect lasts for only a relatively short period of time. That is, the noise generated before the time at which the backward masking becomes effective is more readily heard.
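The appearance of noise in the no-signal portion U may be reproduced with the following sketch. It is illustrative only: a naive discrete Fourier transform and its inverse stand in for the FFT and IFFT, the high level signal portion C is reduced to a single impulse, and the coefficients are quantized by uniform rounding with a deliberately coarse step. The function names and the values of the block length, onset position, and step are assumptions.

```python
import cmath


def dft(block):
    """Naive forward transform, used here in place of an FFT."""
    n = len(block)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(block))
            for k in range(n)]


def idft(coeffs):
    """Naive inverse transform, used here in place of an IFFT; returns real samples."""
    n = len(coeffs)
    return [(sum(c * cmath.exp(2j * cmath.pi * k * t / n)
                 for k, c in enumerate(coeffs)) / n).real
            for t in range(n)]


def quantize(coeffs, step):
    """Round the real and imaginary parts of each coefficient to a multiple of step."""
    return [complex(round(c.real / step) * step, round(c.imag / step) * step)
            for c in coeffs]


def silent_portion_noise(block, onset, step):
    """Peak absolute level of the decoded signal before the onset sample."""
    decoded = idft(quantize(dft(block), step))
    return max(abs(v) for v in decoded[:onset])


# Block B: 56 silent samples (portion U), then an abrupt impulse (portion C).
block = [0.0] * 56 + [1.0] + [0.0] * 7
print(silent_portion_noise(block, onset=56, step=0.5))
```

With the coarse step of 0.5, the decoded block contains a spurious sample of about 0.146 at position 24, that is, 32 samples before the impulse, although the original block is exactly zero there. With a very fine step, the level in the silent portion becomes negligible.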
As an example of a countermeasure for the case in which suppression of noise by the above-mentioned backward masking BM cannot be expected, the length of the unit time block to which the fast Fourier transform is applied could be shortened so as to be about equal to the period of time (for example, 5 milliseconds) over which the backward masking BM is effective. That is, the time resolution of the efficient encoding could be increased (by shortening the block length) to a period of time over which the backward masking BM effect caused by the high level signal portion C is effective.
However, since shortening of the unit time block length which is Fourier transformed will decrease the number of samples in the block, the frequency resolution provided by the Fourier transform is conversely lowered.
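The trade-off between time resolution and frequency resolution may be made concrete with the following sketch. The sampling frequency of 48 kHz and the block lengths are assumed values chosen for round numbers and do not appear in the description above; the function name `block_resolution` is hypothetical.

```python
def block_resolution(block_len, sample_rate_hz):
    """Return (block duration in ms, spacing between transform coefficients in Hz)."""
    duration_ms = 1000.0 * block_len / sample_rate_hz
    bin_spacing_hz = sample_rate_hz / block_len
    return duration_ms, bin_spacing_hz


# Assumed sampling frequency of 48 kHz.
for block_len in (1024, 240):
    duration_ms, bin_spacing_hz = block_resolution(block_len, 48000)
    print(block_len, duration_ms, bin_spacing_hz)
```

Under these assumptions, a block of 240 samples lasts 5 milliseconds, comparable to the backward masking period, but its coefficients are spaced 200 Hz apart, whereas a block of 1024 samples lasts about 21 milliseconds with a spacing of about 47 Hz.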
In general, the frequency analysis capability of the human sense of hearing is relatively low at higher frequencies and relatively high at lower frequencies. Accordingly, in practice, the unit time block length cannot be greatly shortened in view of the necessity of ensuring the required frequency resolution in the lower frequency band or range. That is, it is preferable to have a higher frequency resolution in the lower frequency band or range.
In general, since the block is longer for a lower frequency band signal and is conversely shorter for a high frequency band signal, it is effective to increase the time resolution (to shorten the block length) at the higher frequency band.