1. Field of the Invention
This invention relates to an apparatus for compressing a digital signal, and more particularly to an apparatus that divides the digital input signal into signal components in blocks in both the frequency domain and the time domain and orthogonally transforms each block of signal components. The apparatus applies bit allocation to each block of orthogonally-transformed signal components to quantize them.
2. Description of the Prior Art
Orthogonal transform coding is one technology for compressing a digital audio signal or the like. It orthogonally transforms a digital input signal in the time domain into spectral coefficients in the frequency domain. The resulting spectral coefficients are then quantized. As an example of orthogonal transform technology, Fast Fourier Transform (FFT) processing can be applied to a block of, e.g., audio PCM data containing a fixed number of samples in the time domain.
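For illustration only, the transform-then-quantize step described above can be sketched as follows. The block length of 16 samples, the test tone, the uniform quantizer, and the step size of 0.5 are assumptions chosen for the example, and a naive discrete Fourier transform stands in for the FFT (it yields the same coefficients, only more slowly).

```python
import cmath
import math

def dft(block):
    """Naive O(N^2) discrete Fourier transform of one block of samples."""
    n = len(block)
    return [sum(block[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def quantize(coeffs, step):
    """Uniformly quantize real and imaginary parts to multiples of `step` (assumed quantizer)."""
    return [complex(round(c.real / step) * step, round(c.imag / step) * step)
            for c in coeffs]

# Hypothetical block of 16 PCM-like samples: a sinusoid completing 4 cycles per block.
block = [math.sin(2 * math.pi * 4 * t / 16) for t in range(16)]

# Time domain -> spectral coefficients -> quantized spectral coefficients.
spectrum = quantize(dft(block), step=0.5)

# Index of the strongest coefficient in the lower half of the spectrum.
peak_bin = max(range(8), key=lambda k: abs(spectrum[k]))
```

Here the energy of the tone is concentrated in a single coefficient, which is why quantizing in the frequency domain can be efficient for steady signals.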
Further, a coding technology has been proposed in which an input signal is divided into signal components in plural frequency ranges prior to orthogonal transform processing such as that described above. The signal component in each frequency range is divided in time into blocks, each block is orthogonally transformed, and bit allocation is applied to the orthogonally transformed signal components in each respective block.
A signal that has been orthogonally transformed by, e.g., an FFT in the compressor undergoes inverse Fast Fourier Transform (IFFT) processing in the expander. In general, when the frequency analysis accuracy (i.e., the frequency resolution) of the orthogonal transform processing is high, the accuracy in the time domain is degraded. This gives rise to a phenomenon called pre-echo, in which a low-level artifact of a sound is heard before the sound itself. Pre-echo is particularly noticeable at the beginning of a transient or non-steady portion of a signal. Such a phenomenon is subjectively disturbing to the listener, and has a great influence on the listener's perception of coding quality.
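The text above does not state how bits are allocated among blocks. Purely as a hypothetical sketch, the following distributes a fixed bit budget in proportion to the logarithm of each block's energy. The function name `allocate_bits`, the weighting rule, and the example energies are all assumptions and are not taken from the proposed technology.

```python
import math

def allocate_bits(block_energies, total_bits):
    """Split `total_bits` across blocks in proportion to log2(1 + energy).

    This weighting rule is an illustrative assumption, not a prescribed method.
    """
    weights = [math.log2(1.0 + e) for e in block_energies]
    total = sum(weights)
    if total == 0:
        return [0] * len(block_energies)
    # Truncate each proportional share to a whole number of bits.
    bits = [int(total_bits * w / total) for w in weights]
    # Hand out the bits lost to truncation, largest weight first.
    leftover = total_bits - sum(bits)
    order = sorted(range(len(bits)), key=lambda i: weights[i], reverse=True)
    for i in order[:leftover]:
        bits[i] += 1
    return bits

# Hypothetical energies of three transformed blocks and a 12-bit budget.
allocation = allocate_bits([3.0, 0.0, 15.0], 12)
```

Under this assumed rule, a block with no energy receives no bits and the most energetic block receives the largest share.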
In the block B in the time domain shown in FIG. 7, the signal includes a portion C in which the level of the signal suddenly increases, e.g., as when castanets are played or a triangle is struck. The signal also includes a portion U in which there exists almost no signal (or in which the signal level is very small). When FFT processing is implemented on the signal in the block B, and IFFT processing is performed on the resulting compressed signal in the expander, quantizing noise is heard during the portion U, where there was originally no signal, as shown in FIG. 8.
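The effect described for the block B can be reproduced numerically. The sketch below is a minimal illustration under assumed values: a 64-sample block whose first 48 samples are silent (standing in for the portion U) and whose last 16 samples hold a decaying burst (standing in for the portion C), a naive DFT in place of the FFT, and a uniform quantizer with an arbitrary step size of 0.5.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(N^2) DFT; with inverse=True, the inverse DFT including the 1/N scale."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def quantize(coeffs, step):
    """Uniformly quantize real and imaginary parts to multiples of `step` (assumed quantizer)."""
    return [complex(round(c.real / step) * step, round(c.imag / step) * step)
            for c in coeffs]

N, ONSET = 64, 48  # assumed block length and transient position

# Portion C: a decaying burst. Portion U: exact silence before it.
burst = [math.exp(-t / 8.0) * math.sin(2 * math.pi * 0.23 * t + 1.0)
         for t in range(N - ONSET)]
block = [0.0] * ONSET + burst

# Compressor (transform, quantize) followed by expander (inverse transform).
decoded = [v.real for v in dft(quantize(dft(block), step=0.5), inverse=True)]

# Energy in portion U before and after coding.
energy_u_original = sum(s * s for s in block[:ONSET])
energy_u_decoded = sum(s * s for s in decoded[:ONSET])
```

Because each quantized coefficient corresponds to a sinusoid extending over the whole block, the quantizing error is spread over all 64 samples, so `energy_u_decoded` is non-zero even though `energy_u_original` is zero.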
The human sense of hearing has a characteristic called masking. Masking is classified into temporal masking and simultaneous masking. With simultaneous masking, a low-level sound produced simultaneously with a higher-level sound is rendered inaudible because it is masked by the higher-level sound. With temporal masking, a low-level sound preceding or succeeding a higher-level sound is rendered inaudible because it is masked by the higher-level sound. FIG. 9 illustrates temporal masking. In this figure, the higher-level sound C provides forward masking FM to lower-level sounds following it in time. Forward masking is effective over a relatively long time (e.g., about 100 msec). The higher-level sound C also provides backward masking BM to lower-level sounds preceding it in time. Backward masking is effective for only a relatively short time (e.g., about 5 msec).
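The asymmetry between forward and backward masking can be expressed as a simple time-window test. The sketch below is a deliberately crude model that uses only the example durations given above (about 100 msec forward, about 5 msec backward). It ignores the dependence of masking on the relative levels and frequencies of the sounds, and the function name `is_temporally_masked` is hypothetical.

```python
BACKWARD_MASK_MS = 5.0    # example backward masking duration from the text
FORWARD_MASK_MS = 100.0   # example forward masking duration from the text

def is_temporally_masked(noise_time_ms, masker_time_ms,
                         backward_ms=BACKWARD_MASK_MS,
                         forward_ms=FORWARD_MASK_MS):
    """Return True if a low-level sound falls inside the masker's temporal window.

    Assumption: masking is treated as all-or-nothing within fixed time limits.
    """
    offset = noise_time_ms - masker_time_ms
    if offset < 0:
        # Noise precedes the masker: only backward masking can hide it.
        return -offset <= backward_ms
    # Noise coincides with or follows the masker: forward masking applies.
    return offset <= forward_ms

# A masker at 100 msec hides noise 3 msec earlier, but not noise 10 msec earlier.
masked_just_before = is_temporally_masked(97.0, 100.0)
masked_well_before = is_temporally_masked(90.0, 100.0)
```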
For this reason, when the signal level suddenly rises towards the end of the block to which the FFT transform is applied, the resulting quantizing noise pre-echo preceding the transient is easily heard in the expanded signal. Quantizing noise pre-echo is subjectively offensive to the listener.
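A short calculation shows why a transient late in the block is the difficult case. The sketch below assumes a sampling frequency of 44.1 kHz and a hypothetical 1024-sample block, neither of which is specified above, and uses the example backward masking duration of about 5 msec.

```python
def unmasked_pre_echo_ms(onset_sample, sample_rate_hz, backward_ms=5.0):
    """Duration of noise before the transient that backward masking does not cover.

    `onset_sample` is the position of the transient within the block, so the
    quantizing noise can precede the transient by onset_sample / sample_rate_hz.
    """
    pre_ms = 1000.0 * onset_sample / sample_rate_hz
    return max(0.0, pre_ms - backward_ms)

# Assumed values: 44.1 kHz sampling, transient at sample 900 of a 1024-sample block.
late_transient = unmasked_pre_echo_ms(900, 44100)

# Transient at sample 100: the preceding noise lasts under 5 msec and is masked.
early_transient = unmasked_pre_echo_ms(100, 44100)
```

With these assumed values, the noise preceding the late transient lasts about 20.4 msec, of which roughly 15.4 msec falls outside the backward masking interval, whereas the noise preceding the early transient falls entirely within it.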