One method of highly efficient compressed encoding of digital data such as musical tones and sounds is ATRAC (Adaptive Transform Acoustic Coding), used in MiniDiscs. In ATRAC, in order to compress the digital data with high efficiency, the data is first divided into a plurality of frequency bands, then divided into blocks in time units of variable length, and transformed into spectral signals by MDCT (Modified Discrete Cosine Transform) processing. Each spectral signal is then encoded using the number of quantization bits allocated to it, taking into account aural-psychological characteristics.
Among the aural-psychological characteristics which can be applied to the compressed encoding are loudness-level characteristics and the masking effect. Loudness-level characteristics show that, even at the same sound pressure level, the loudness of a sound sensed by a person changes according to the frequency of the sound. Accordingly, the minimum limit of audibility, i.e., the smallest loudness which a person can hear, also changes according to the frequency. There are two kinds of masking effect: the simultaneous masking effect and the elapsed masking effect. The simultaneous masking effect is a phenomenon in which, when several sounds of different frequency composition occur simultaneously, one sound makes another difficult to hear. The elapsed masking effect is a phenomenon in which masking occurs along the time axis, both before and after a loud sound.
An example of conventional art which makes use of the elapsed masking effect is Japanese Unexamined Patent Publication No. 5-91061/1993. In this conventional art, when a transient signal is included in one of the frequency conversion time units, bits are allocated in accordance with a word length which varies depending on the energy of previous time units and on the amount of masking, thereby preventing a deterioration of sound quality called "pre-echo." Similarly, Japanese Unexamined Patent Publication No. 5-248972/1993 proposes a technique for improving the efficiency of encoding by using elapsed masking with reference to the spectral distribution of previous time units.
Another example of bit allocation using the aural-psychological characteristics is the so-called repetition method, in which actual bit allocation suited to the input digital data is performed as follows. First, the power S of each frequency band, and the masking threshold M which that power S imposes on the other frequency bands, are found. Next, from the masking threshold M and the power of quantized noise N(n) (the noise when each frequency band is quantized into n bits), the ratio of masking threshold to noise, MNR(n)=M/N(n), is calculated. Then bits are allocated to the frequency band with the smallest ratio of masking threshold to noise MNR(n), the ratio MNR(n) of that frequency band is re-calculated, and bits are again allocated to whichever frequency band now has the lowest ratio. This step is repeated.
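The repetition method described above can be sketched roughly as follows. This is only an illustrative sketch, not the method of any cited publication: the function names, the one-bit-per-step granularity, the bit budget `total_bits`, the cap `max_bits`, and the noise model N(n) = S / 4^n (about 6 dB less quantization noise per added bit) are all assumptions introduced for the example. The masking thresholds M are taken as given inputs rather than computed.

```python
def noise_power(signal_power, bits):
    """Assumed noise model: N(n) = S / 4**n (about 6 dB less noise per bit)."""
    return signal_power / (4.0 ** bits)


def allocate_bits(powers, thresholds, total_bits, max_bits=16):
    """Greedy allocation: repeatedly give one bit to the band with the lowest MNR.

    powers      -- power S of each frequency band
    thresholds  -- masking threshold M of each band (assumed already computed)
    total_bits  -- hypothetical bit budget for the block
    """
    bits = [0] * len(powers)
    for _ in range(total_bits):
        # MNR(n) = M / N(n) for every band that can still accept a bit.
        candidates = [
            (thresholds[i] / noise_power(powers[i], bits[i]), i)
            for i in range(len(powers))
            if bits[i] < max_bits
        ]
        if not candidates:
            break
        _, worst_band = min(candidates)  # band with the smallest MNR
        bits[worst_band] += 1            # its MNR is re-evaluated on the next pass
    return bits


# Three bands with equal thresholds but very different power.
example = allocate_bits([1000.0, 10.0, 1.0], [1.0, 1.0, 1.0], total_bits=6)
print(example)
```

In this toy input the strongest band has the lowest MNR for most of the loop, so it receives most of the budget, while the weakest band, whose noise is already at its threshold, receives none.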
Note that the minimum limit of audibility, masking threshold, etc. mentioned above are modeled on the aural characteristics of typical listeners. Accordingly, there are cases where listeners will feel a sense of incongruity due to individual differences in hearing or preference.
For example, in cases where the spectral composition of the input digital data is comparatively flat, like white noise, bit allocation will be made with the masking threshold at the minimum limit of audibility, so most of the quantization bits will be allocated to the mid- to low-range. Accordingly, depending on the magnitude of the spectral components, quantization bits may not be allocated to the ultra-low and ultra-high ranges, giving some listeners a sense of incongruity.
Further, when the input digital data is a composite wave composed of a signal with a narrow spectrum band (such as a sine wave signal) and white noise, the frequency bands f1 which include the sine wave signal will have more power, whereas the power of the frequency bands f2 drops the farther they are from the frequency bands f1. Accordingly, there will be almost no masking from the sine wave signal at a frequency band f2, and the masking due to the power of the frequency band f2 itself becomes dominant. Because of this, there will be no great difference between the ratio of signal to masking threshold (SMR: the ratio of a frequency band's own power S to masking threshold M) at the frequency bands f1 and the same ratio SMR at the frequency bands f2.
In other words, if the power of a signal is S, and the power of quantized noise is N(n) when each frequency band is quantized into n bits, then, based on the relative relationship between the two (S/N(n)) and on the ratio SMR=S/M, the ratio of masking threshold to noise MNR(n)=M/N(n)=(S/N(n))/(S/M) will be approximately the same value at the frequency bands f1 and f2. Accordingly, since the conventional adaptive bit allocation methods perform bit allocation based only on the ratio of masking threshold to noise MNR(n), their drawback is that approximately the same number of bits is allocated to the frequency bands f1 and f2.
As a result, if there are many frequency bands f2 which are not influenced by the masking from the sine wave signal, the number of bits allocated to the frequency bands f1 which include the sine wave signal becomes relatively smaller, the quantization error of the sine wave signal becomes greater, and sound quality deteriorates.
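The effect described above can be illustrated with a small numerical sketch. It assumes, purely for illustration, that the signal-to-noise ratio of an n-bit quantizer is S/N(n) = 4^n in every band, so that MNR(n) = (S/N(n))/SMR = 4^n / SMR, and that the frequency bands f1 and f2 all have the same SMR. The function names, the SMR value of 10 and the budget of 24 bits are hypothetical.

```python
def mnr(smr, bits):
    """MNR(n) = (S/N(n)) / SMR, with the assumed model S/N(n) = 4**n."""
    return (4.0 ** bits) / smr


def allocate_by_mnr(smrs, total_bits):
    """Allocate one bit at a time to the band whose MNR is currently lowest."""
    bits = [0] * len(smrs)
    for _ in range(total_bits):
        worst_band = min(range(len(smrs)), key=lambda i: mnr(smrs[i], bits[i]))
        bits[worst_band] += 1
    return bits


def bits_for_f1(num_f2_bands, total_bits=24, smr=10.0):
    """Bits received by the sine-wave band f1 (index 0) among equal-SMR bands f2."""
    smrs = [smr] * (1 + num_f2_bands)
    return allocate_by_mnr(smrs, total_bits)[0]


# The more noise-only bands f2 there are, the fewer bits remain for f1.
for count in (3, 11, 23):
    print(count, bits_for_f1(count))
```

Because every band presents the same MNR(n), the allocation is shared evenly, and the share left for the band containing the sine wave shrinks as the number of bands f2 grows, regardless of how much more power the band f1 carries.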
In regard to this point, the present Applicant has proposed, in Japanese Unexamined Patent Publication No. 7-202823/1995, a structure which automatically limits the number of bits which may be allocated to frequency bands with low power S. However, a drawback of this conventional art is that, since the maximum number of bits which may be allocated to each frequency band is determined on the basis of its power, there are cases where, when the power of the white noise is large, no limitation is placed on bit allocation to that frequency band.