The present invention relates to a method of encoding digital data in which, when musical tones, sounds, etc. are recorded on recording media such as mini discs, bits are allocated to the spectrum of each frequency band in accordance with the musical tones, sounds, etc. so as to compress the data volume.
One method of highly efficient compressed encoding of digital data such as musical tones and sounds is ATRAC (Adaptive Transform Acoustic Coding), used in mini discs. In ATRAC, in order to compress the digital data with high efficiency, the data is first broken down into a plurality of frequency bands, then divided into blocks in accordance with time units of variable length and transformed into spectral signals by MDCT (Modified Discrete Cosine Transform) processing. Each spectral signal is then encoded with the number of quantized bits allocated to it, taking into account aural-psychological characteristics.
Among the aural-psychological characteristics which can be applied to compressed encoding are loudness-level characteristics and the masking effect. Loudness-level characteristics show that, even at the same sound pressure level, the loudness of a sound as sensed by a person changes according to the frequency of the sound. Accordingly, the minimum limit of audibility, i.e., the smallest loudness which can be heard by a person, also changes according to the frequency. There are two kinds of masking effect: the simultaneous masking effect and the elapsed masking effect. The simultaneous masking effect is a phenomenon in which, when several sounds of different frequency composition occur simultaneously, one sound makes another difficult to hear. The elapsed masking effect is a phenomenon in which masking occurs along the time axis before and after a loud sound.
An example of conventional art which makes use of the elapsed masking effect is Japanese Unexamined Patent Publication No. 5-91061/1993. In this conventional art, when a transient signal is included in one of the frequency conversion time units, bits are allocated in accordance with a word length which varies depending on the energy of previous time units and on the amount of masking, thereby preventing a sound quality deterioration called "pre-echo." Again, Japanese Unexamined Patent Publication No. 5-248972/1993 proposes a technique for improving the efficiency of encoding by using elapsed masking in reference to the spectral distribution of previous time units.
Another example of bit allocation using the aural-psychological characteristics is one called the repetition method, in which actual bit allocation suited to the input digital data is performed as follows. First, the power S of each frequency band, and the masking threshold M which that power S produces on the other frequency bands, are found. Next, from the masking threshold M and the power of quantized noise N(n) (when each frequency band is quantized into n bits), the ratio of masking threshold to noise, MNR(n)=M/N(n), is calculated. Then bits are allocated to the frequency band with the smallest ratio MNR(n), the ratio MNR(n) of that band is recalculated, and bits are again allocated to whichever frequency band now has the lowest ratio, and so on repeatedly.
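The repetition method described above can be sketched as follows. This is a minimal illustration and not the procedure of any cited publication: the function name `greedy_bit_allocation`, the `max_bits` cap, and in particular the noise model N(n) = S * 2^(-2n) (roughly 6 dB less noise per added bit) are assumptions introduced for the example.

```python
import math

def greedy_bit_allocation(power, mask, total_bits, max_bits=16):
    """Give one bit at a time to the band whose MNR(n) = M / N(n) is lowest.

    power[i]: signal power S of band i (linear scale, > 0)
    mask[i] : masking threshold M of band i (linear scale, > 0)
    Assumed noise model (not from the text): N(n) = S * 2**(-2 * n).
    """
    bits = [0] * len(power)

    def mnr_db(i):
        # Quantization noise power of band i at its current word length.
        noise = power[i] * 2.0 ** (-2 * bits[i])
        return 10.0 * math.log10(mask[i] / noise)

    for _ in range(total_bits):
        candidates = [i for i in range(len(power)) if bits[i] < max_bits]
        if not candidates:
            break
        worst = min(candidates, key=mnr_db)  # band with the smallest MNR
        bits[worst] += 1                     # its MNR is re-evaluated next pass
    return bits

if __name__ == "__main__":
    print(greedy_bit_allocation([1000.0, 100.0, 1.0], [10.0, 10.0, 10.0], 6))
```

In this example the third band already lies below its masking threshold, so it receives no bits, while the two louder bands share the budget.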
Note that the aural characteristics of persons with typical aural characteristics are the model for the minimum limit of audibility, masking threshold, etc. mentioned above. Accordingly, there are cases where listeners will feel a sense of incongruity due to differences in hearing or preference.
For example, in cases where the spectral composition of the input digital data is comparatively flat, like white noise, bit allocation will be made with the masking threshold at the minimum limit of audibility, so most of the quantized bits will be allocated to the mid- to low-range. Accordingly, depending on the size of the spectral composition, quantized bits may not be allocated to the ultra-low and ultra-high ranges, giving some listeners a sense of incongruity.
Again, when the input digital data is a composite wave composed of a signal with a narrow spectrum band (such as a sine wave signal) and white noise, the frequency bands f1 which include the sine wave signal will have more power, while the power of the frequency bands f2 drops the farther they are from the frequency bands f1. Accordingly, there will be almost no masking from the sine wave signal at a frequency band f2, and the influence of masking from the power of the frequency band f2 itself is increased. Because of this, there will be no great difference between the ratio of signal to masking threshold (SMR: the ratio of a frequency band's own power S to the masking threshold M) at the frequency bands f1 and the same ratio SMR at the frequency bands f2.
In other words, if the power of a signal is S, and the power of quantized noise is N(n) when each frequency band is quantized into n bits, then, based on the relative relationship between the two, the ratio of masking threshold to noise MNR(n) = M/N(n) = (S/N(n))/(S/M) will be approximately the same value at the frequency bands f1 and f2. Accordingly, since the conventional adaptive bit allocation methods perform bit allocation based only on the ratio of masking threshold to noise MNR(n), their drawback is that approximately the same number of bits is allocated to the frequency bands f1 and f2.
As a result, if there are many frequency bands f2 which are not influenced by the masking from the sine wave signal, the number of bits allocated to the frequency bands f1 which include the sine wave signal becomes relatively smaller, the quantization error of the sine wave signal becomes greater, and sound quality deteriorates.
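The reasoning above can be restated in decibels. The following is a sketch under an assumption not stated in the text, namely that each additional quantized bit lowers the noise power N(n) by a fixed factor (about 6.02 dB per bit, as for a uniform quantizer). The symbol SNR(n) = S/N(n) is introduced here only for illustration.

```latex
\begin{align*}
\mathrm{MNR}(n) &= \frac{M}{N(n)} = \frac{S/N(n)}{S/M}
  = \frac{\mathrm{SNR}(n)}{\mathrm{SMR}}, \\
10\log_{10}\mathrm{MNR}(n)
  &= 10\log_{10}\mathrm{SNR}(n) - 10\log_{10}\mathrm{SMR}
  \approx 6.02\,n - 10\log_{10}\mathrm{SMR} \quad \text{(assumed noise model)}.
\end{align*}
```

Under this assumption the first term depends only on n, so two frequency bands with nearly equal SMR reach nearly equal MNR(n) for the same n. An allocation driven only by MNR(n) therefore cannot distinguish the frequency bands f1 from the frequency bands f2.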
In regard to this point, the present Applicant has proposed, in Japanese Unexamined Patent Publication 7-202823/1995, a structure which automatically limits the number of bits which may be allocated to frequency bands with low power S. However, a drawback of this conventional art is that, since the maximum number of bits which may be allocated to each frequency band is determined on the basis of its power, when the power of white noise is large, there are cases when no limitation on bit allocation to that frequency band is made.
One object of the present invention is to provide a method of encoding digital data capable of attaining a sound quality which accords with the listener's hearing.
Another object of the present invention is to provide a method of encoding digital data capable of preventing deterioration of sound quality even of signals with narrow spectrum bands.
In order to realize the first object mentioned above, the first method of encoding digital data of the present invention encodes digital data such as musical tones and sounds by converting it into frequency domains, dividing the converted spectra into a plurality of frequency bands, changing a minimum limit of audibility characteristic so as to set a masking threshold, and allocating quantized bits for each frequency band in accordance with ratios of masking threshold to noise which are found for each frequency band in accordance with power or energy of each frequency band in consideration of aural-psychological characteristics.
The above structure, by enabling change of the minimum limit of audibility characteristic among aural-psychological characteristics, frees aural-psychological characteristics from definition by the characteristics of persons with typical hearing, and makes possible selection of whether or not to allocate bits to spectra with small inaudible domains, or spectra with ultra-low or ultra-high domains. Accordingly, it becomes possible to respond to persons with superior hearing or to individual, subjective preference, and sound quality which accords with listeners' hearing can be attained.
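One way to picture a changeable minimum limit of audibility is sketched below. The names `effective_threshold_db`, `mask_db`, `quiet_db`, and `quiet_offset_db` are hypothetical, and combining the two curves by taking the larger value per band is an assumption borrowed from common psychoacoustic models, not a rule stated in this text.

```python
def effective_threshold_db(mask_db, quiet_db, quiet_offset_db=0.0):
    """Per-band masking threshold after adjusting the minimum limit of audibility.

    mask_db[i]      : simultaneous-masking threshold of band i, in dB
    quiet_db[i]     : minimum limit of audibility of band i, in dB
    quiet_offset_db : hypothetical adjustment; negative values lower the
                      audibility floor so that quieter spectra can earn bits
    Assumption: the effective threshold is the larger of the two curves.
    """
    if len(mask_db) != len(quiet_db):
        raise ValueError("mask_db and quiet_db must have the same length")
    return [max(m, q + quiet_offset_db) for m, q in zip(mask_db, quiet_db)]

if __name__ == "__main__":
    mask = [-10.0, 20.0, 5.0]
    quiet = [30.0, 0.0, 10.0]
    print(effective_threshold_db(mask, quiet))         # typical-listener floor
    print(effective_threshold_db(mask, quiet, -30.0))  # lowered floor
```

Lowering the floor reduces the threshold in the first and third bands of this example, so an allocation based on the ratio of masking threshold to noise would then spend bits there.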
Next, in order to realize the first object mentioned above, the second method of encoding digital data of the present invention encodes digital data such as musical tones and sounds by converting it into frequency domains, dividing the converted spectra into a plurality of frequency bands, changing a masking characteristic so as to set a masking threshold, and allocating quantized bits for each frequency band in accordance with ratios of the masking threshold to noise for each frequency band which are found in accordance with power or energy of each frequency band in consideration of aural-psychological characteristics.
The above structure, by enabling change of the masking characteristic among the aural-psychological characteristics, frees aural-psychological characteristics from definition by the characteristics of persons with typical hearing, and makes possible selection of whether to allocate bits to spectra which, for example, suffer masking in a critical band. Accordingly, it becomes possible to respond to persons with superior hearing or to individual, subjective preference, and sound quality which accords with listeners' hearing can be attained.
Next, in order to realize the first object mentioned above, the third method of encoding digital data of the present invention encodes digital data such as musical tones and sounds by converting it into frequency domains, dividing the converted spectra into a plurality of frequency bands, and switching among (i) bit allocation in accordance with ratios of masking threshold to noise which are found for each frequency band in accordance with power or energy of each frequency band in consideration of aural-psychological characteristics, (ii) bit allocation in accordance with a representative value of the power or the energy of each frequency band, and (iii) bit allocation giving weight to each of the foregoing bit allocation methods.
With respect to data, such as white noise having a spectral composition which is comparatively flat, the above structure makes possible bit allocation which is flat along the frequency axis. Again, with respect to data, such as sine wave signals, with narrow band width, the above structure makes possible bit allocation which emphasizes the signal with narrow band width. Accordingly, selection of a sound quality which is suited to the source of the musical tone is made possible.
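The switching among allocations (i), (ii), and (iii) can be illustrated with a single weight. This is only a sketch: the function name `blend_allocations`, the `weight` parameter, and the largest-remainder rounding are assumptions chosen for the example, and the text does not specify how the weighting is actually carried out.

```python
def blend_allocations(bits_mnr, bits_power, weight):
    """Mix two per-band bit allocations that use the same total bit budget.

    bits_mnr  : allocation from method (i), based on masking-threshold/noise ratios
    bits_power: allocation from method (ii), based on a representative band power
    weight    : 1.0 selects (i), 0.0 selects (ii), values in between give (iii)
    Rounding uses the largest-remainder rule so the total is preserved
    (an assumption made for this sketch).
    """
    if len(bits_mnr) != len(bits_power):
        raise ValueError("allocations must cover the same bands")
    if sum(bits_mnr) != sum(bits_power):
        raise ValueError("allocations must use the same bit budget")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")

    total = sum(bits_mnr)
    mixed = [weight * a + (1.0 - weight) * b for a, b in zip(bits_mnr, bits_power)]
    result = [int(m) for m in mixed]  # round down first
    leftover = total - sum(result)    # bits lost to rounding down
    by_fraction = sorted(range(len(mixed)),
                         key=lambda i: mixed[i] - result[i], reverse=True)
    for i in by_fraction[:leftover]:
        result[i] += 1                # hand them to the largest fractions
    return result

if __name__ == "__main__":
    print(blend_allocations([4, 2, 0], [2, 2, 2], 0.5))
```

A weight near 0.0 leans toward the power-based allocation, and a weight near 1.0 toward the allocation based on masking.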
Finally, the fourth method of encoding digital data of the present invention, in order to realize the second object mentioned above, switches among bit allocation methods (i), (ii), and (iii) described in the third method of encoding digital data in accordance with a relationship between the masking threshold and peaks and local peaks found based on differences in power or energy between adjacent spectra within each frequency band.
The above structure makes it possible to automatically allocate bits according to the method most suited to the digital data, whether it is white noise or other data with wide band width, or sine wave signals or other data with narrow band width, thus preventing deterioration of sound quality, even with musical tones not suited to bit allocation using simultaneous masking such as the masking threshold/noise ratio.
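The peaks and local peaks mentioned above are found from differences in power between adjacent spectra. A minimal sketch of that step is given below. The function names and the strict rise-then-fall test are assumptions, and the actual rule that relates the peaks to the masking threshold in order to choose among (i), (ii), and (iii) is not specified here, so the sketch only reports which peaks exceed a given threshold.

```python
def local_peaks(spectrum):
    """Indices k where power rises into line k and falls after it."""
    peaks = []
    for k in range(1, len(spectrum) - 1):
        rise = spectrum[k] - spectrum[k - 1]   # difference to the lower neighbour
        fall = spectrum[k + 1] - spectrum[k]   # difference to the upper neighbour
        if rise > 0 and fall < 0:
            peaks.append(k)
    return peaks

def peaks_above_mask(spectrum, mask):
    """Local peaks whose power exceeds the band's masking threshold `mask`."""
    return [k for k in local_peaks(spectrum) if spectrum[k] > mask]

if __name__ == "__main__":
    band = [1.0, 5.0, 2.0, 3.0, 2.0, 9.0, 1.0]
    print(local_peaks(band))
    print(peaks_above_mask(band, 4.0))
```

A band dominated by one or two peaks well above the threshold suggests a narrow-band signal such as a sine wave, whereas many small peaks near the threshold suggest a flat, noise-like spectrum.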
The other objects, features, and superior points of the present invention will be made clear by the description below. Further, the advantages of this invention will be evident from the following explanation in reference to the Figures.