1. Field of the Invention
This invention relates to a high efficiency encoding method and device for audio signals by subdividing digital audio signals into blocks and quantizing the signals from block to block by adaptive bit allocation.
2. Description of the Related Art
As a technique for high efficiency encoding, that is, for compressing and encoding audio signals, there is known a method of dividing digital audio signals into blocks at intervals of a predetermined number of samples or of a predetermined time frame, processing the digital data by block floating from block to block, quantizing the digital data in each block by adaptive bit allocation, and transmitting the quantized digital data.
There is also known a method of transmitting parameters relevant to the block floating simultaneously with quantized and high efficiency encoded audio data.
The term block floating means the processing of multiplying each word of the plural digital data in each block by a common coefficient to give larger values and thereby improve quantization accuracy.
Specifically, the maximum of the absolute values of the words is found for the plural digital data in each block, and all words in the block are processed by floating using a common floating coefficient chosen so that the maximum absolute value is not saturated.
Block floating in steps of 6 dB by bit shifting is one of the simpler examples of block floating.
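The bit-shifting form of block floating described above may be sketched as follows. This is a minimal illustration only: the function names, the 16-bit word size and the sample values are assumptions and are not taken from the method described here.

```python
def block_float(block, word_bits=16):
    """Shift every word in the block left by one common amount.

    The shift is the largest one for which the maximum absolute value
    in the block still fits in a signed word of `word_bits` bits.
    (The 16-bit word size is an assumption for illustration.)
    """
    limit = (1 << (word_bits - 1)) - 1        # largest positive word value
    peak = max(abs(w) for w in block)         # maximum absolute value in the block
    shift = 0
    if peak > 0:
        while (peak << (shift + 1)) <= limit:
            shift += 1                        # each extra shift is a gain of about 6 dB
    return [w << shift for w in block], shift


def block_unfloat(floated, shift):
    """Undo the floating on the decoder side using the transmitted shift."""
    return [w >> shift for w in floated]


block = [3, -120, 45, 7]                      # hypothetical block of four words
floated, shift = block_float(block)
restored = block_unfloat(floated, shift)
```

The shift amount plays the role of the floating coefficient: it is common to all words of the block and would be transmitted together with the quantized data.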
Such block floating is performed on spectral signals produced by transforming time-domain audio signals of each time frame into frequency-domain signals by discrete transform.
For illustrating the above-mentioned block floating, FIG. 8 diagrammatically shows how audio signals are subdivided into blocks.
The audio signals may be represented two-dimensionally as shown in FIG. 8 in which the abscissa and the ordinate indicate the time and the frequency, respectively.
The line segment indicating the time axis is divided into units each of a predetermined time length. These units are termed time frames T1 to T4. The time length of each time frame is preferably set to 11.6 msec. The line segment indicating the frequency axis is divided into 16 frequency domains. For convenience in explanation, the respective frequency domains are represented by their respective center frequencies f0 to f15.
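For orientation, the division of FIG. 8 may be sketched as a mapping from a sample position and a frequency to a time frame and a frequency domain. The sampling frequency of 44.1 kHz, the frame length of 512 samples (which gives approximately 11.6 msec) and the equal width of the 16 frequency domains are assumptions made only for this sketch; none of them is stated above.

```python
SAMPLE_RATE_HZ = 44_100   # assumption: sampling frequency not stated in the text
FRAME_SAMPLES = 512       # assumption: 512 / 44100 s is approximately 11.6 msec
NUM_BANDS = 16            # frequency domains f0 to f15, as in FIG. 8


def frame_duration_ms():
    """Time length of one time frame in milliseconds."""
    return 1000.0 * FRAME_SAMPLES / SAMPLE_RATE_HZ


def block_of(sample_index, frequency_hz):
    """Return (time frame number starting at 1, frequency domain index 0..15)."""
    frame = sample_index // FRAME_SAMPLES + 1
    band_width = (SAMPLE_RATE_HZ / 2) / NUM_BANDS   # assumption: equal-width domains
    band = min(int(frequency_hz // band_width), NUM_BANDS - 1)
    return frame, band


# Under these assumptions, sample 600 at 12 kHz falls in time frame T2, domain f8.
position = block_of(600, 12_000)
```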
The manner in which the blocks of the time frames T1 to T4 for the frequency domain f8, that is, the four blocks B1 to B4, are processed with block floating is explained below.
Meanwhile, in a system in which input audio signals are compressed by the above-mentioned block floating, a phenomenon known as pre-echo tends to be produced.
The present Assignee has already proposed means for remedying the pre-echo in its co-pending U.S. patent applications Ser. No. 07/553,608 filed on Jul. 18, 1990 and Ser. No. 07/664,300 filed on Mar. 4, 1991 and in U.S. Pat. No. 5,115,240. Reference to the pre-echo has also been made in Edler, "Coding of Audio Signals with Overlapping Block Transform and Adaptive Window Functions", Frequenz, Vol. 43, No. 9, 1989, pages 252 to 256.
This pre-echo, which seriously degrades the quality of the decoded and reproduced sound, is briefly explained.
For example, if impulse signals, that is signals undergoing an acute rise in signal level, are present in a time frame for which block floating is performed, the quantization noise is produced substantially uniformly within the time frame. The result is that the quantization noise present in the low signal level portions is heard because it is not covered by the masking effect explained later. This phenomenon is the above-mentioned pre-echo. The term pre-echo is occasionally also used for the quantization noise itself that is produced in the low signal level portions.
The case will now be explained where audio signals having an acutely rising signal level are processed with block floating for a predetermined time frame, herein each of time frames T1 to T4, as a unit, and subsequently decoded, that is, where the audio signals are compressed and subsequently expanded, as shown in FIG. 9.
In such a case, the quantization noise present in the low signal level portion within the time frame T2, in which there are signals having an acutely rising signal level, that is the quantization noise present in the early half of the time frame T2, is perceived as pre-echo (pe).
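The spreading of the quantization noise over the whole time frame may be reproduced with a small numerical sketch. A plain discrete Fourier transform is used here as a stand-in for the transform; the frame length of 64 samples, the test signal and the quantization step of 200 are hypothetical values chosen only for illustration.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (stand-in for the codec's transform)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Naive inverse transform, returning the real part of each sample."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def quantize(spec, step):
    """Uniformly quantize the real and imaginary parts of each coefficient."""
    return [complex(round(c.real / step) * step, round(c.imag / step) * step)
            for c in spec]


N = 64
# Silent first half, loud burst in the second half (an acutely rising signal).
frame = [0.0] * (N // 2) + [1000.0 * math.sin(2 * math.pi * 5 * t / N)
                            for t in range(N // 2, N)]

decoded = idft(quantize(dft(frame), step=200.0))
error = [d - s for d, s in zip(decoded, frame)]

# Quantization noise energy in each half of the time frame.
noise_first_half = sum(e * e for e in error[:N // 2])
noise_second_half = sum(e * e for e in error[N // 2:])
```

Although the first half of the frame is silent, the decoded signal carries quantization noise there as well. Nothing in that half can mask the noise, which corresponds to the pre-echo (pe) of FIG. 9.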
The following is thought to account for the occurrence of such pre-echo.
Pre-echo is a phenomenon occurring in high efficiency encoding in which input audio signals are subdivided into blocks and processed, with block floating, from block to block and data in each block is quantized in accordance with adaptive bit allocation.
That is, for each of the blocks B1 to B4, input signal energies E, specifically, signal energies E1(1) to E4(4) for the blocks B1 to B4, are found, as shown for example in FIG. 11. The allowable noise energies P, specifically the allowable noise energies P1(1) to P4(4), which take into account the masking effect from block to block, are found based on these energies E1(1) to E4(4).
The word lengths corresponding to the numbers of allocated bits, that is word lengths W1(1) to W4(4), are then found from the allowable noise energies P1(1) to P4(4) and floating coefficients (scaling factors S1(1) to S4(4)) for the blocks B1 to B4.
It is noted that the floating coefficients or scaling factors S are found by multiplying a peak value or an average value of the block-by-block spectral signals by a predetermined coefficient.
On the other hand, the word length W corresponding to the numbers of allocated bits is found on the basis of the allowable noise energies P associated with the energies E of the block-by-block spectral signals.
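How a word length W might follow from a scaling factor S and an allowable noise energy P can be sketched under a simple model. The model is an assumption and is not taken from the text above: a uniform quantizer covering the range from -S to +S with 2^W levels, whose noise power is taken as the square of the step size divided by 12, and with P treated as an allowable noise power per sample. The function name and the limit of 16 bits are likewise hypothetical.

```python
def word_length(scale_factor, allowable_noise_power, max_bits=16):
    """Smallest word length W whose quantization noise stays within the allowance.

    Assumed model: uniform quantizer over [-S, S] with 2**W levels,
    noise power = step**2 / 12.
    """
    for bits in range(max_bits + 1):
        step = 2.0 * scale_factor / (2 ** bits)
        if step * step / 12.0 <= allowable_noise_power:
            return bits
    return max_bits          # allowance cannot be met within max_bits


w_quiet = word_length(1.0, 1e-4)   # low allowable noise
w_loud = word_length(1.0, 4e-4)    # allowable noise raised by a factor of 4 (about 6 dB)
```

Under this model, raising the allowable noise by about 6 dB saves one bit per word, which is why a block with high signal energy, and hence a high masked noise level, receives a shorter word length.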
Referring to FIG. 11, since the signal energies in the latter half of the block B2 within time frame T2 are increased, as shown in FIG. 9, the signal energies E2(2) within the block B2 and the allowable noise energies P2(2) associated with the signal energies E2(2) are increased. The noise level that is masked by the signal energies E2(2) is therefore also increased. Consequently, the number of bits allocated to the block B2 for quantization of the spectral signals for block B2 corresponds to the word length W2(2). Therefore, only the number of bits sufficient to keep the quantization noise below the allowable noise energies P2(2) is allocated to the block B2.
However, the signal level is low for the first half of the time frame T2 for the block B2, as shown in FIG. 9. Therefore, in effect, the allowable noise energies for the first half of the block B2, shown as sub-block B21 resulting from division of the block B2 into two equal sub-blocks B21 and B22, should be of a low value, as shown in FIG. 12.
On the other hand, since the signal level of the latter half of the time frame T2 is increased acutely, that is the signal produced in the latter half of the time frame T2 is a transient signal, as shown in FIG. 9, the allowable noise energies P2(2)2 of the latter half sub-block B22 should be of a higher value.
Meanwhile, for assisting in understanding, signal energies E2(2)1, E2(2)2 for sub-blocks B21, B22 for the time frame T2 are also shown in association with the signals shown in FIG. 9.
In light of the above, if the bit allocation is determined as shown in FIG. 11, quantization noise in excess of the allowable noise energies of the first half of the time frame T2 in FIG. 12, that is of the sub-block B21, is present in that sub-block, and it is perceived as pre-echo.
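The mismatch between the block-wide allowance and the allowance of each sub-block can be illustrated numerically. The sample values and the fixed ratio between signal energy and allowable noise energy (20 dB) are assumptions made only for this sketch; the text does not state how the allowable noise energies are derived from the signal energies.

```python
MASKING_RATIO = 0.01   # assumption: allowable noise lies 20 dB below signal energy


def energy(samples):
    """Signal energy of a block or sub-block."""
    return sum(s * s for s in samples)


def allowable_noise(samples):
    """Allowable noise energy under the assumed fixed masking ratio."""
    return MASKING_RATIO * energy(samples)


# Block B2: quiet first half (sub-block B21), loud second half (sub-block B22).
b21 = [1.0, -1.0, 1.0, -1.0]
b22 = [100.0, -100.0, 100.0, -100.0]
b2 = b21 + b22

p_block = allowable_noise(b2)     # allowance found for the block as a whole
p_first = allowable_noise(b21)    # what the quiet first half can actually mask
p_second = allowable_noise(b22)   # what the loud second half can mask

# Noise that just meets the block-wide allowance, spread evenly over both halves.
noise_per_half = p_block / 2
```

The noise falling in the first half stays within the allowance of the loud sub-block B22 but exceeds that of the quiet sub-block B21 by several orders of magnitude, which is the condition under which the pre-echo is perceived.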
Meanwhile, for preventing the occurrence of the pre-echo, it is effective to make the time frame as short as possible. This shortens the duration of the pre-echo to the least value possible and renders the pre-echo imperceptible by taking advantage of so-called backward masking, in which a temporally preceding sound is masked by a temporally succeeding impulse sound.
However, there is a certain limit to the reduction of the time frame length because too short a length of the time frame leads to worsened coding efficiency.
There is also known a method in which a time frame containing a signal having an acutely rising signal level is detected and an excess number of bits is allocated to that time frame to reduce the quantization noise.
However, it is difficult with this method to determine accurately the number of allocated bits sufficient to lower the pre-echo to a practically imperceptible level.
The present Assignee has proposed in U.S. patent application Ser. No. 07/553,608, which was already issued, a method for rendering the time length of the time frame variable and reducing the length of the time frame in which the signal level is increased acutely.
However, since block floating is carried out on spectral signals obtained by transform processing of the time domain audio signals of the time frame into signals on the frequency axis, it is difficult for excessively short time frames to co-exist in view of the window shape used to find the spectrum from the time domain signals. Consequently, it is rather difficult to prevent the pre-echo solely by the method of reducing the time frame.