Technologies for compressing a digital audio signal are known in which the digital audio input signal is orthogonally transformed, and the resulting spectral coefficients are divided in frequency into bands to which block floating processing is applied. The spectral coefficients in each band are then quantized to provide a compressed signal for recording on a recording medium, for transmission, or for distribution. It is also known to include block floating information and quantizing information in the compressed signal together with the quantized spectral coefficients. In the following description, whenever the words "recording" and "recording medium" are used, it will be understood that these terms encompass transmitting and distribution, and transmission and distribution media.
The above-mentioned block floating technique involves multiplying the spectral coefficients in each band by a common value to increase their values. This enables the accuracy of the quantizing to be improved. For example, there is the block floating technique in which the maximum of the absolute values (i.e., the maximum absolute value) of the spectral coefficients in the band is identified, and block floating processing is applied to all the spectral coefficients in the band using a common block floating coefficient, so that no value greater than the maximum absolute value is produced. This prevents a numerical overflow in the processing apparatus. Block floating using bit shifting is a simpler block floating technique, but has a maximum resolution of 6 dB.
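By way of illustration only, block floating using bit shifting may be sketched as follows. The sketch assumes signed integer spectral coefficients and a 16-bit word; the function names block_float_shift and release_block_float and the parameter word_bits are hypothetical and do not appear in the known techniques described above.

```python
def block_float_shift(coeffs, word_bits=16):
    """Shift every coefficient in a band left by one common amount.

    Assumption: coefficients are signed integers that already fit in a
    symmetric word_bits-bit range. Returns (shifted_coeffs, shift).
    """
    limit = (1 << (word_bits - 1)) - 1      # largest positive word value
    peak = max((abs(c) for c in coeffs), default=0)
    if peak > limit:
        raise ValueError("coefficient does not fit in the word length")
    shift = 0
    if peak == 0:
        return list(coeffs), shift          # nothing to scale in a silent band
    # Grow the shift while the largest magnitude still fits without overflow.
    while (peak << (shift + 1)) <= limit:
        shift += 1
    return [c << shift for c in coeffs], shift


def release_block_float(shifted, shift):
    """Undo the common shift applied by block_float_shift."""
    return [c >> shift for c in shifted]


band = [100, -300, 25, 7]
scaled, shift = block_float_shift(band)
restored = release_block_float(scaled, shift)
```

Each increment of the shift doubles the coefficients, i.e., raises their level by 20·log10(2), or approximately 6.02 dB, which is why the bit-shifting form of block floating cannot scale in steps finer than about 6 dB.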
In conventional orthogonal transform processing, without use of block floating, sufficient accuracy can be ensured at all input levels by using an operational word length that is sufficiently long to prevent the word length accuracy of the signal input to the orthogonal transform circuit from being damaged by the orthogonal transform processing.
An additional approach is employed to improve the analysis accuracy of the orthogonal transform in which the block size to which the orthogonal transform processing is applied is made variable, depending upon the temporal properties, i.e., dynamics, of the signal. In this approach, the root mean square values of the differences between adjacent samples of the signal may be used, for example, as a judgment index for determining the block size.
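The judgment index mentioned above may be sketched as follows, by way of example only. The threshold and the two block sizes (1024 and 128 samples) are assumptions chosen for illustration, as are the function names difference_rms and choose_block_size; the description above does not specify them.

```python
import math


def difference_rms(samples):
    """Root mean square of the differences between adjacent samples."""
    diffs = [b - a for a, b in zip(samples, samples[1:])]
    if not diffs:
        return 0.0
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def choose_block_size(samples, threshold, long_size=1024, short_size=128):
    """Pick a short block for a dynamic signal and a long block otherwise.

    Assumption: a single threshold on the index selects between two
    hypothetical block sizes.
    """
    if difference_rms(samples) > threshold:
        return short_size
    return long_size


steady = [0, 1, 2, 3, 4]          # slowly changing signal
transient = [0, 0, 0, 100, 0]     # abrupt change in level
steady_size = choose_block_size(steady, threshold=10.0)
transient_size = choose_block_size(transient, threshold=10.0)
```

A signal with an abrupt change in level yields a large index and so selects the shorter block, which gives finer temporal resolution, whereas a slowly changing signal selects the longer block, which gives finer frequency resolution.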
In the orthogonal transform processing operation, the operational word length must be increased to provide the required degree of accuracy. To accommodate such increased word lengths, the scale of the hardware must be made large, resulting in increased costs of manufacture. Also, when the block size to which the orthogonal transform processing is applied is made variable, the need to determine a judgment index solely for this purpose results in an increase in the number of processing steps.
Further, determining the maximum absolute value in the block floating processing requires processing steps to determine whether or not, for each spectral coefficient in the block, the absolute value of the current spectral coefficient is larger than the maximum absolute value of the spectral coefficients already processed. This requires a large number of processing steps, which require a large amount of time to execute.
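The processing burden just described may be seen in the following sketch, which is given for illustration only. The function name max_abs_with_count and the comparison counter are hypothetical additions used solely to make the number of steps visible.

```python
def max_abs_with_count(coeffs):
    """Find the maximum absolute value with a running comparison.

    Returns (maximum absolute value, number of comparisons performed).
    """
    running_max = 0
    comparisons = 0
    for c in coeffs:
        magnitude = -c if c < 0 else c     # absolute value of this coefficient
        comparisons += 1                   # one comparison per coefficient
        if magnitude > running_max:
            running_max = magnitude        # new maximum found so far
    return running_max, comparisons


peak, steps = max_abs_with_count([3, -9, 4, -2])
```

In this sketch, every coefficient in the block costs one absolute-value operation and one comparison with the running maximum, so the number of steps grows in direct proportion to the block size.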
In view of the facts described above, there has been proposed a technique for compressing and expanding a digital audio input signal in which filters are used to divide the digital audio input signal into a frequency range signal in each of plural frequency ranges. Each frequency range signal is divided in time into blocks of plural samples. First block floating is applied to each block of samples, and orthogonal transform processing is applied to each processed block. The resulting spectral coefficients are grouped by frequency into bands, and second block floating processing is applied to the spectral coefficients in each band. The bands of block-floating processed spectral coefficients are then quantized to provide a compressed signal for recording, transmission, or distribution. The processing just described prevents degradation of the accuracy of the orthogonal transform operation.
In the expander complementary to the just-described compressor, after the quantizing and the second block floating of spectral coefficients in the compressed signal are released, the spectral coefficients are subject to inverse orthogonal transform processing. The first block floating of the blocks of samples in the time domain resulting from the inverse orthogonal transform processing is released, and the resulting plural frequency range signals are synthesized using suitable inverse filters to provide the digital audio output signal.
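The compressor and expander just described may be sketched, for a single block of one frequency range signal, as follows. This is a simplified illustration under stated assumptions: the frequency-dividing filters and inverse filters are omitted, an orthonormal discrete cosine transform stands in for the unspecified orthogonal transform, block floating is performed by scaling by powers of two, and the band size and number of quantizing bits are arbitrary. All function names are hypothetical.

```python
import math


def dct(x):
    """Orthonormal DCT-II, assumed here as the orthogonal transform."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(s * math.sqrt((1.0 if k == 0 else 2.0) / n))
    return out


def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(c[k] * math.sqrt(2.0 / n)
                 * math.cos(math.pi * (i + 0.5) * k / n) for k in range(1, n))
        out.append(s)
    return out


def float_exponent(values):
    """Power-of-two exponent that brings the peak magnitude into [0.5, 1)."""
    peak = max(abs(v) for v in values)
    return 0 if peak == 0 else -math.frexp(peak)[1]


def compress(block, band_size=4, bits=12):
    qmax = (1 << (bits - 1)) - 1
    e1 = float_exponent(block)                       # first block floating
    spectrum = dct([math.ldexp(v, e1) for v in block])
    bands = []
    for start in range(0, len(spectrum), band_size):
        band = spectrum[start:start + band_size]
        e2 = float_exponent(band)                    # second block floating
        bands.append((e2, [round(math.ldexp(v, e2) * qmax) for v in band]))
    return e1, bands


def expand(e1, bands, bits=12):
    qmax = (1 << (bits - 1)) - 1
    spectrum = []
    for e2, quantized in bands:                      # release quantizing and
        spectrum.extend(math.ldexp(q / qmax, -e2) for q in quantized)  # second floating
    samples = idct(spectrum)
    return [math.ldexp(v, -e1) for v in samples]     # release first floating


block = [0.01 * math.sin(2 * math.pi * 3 * i / 16) for i in range(16)]
e1, bands = compress(block)
restored = expand(e1, bands)
```

In this sketch, the exponents e1 and e2 play the role of the block floating information that accompanies the quantized spectral coefficients, and the restored samples differ from the original samples only by the quantizing error.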
In the technique just described, since the magnitude of the first block floating is determined by taking the maximum value of the samples or the absolute values of the samples in each block subject to block floating, in a frequency range in which the frequency range signal has a relatively large value, applying block floating results in relatively large processing errors, which manifest themselves as audible noise. The noise resulting from the processing errors is audible because audio signals generally have signal components towards lower frequencies with amplitudes larger than signal components towards higher frequencies. Consequently, there are many instances where the noise level is greater towards lower frequencies than towards higher frequencies.
However, the human sense of hearing has a greater sensitivity to noise towards lower frequencies. Therefore, there are instances where a greater noise level towards lower frequencies may be problematical because it is subjectively more noticeable. While noise towards lower frequencies may not be noticed because of the masking effect of the human sense of hearing, masking cannot be relied on to render noise towards lower frequencies inaudible in the presence of all audio signals.
For example, the two audio signals shown in FIGS. 12(a) and 12(b) have the same maximum signal level, which gives rise to the same noise level due to operational errors. This noise will be heard in the presence of the audio signal shown in FIG. 12(a) because this signal is relatively tonal, and, consequently, has a relatively narrow masking range. The audio signal shown in FIG. 12(b) has a relatively broad spectrum, which effectively masks the noise. Attempts to reduce quantizing noise by increasing the operational word length, or by carrying out the processing with double precision, etc., result in an increase in the scale, and hence the cost, of the hardware required.