1. Field of The Invention
This invention relates to the efficient information coding of digital audio signals for transmission or digital storage media, and especially to a quantization method that uses an efficient bit allocation method for information coding.
2. Related art of The Invention
Compression algorithms for wide band audio signals have found widespread application in recent years. In the area of consumer electronics, the digital compact cassette (DCC) and the mini-disc system (MD) are two applications of these compression algorithms.
While the DCC system uses a compression algorithm that is based on sub-band coding, the MD system uses an algorithm that is based on a hybrid of sub-band and transform coding with the transform coding portion forming the backbone of the algorithm. This invention is related to the dynamic bit allocation of the MD coder.
The MD system uses the ATRAC compression algorithm that is documented in chapter 10 of the MD system description published by Sony in December 1991. The ATRAC algorithm compresses input audio signals from a bit rate of 705.6 kbit/s/channel to a bit rate of 146.08 kbit/s/channel.
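The two bit rates quoted above imply a compression ratio of roughly 4.8 to 1. The following minimal sketch reproduces that arithmetic. The sampling rate of 44.1 kHz and the sample width of 16 bits are assumptions (they are not stated in this description) chosen because they reproduce the quoted input rate; the function name is illustrative only.

```python
# Bit rates quoted in the text, in kbit/s per channel.
INPUT_KBPS = 705.6
OUTPUT_KBPS = 146.08

# Assumption (not stated in the text): 44.1 kHz sampling, 16 bits per sample.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16


def compression_ratio(input_kbps, output_kbps):
    """Ratio of the uncompressed bit rate to the compressed bit rate."""
    return input_kbps / output_kbps


# PCM bit rate implied by the assumed sampling parameters.
pcm_kbps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE / 1000.0
ratio = compression_ratio(INPUT_KBPS, OUTPUT_KBPS)

print(f"PCM rate: {pcm_kbps:.1f} kbit/s/channel")
print(f"Compression ratio: {ratio:.2f} to 1")
```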
FIG. 8 shows the block diagram of the encoding process. The input time signals are first passed through a splitting filter bank, 1, 2, 3, to obtain the signals in three frequency bands. The lower two bands are each at half the bandwidth of the uppermost band. A block size decision, 4, is made for each band to determine the sample size or block mode for the windowing and transform process, 5, 6, 7. One of the two available block modes, short block mode or long block mode, is selected for each of the bands. The transformed spectral samples are grouped into units, and in each unit a scale factor is derived from the peak values of the samples in the unit, 8. These units are non-uniform frequency intervals with a finer resolution in the low frequencies and a coarser resolution in the higher frequencies. Quantization, 10, is carried out on the samples using the scale factor and the bit allocation information from the dynamic bit allocation module, 9.
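The scale factor and quantization steps of the encoding process may be illustrated by the following minimal sketch. It is not the ATRAC procedure itself: the scale factor is taken directly as the peak magnitude of the unit (no scale factor table is assumed), the quantizer is a plain uniform one, and all function names and sample values are hypothetical.

```python
def scale_factor(unit):
    """Peak magnitude of the spectral samples in one unit (assumed untabulated)."""
    return max(abs(s) for s in unit)


def max_code(bits):
    """Largest signed integer code available with the given word length."""
    return (1 << (bits - 1)) - 1


def quantize_unit(unit, bits):
    """Normalize a unit by its scale factor and quantize it uniformly."""
    sf = scale_factor(unit)
    if bits < 2 or sf == 0.0:
        # No usable word length, or a silent unit: transmit zeros only.
        return sf, [0] * len(unit)
    top = max_code(bits)
    return sf, [round(s / sf * top) for s in unit]


def dequantize_unit(sf, codes, bits):
    """Rebuild approximate sample values from the codes and the scale factor."""
    if bits < 2:
        return [0.0] * len(codes)
    top = max_code(bits)
    return [c * sf / top for c in codes]


# Hypothetical unit of four spectral samples, quantized with 4 bits per sample.
unit = [0.5, -1.0, 0.25, 0.0]
sf, codes = quantize_unit(unit, bits=4)
restored = dequantize_unit(sf, codes, bits=4)
print(sf, codes, restored)
```

In this sketch the reconstruction error of each sample is bounded by half a quantizer step, so each additional bit roughly halves the worst-case error. This is the quantity that the dynamic bit allocation module trades against the available bit rate.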
The dynamic bit allocation method forms an integral part of any adaptive compression algorithm. The quality of the reconstructed output and the extent of redundancy and irrelevancy removal are largely determined by the bit allocation method. In addition, the bit allocation procedure also plays a part in determining the degree of hardware complexity. As the bit allocation in the ATRAC algorithm is mainly applied to the transformed spectral samples, numerous examples of prior art exist for this type of transform coder. These dynamic bit allocation techniques can be grouped largely into two categories.
The first category consists of bit allocation methods applying the psychoacoustic phenomena of simultaneous masking and threshold in quiet to derive the masking threshold. Examples in this category include the bit allocation in the MD system description, where the MDCT spectral samples are used to compute the masking threshold. A more complicated technique is described in the paper entitled 'Transform Coding of Audio Signals using perceptual noise criteria' by J. D. Johnston, where a Fast Fourier Transform (FFT) is used to obtain the frequency spectral components for more complex masking calculations.
The second category offers a simpler method of allocating bits according to the signal statistics. An example is the optimum bit allocation procedure described in 'Digital Coding of Waveforms' by N. S. Jayant and P. Noll, which allocates the bits to the different spectral components by minimizing the reconstruction error subject to a constant bit rate.
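The statistics-based allocation of the second category may be illustrated by the following minimal sketch. It uses the closed form in which this kind of rule is commonly stated, namely the average bit count plus half the base-2 logarithm of the ratio of each component variance to the geometric mean of all variances. This form is an assumption here, since the present description does not reproduce the formula, and it treats every component as having the same number of samples. The function name and the variance values are hypothetical.

```python
import math


def optimum_bit_allocation(variances, avg_bits):
    """Bits per component: avg_bits + 0.5 * log2(variance / geometric mean).

    Assumes strictly positive variances and equally sized components.
    """
    log_vars = [math.log2(v) for v in variances]
    # Mean of the logs equals the log of the geometric mean.
    mean_log = sum(log_vars) / len(log_vars)
    return [avg_bits + 0.5 * (lv - mean_log) for lv in log_vars]


# Hypothetical variances of four spectral components, 2 bits on average.
variances = [16.0, 4.0, 1.0, 0.25]
bits = optimum_bit_allocation(variances, avg_bits=2.0)
print(bits)
```

The results of such a rule are in general fractional and may be negative for weak components, so they have to be rounded and clipped before use, after which the total no longer matches the bit budget exactly.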
The object of a dynamic bit allocation procedure is to remove redundancy and irrelevancy while maintaining the original audio quality. To enable the application of the algorithm in consumer products, which in general are low in cost, the algorithm has to have low complexity. The greatest difficulty in the design of a dynamic bit allocation procedure lies in achieving good audio quality while keeping the procedure as simple as possible. The bit allocation procedures described under category one above are able to produce good sound quality. However, a great deal of complexity is incurred. Most of these algorithms require the full power of a DSP (fixed point, or floating point in some cases) in order to perform the bit allocation.
On the other hand, the optimum bit allocation procedure mentioned above, while less complex, is unable to attain the quality of perceptually based bit allocation, as its design concentrates purely on the source, which is the audio signal, and not on the final receiver, the human auditory system.
Further, this allocation procedure requires a large number of iterative loops in order to use up the channel bit rate optimally. This problem was highlighted by the authors of 'Digital Coding of Waveforms'. This bit allocation algorithm is therefore unsuitable where there is a tight constraint on the execution time of the DSP. The problem of intensive iteration is compounded by the fact that, in the MD coding algorithm, a non-uniform number of spectral samples is grouped in each unit, which makes it difficult to compute the number of bits that will be used up without a bit-by-bit allocation.
The object of this invention is to design a dynamic bit allocation procedure that is adapted to the human auditory system and yet with a low level of complexity so that high audio quality can be achieved while at the same time meeting the low cost target.
For the purpose of achieving the above object, the dynamic bit allocation comprises the means of obtaining the variance or a representative within a defined frequency interval as an accurate representation of the signals in the interval; the means of determining the necessary bandwidth of the audio signal using the human hearing threshold so that irrelevancy is removed; the means of determining the initial quantizations using an approximate mathematical model which considers the in-band masking effect of the signal; and the means of increasing the quantizations iteratively so as to meet the final bit rate required while maintaining the effect of the mathematical model.
The bit allocation method plays a very important role in determining the quality of the reconstructed output and the complexity of the hardware. The present invention realizes a dynamic bit allocation method which achieves low hardware complexity and high quality audio by combining psychoacoustic criteria and the human hearing threshold. The low hardware complexity is realized by allocating an initial bit number using the approximations made in the mathematical model before the remaining bit number is allocated by a precise iterative process.
That is, in the present invention, the variance or representative used in the bit allocation procedure considers the dynamic behavior of the signals and adjusts to it, while the bandwidth computation considers the human auditory system and adapts the bit allocation to it. The mathematical model also adapts to the human auditory system by considering masking, while at the same time serving to reduce the number of computational steps or loops required for the bit allocation. The procedure is further simplified by approximating the dynamic elements within the relation with statistically derived constants. The iterative means ensures that the desired bit rate is met while adjusting for the approximations made in the mathematical model.
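The two-stage structure described above, an initial allocation followed by an iterative increase until the bit budget is met, may be illustrated by the following minimal sketch. It is not the claimed procedure: the initial word lengths are simply given as input, because the mathematical model is not specified in this passage, and the priority rule (variance divided by 4 raised to the current word length, a rough estimate of the remaining noise power) is a hypothetical stand-in. All names and numbers are illustrative only.

```python
def top_up_allocation(initial_bits, variances, unit_sizes, budget_bits, max_bits=16):
    """Raise per-unit word lengths one bit at a time until the budget is used.

    Adding one bit per sample to unit k costs unit_sizes[k] bits, which is
    why units with different numbers of samples must be tracked separately.
    """
    bits = list(initial_bits)
    used = sum(b * n for b, n in zip(bits, unit_sizes))
    if used > budget_bits:
        raise ValueError("initial allocation already exceeds the bit budget")
    while True:
        # Units that can still take one more bit per sample within the budget.
        candidates = [
            k for k in range(len(bits))
            if bits[k] < max_bits and used + unit_sizes[k] <= budget_bits
        ]
        if not candidates:
            break
        # Hypothetical priority: estimated remaining noise power of the unit.
        k = max(candidates, key=lambda i: variances[i] / 4.0 ** bits[i])
        bits[k] += 1
        used += unit_sizes[k]
    return bits, used


# Hypothetical example: three units of 4, 8 and 16 samples, 60 bits in total.
final_bits, used_bits = top_up_allocation(
    initial_bits=[2, 1, 0],
    variances=[100.0, 10.0, 0.0],
    unit_sizes=[4, 8, 16],
    budget_bits=60,
)
print(final_bits, used_bits)
```

The closer the initial allocation is to the final one, the fewer passes of the loop are needed. This is the motivation for obtaining the initial allocation from an approximate model rather than starting from zero.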
The meaning of the masking threshold is as follows. FIG. 5 shows an example of the masking threshold. In the figure, 'a' indicates the least audible value, and the area above this line is the human audible area. 'b' and 'c' show the spectra of strong signal components, and the hatched parts are the portions masked by signals 'b' and 'c'. The curve 'd' is the masking threshold obtained from the audible area and the hatched parts masked by signals 'b' and 'c'. Since a signal below the threshold 'd' cannot be sensed, it need not be quantized. Since the masking threshold is high at signal 'b', a higher level of quantization noise near this frequency will still remain inaudible.
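The relation between the curves of FIG. 5 may be illustrated by the following minimal sketch. The combined threshold is taken here as the larger, at each frequency, of the least audible value and the masking curves of the strong components. This combination rule is an assumption made for illustration, as practical psychoacoustic models combine masking contributions in various ways. All level values, in dB, and all names are hypothetical and are not read from the figure.

```python
def combined_threshold_db(quiet_db, masker_curves_db):
    """Per-frequency maximum of the threshold in quiet and all masking curves."""
    combined = []
    for i, quiet in enumerate(quiet_db):
        levels = [quiet] + [curve[i] for curve in masker_curves_db]
        combined.append(max(levels))
    return combined


def audible(signal_db, threshold_db):
    """True where a component lies above the threshold and must be quantized."""
    return [s > t for s, t in zip(signal_db, threshold_db)]


# Hypothetical levels at five frequency points.
quiet = [20.0, 10.0, 5.0, 10.0, 25.0]    # curve 'a'
mask_b = [15.0, 40.0, 30.0, 12.0, 0.0]   # masking caused by component 'b'
mask_c = [0.0, 5.0, 18.0, 35.0, 20.0]    # masking caused by component 'c'

d = combined_threshold_db(quiet, [mask_b, mask_c])  # curve 'd'
signal = [10.0, 60.0, 25.0, 50.0, 30.0]
flags = audible(signal, d)
print(d, flags)
```

Components flagged as inaudible need no bits, and quantization noise may be allowed to rise to the level of 'd' elsewhere without being heard.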