1. Field of The Invention
This invention relates to the efficient information coding of digital audio signals for transmission or digital storage media.
2. Related Art of The Invention
Compression algorithms for wide-band audio signals have found widespread application in recent years. In the area of consumer electronics, the digital compact cassette (DCC) and the mini-disc system (MD) are two applications of these compression algorithms. While the DCC system uses a compression algorithm based on sub-band coding, the MD system uses an algorithm based on a hybrid of sub-band and transform coding, with the transform coding portion forming the backbone of the algorithm. This invention relates to the determination of the block size of the transform coder used in the MD system.
The MD system uses the ATRAC compression algorithm, which is documented in chapter 10 of the MD system description by Sony in December 1991. The ATRAC algorithm compresses the input audio signals from a bit rate of 705.6 kbit/s/channel to a bit rate of 146.08 kbit/s/channel.
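As a rough check on the two bit rates quoted above, the compression ratio can be worked out in a few lines. The sampling rate of 44.1 kHz and the 16-bit sample width are assumptions not stated in the text; they are chosen because together they reproduce the quoted input rate of 705.6 kbit/s/channel.

```python
# Assumed input format (not stated in the text): 44.1 kHz sampling, 16 bits per sample.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16

# Input bit rate per channel in kbit/s; matches the 705.6 figure quoted in the text.
input_kbps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE / 1000

# Output bit rate per channel in kbit/s, taken directly from the text.
output_kbps = 146.08

# Compression ratio, a little under 5:1.
compression_ratio = input_kbps / output_kbps
print(f"input {input_kbps} kbit/s, ratio {compression_ratio:.2f}:1")
```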
FIG. 3 shows the block diagram of the encoding process. The input time signals are first passed through a splitting filter bank, 1, 2, 3, to obtain the signals in three frequency bands. The lower two bands each have half the bandwidth of the uppermost band. A block size decision, 4, is made for each band to determine the sample size or block mode for the windowing and transform process, 5, 6, 7. One of the two available block modes, short block mode or long block mode, is selected for each of the bands. The transformed spectral samples are grouped into units, and in each unit a scale factor is derived from the peak values of the samples in the unit, 8. Quantization, 10, is carried out on the samples using the scale factor and the bit allocation information from the dynamic bit allocation module, 9. The unit is described as follows.
The audio signal is transformed into, for example, 256 spectral lines by the MDCT. The spectral lines are grouped into a plurality of units. In FIG. 4 the 256 spectral lines are grouped into 26 units numbered 0-25. The bit allocation data WL(u) is given per unit, as shown in FIG. 5. In the example of FIGS. 4 and 5, spectral lines 1-8 share the same bit allocation data WL(0), and spectral lines 9-12 share the same bit allocation data WL(1).
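The grouping of spectral lines into units can be illustrated with a minimal sketch. Only the first two unit boundaries (lines 1-8 and 9-12) come from the text; the full table of 26 units would have to be taken from FIG. 4 and is not reproduced here. The function names, the sample values, and the use of the raw peak magnitude as a stand-in for the scale factor are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Unit boundaries as 1-based, inclusive spectral line numbers.
# Only these two entries are given in the text (units 0 and 1).
UNIT_BOUNDS: List[Tuple[int, int]] = [(1, 8), (9, 12)]


def unit_peaks(spectrum: Sequence[float],
               bounds: Sequence[Tuple[int, int]]) -> List[float]:
    """Return the peak magnitude of the spectral lines in each unit.

    Assumption: the peak magnitude stands in for the scale factor, which
    the text only says is derived from the peak values.
    """
    peaks = []
    for first, last in bounds:
        lines = spectrum[first - 1:last]  # convert to 0-based slice
        peaks.append(max(abs(x) for x in lines))
    return peaks


def expand_word_lengths(wl: Sequence[int],
                        bounds: Sequence[Tuple[int, int]]) -> List[int]:
    """Give every spectral line in unit u the bit allocation WL(u)."""
    per_line: List[int] = []
    for u, (first, last) in enumerate(bounds):
        per_line.extend([wl[u]] * (last - first + 1))
    return per_line


# Hypothetical spectral values for the first 12 lines.
spectrum = [0.1, -0.5, 0.2, 0.0, 0.3, -0.1, 0.05, 0.4, 1.5, -2.0, 0.7, 0.2]
peaks = unit_peaks(spectrum, UNIT_BOUNDS)
per_line_wl = expand_word_lengths([4, 6], UNIT_BOUNDS)  # hypothetical WL(0), WL(1)
```

Because the bit allocation is stored once per unit rather than once per spectral line, the amount of side information stays small.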
The block size decision plays an important role in transform coders in improving their time versus frequency performance. In the event of an attack (a sharp rise of signal energy) in the signal, the singularities in the time signal need to be reflected in the coding; otherwise a pre-echo condition occurs, resulting in noise preceding the audio signal.
The block size decision method proposed in the MD system description is shown in FIG. 6. The step of numeral 1 is the peak value detection block. The adjacent peak values are compared in the step of numeral 2, and the step of numeral 3 contains the decision switch (threshold) against which the comparison result is tested. In the step of numeral 4, mode 1, the short block mode is selected, while in the step of numeral 5, mode 4 or mode 3 is selected for the high or middle/low band respectively.
A good block size decision is needed to reduce the pre-echo condition described earlier. The design of the block size decision hinges upon two important considerations.
Firstly, it is desirable to make the block length as long as possible, because a longer block gives greater frequency resolution. From real signal statistics and the physiology of the human ear in perceiving sounds, greater redundancy and irrelevancy can be removed with greater frequency resolution. Secondly, a condition conflicting with the first exists in that audio signals are time-domain signals, and some important cues need to be faithfully reproduced in time. In a transform coder, the block size selected determines the time versus frequency resolution. It is therefore important to detect correctly the need to change the block size for optimum performance of the transform coder.
The prior art does not accurately reflect the proper design of the block size decision block. The block showing the process of 'compare adjacent peak values' in FIG. 2 admits two interpretations. Firstly, the procedure could be interpreted to mean comparing the two adjacent peak values, one from the previous time block and one from the next time block. Secondly, it could mean comparing only one adjacent peak value, namely the one from the previous time block.
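The difference between the two interpretations can be made concrete with a small sketch. The text does not say how peak values are compared, so the ratio test, the threshold value, the function names, and the way the two neighbours are combined in the first interpretation are all assumptions chosen for illustration; this is not the procedure of the MD system description.

```python
from typing import Sequence


def rises_sharply(earlier: float, later: float, threshold: float) -> bool:
    """Assumed test: the later peak exceeds the earlier one by the threshold ratio."""
    return later > threshold * earlier


def decide_previous_only(peaks: Sequence[float], i: int, threshold: float) -> str:
    """Second interpretation: compare only with the previous block's peak."""
    attack = i > 0 and rises_sharply(peaks[i - 1], peaks[i], threshold)
    return "short" if attack else "long"


def decide_both_neighbours(peaks: Sequence[float], i: int, threshold: float) -> str:
    """First interpretation: compare with both the previous and the next block's peak."""
    from_prev = i > 0 and rises_sharply(peaks[i - 1], peaks[i], threshold)
    to_next = i + 1 < len(peaks) and rises_sharply(peaks[i], peaks[i + 1], threshold)
    return "short" if (from_prev or to_next) else "long"


# Hypothetical peak values per time block: an attack begins at block 3.
peaks = [0.1, 0.1, 0.1, 2.0, 2.0]
THRESHOLD = 4.0  # hypothetical ratio

prev_only = [decide_previous_only(peaks, i, THRESHOLD) for i in range(len(peaks))]
both = [decide_both_neighbours(peaks, i, THRESHOLD) for i in range(len(peaks))]
```

Under these assumptions the two readings disagree for block 2, the block just before the attack: looking only at the previous block selects the long mode, while looking at the next block as well selects the short mode.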