The standards body known as the Moving Picture Experts Group (MPEG) discloses conventional data compression methods in its standards, for example, the MPEG-2 Advanced Audio Coding (AAC) standard (see ISO/IEC 13818-7) and the MPEG-4 AAC standard (see ISO/IEC 14496-3). These standards are collectively referred to herein as the MPEG standard.
An audio encoder defined by the MPEG standard receives an audio signal, converts it through a modified discrete cosine transform (MDCT) operation into frequency spectral data, and determines optimal scale factors for quantizing the frequency spectral data using a rate-distortion control mechanism. The audio encoder further quantizes the frequency spectral data using the optimal scale factors, groups the resulting quantized spectral coefficients into scalefactor bands, and then subjects the grouped quantized coefficients to Huffman encoding.
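The role of the scale factor in the quantization step can be illustrated with a minimal sketch. It assumes a power-law quantizer of the kind commonly associated with AAC, in which the step size doubles for every increase of 4 in the scale factor. The function names, the rounding constant, and the omission of any scale factor offset are assumptions made for illustration and are not taken from the text above.

```python
ROUNDING_OFFSET = 0.4054  # assumed rounding constant, for illustration only


def quantize(coef, scalefactor):
    """Quantize one spectral coefficient with a power-law quantizer (sketch)."""
    step = 2.0 ** (scalefactor / 4.0)  # a larger scale factor gives a coarser step
    magnitude = int((abs(coef) / step) ** 0.75 + ROUNDING_OFFSET)
    return magnitude if coef >= 0 else -magnitude


def dequantize(q, scalefactor):
    """Approximate inverse of quantize(); the rounding error is not recovered."""
    step = 2.0 ** (scalefactor / 4.0)
    value = abs(q) ** (4.0 / 3.0) * step
    return value if q >= 0 else -value
```

In this sketch, raising the scale factor lowers the quantized magnitude, which reduces the number of bits needed at the cost of larger quantization noise. This is the trade-off that a rate-distortion control mechanism has to balance.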
According to the MPEG standard, MDCT is performed on the audio signal in such a way that adjacent transformation ranges overlap by 50% along the time axis to suppress distortion developing at the boundary between adjacent transformation ranges. In addition, the audio signal is mapped into the frequency domain using either a long transformation range (defined by a long window) or short transformation ranges (each defined by a short window). The long window includes 2048 samples and the short window includes 256 samples. The number of MDCT coefficients generated from the long window is 1024, and the number of MDCT coefficients generated from each short window is 128. Generally, the long window type should be used for a steady portion, in which variation in the signal waveform is insignificant, and the short window type should be used for an attack portion, in which the signal waveform changes abruptly. The choice between the two window types is important. If the long window type is used for a transient signal, noise called pre-echo develops preceding the attack portion. If the short window type is used for a steady signal, suitable bit allocation is not performed due to the lack of resolution in the frequency domain, the coding efficiency decreases, and noise also develops. These drawbacks are especially noticeable for low-frequency sound.
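The relationship between window length, coefficient count, and 50% overlap described above can be sketched as follows. This is a naive, direct-form MDCT intended only to show that a block of N samples yields N/2 coefficients and that successive blocks advance by N/2 samples. The function names are hypothetical, and the window function that a real encoder applies before the transform is omitted for brevity.

```python
import math

LONG_WINDOW = 2048   # samples per long window (from the text)
SHORT_WINDOW = 256   # samples per short window (from the text)


def mdct(block):
    """Naive O(N^2) MDCT of one block of N samples; returns N/2 coefficients.

    No window function is applied here (a simplification for illustration).
    """
    n_len = len(block)
    half = n_len // 2
    n0 = (half + 1) / 2.0  # phase offset N/4 + 1/2 of the standard MDCT kernel
    return [
        sum(block[n] * math.cos(2.0 * math.pi / n_len * (n + n0) * (k + 0.5))
            for n in range(n_len))
        for k in range(half)
    ]


def block_starts(num_samples, window_len):
    """Start offsets of blocks that overlap by 50% (hop = window_len / 2)."""
    hop = window_len // 2
    return list(range(0, num_samples - window_len + 1, hop))
```

For example, `block_starts(1024, SHORT_WINDOW)` yields offsets spaced 128 samples apart, so each sample away from the edges is covered by two adjacent short blocks.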
According to the method proposed by the MPEG standard, the determination of the window type for a frame of spectral data begins with performing a Fast Fourier Transform (FFT) on the time-domain audio data and calculating FFT coefficients. The FFT coefficients are then used to calculate the audio signal intensity for each scalefactor band within the frame. In addition, psychoacoustic modeling is used to determine an allowable distortion level for the frame. The allowable distortion level indicates the maximum amount of noise that can be injected into the spectral data without becoming audible. Based on the allowable distortion level and the audio signal intensity of each scalefactor band within the frame, perceptual entropy is computed. If the perceptual entropy is larger than a predetermined constant, the short window type is used for the frame. Otherwise, the long window type is used for the frame.
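The final thresholding step of this decision can be sketched as follows. The perceptual entropy expression used here is a simplified stand-in (half the base-2 logarithm of the signal-to-threshold ratio, weighted by band width) and is not the formula given in the standard. The function names and the `pe_threshold` parameter are likewise assumptions for illustration.

```python
import math


def perceptual_entropy(band_energy, band_threshold, band_width):
    """Simplified perceptual entropy estimate over scalefactor bands (sketch).

    band_energy:    signal intensity per band
    band_threshold: allowable distortion level per band
    band_width:     number of spectral lines per band
    """
    pe = 0.0
    for energy, threshold, width in zip(band_energy, band_threshold, band_width):
        # Bands whose energy is below the allowable distortion need no bits.
        if energy > threshold > 0.0:
            pe += width * 0.5 * math.log2(energy / threshold)
    return pe


def choose_window(pe, pe_threshold):
    """Return 'short' if the perceptual entropy exceeds the threshold, else 'long'."""
    return "short" if pe > pe_threshold else "long"
```

A frame with `perceptual_entropy([4.0], [1.0], [2])` evaluates to 2.0 in this sketch, so `choose_window` selects the short window type only when `pe_threshold` is below that value.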
The above method of making a window type decision requires a large amount of computation. In addition, the resultant value of the perceptual entropy can be high whenever the signal strength is high, regardless of whether the signal is transient or steady. That is, a frame may be assigned the short window type even if the frame does not contain a transient. As discussed above, this will cause a decrease in the coding efficiency and the development of noise.
Further, if a decision is made to use the short window type, 8 successive blocks (short windows) of MDCT coefficients are generated. To reduce the amount of side information associated with short windows, the short windows may be grouped. Each group includes one or more successive short windows that share the same scalefactors. However, when grouping is not performed appropriately, an increase in the number of codes or degradation of the sound quality occurs. When the number of groups is too large with respect to the number of short windows, scalefactors that could otherwise be coded in common are coded repeatedly, and the coding efficiency thereby decreases. When the number of groups is too small with respect to the number of short windows, common scalefactors are used even when the audio signal varies abruptly. As a result, the sound quality is degraded. The MPEG standard does not provide any specific methods for grouping short windows.
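The side-information trade-off described above can be made concrete with a small sketch. A grouping is represented here as a list of group lengths that sum to 8, and one set of scalefactors is counted per group. This representation and the function names are assumptions chosen for illustration. The sketch does not implement any particular grouping method, since the MPEG standard does not specify one.

```python
NUM_SHORT_WINDOWS = 8  # short windows per frame (from the text)


def scalefactor_sets(group_lengths):
    """Number of scalefactor sets to be coded for a given grouping."""
    if any(g < 1 for g in group_lengths) or sum(group_lengths) != NUM_SHORT_WINDOWS:
        raise ValueError("group lengths must be positive and sum to 8")
    return len(group_lengths)


def window_to_group(group_lengths):
    """Map each of the 8 short windows to the index of its group."""
    mapping = []
    for group_index, length in enumerate(group_lengths):
        mapping.extend([group_index] * length)
    return mapping
```

With `[1, 1, 1, 1, 1, 1, 1, 1]`, eight scalefactor sets are coded, which is the costliest case. With `[8]`, a single set serves all windows, which is the cheapest case but the least able to follow a rapidly changing signal. A grouping such as `[3, 1, 4]` lies between the two.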