1. Field of the Invention
The present invention relates to an apparatus, method, and computer program product for encoding an audio signal, and more particularly, to an apparatus, method, and computer program product for encoding an audio signal by means of time-frequency transform in accordance with the Moving Picture Experts Group (MPEG) audio standard.
2. Description of the Related Art
There have so far been proposed a wide variety of audio signal encoding methods, such as an entropy encoding method for encoding an audio signal in accordance with statistics related to the audio signal to be compressed, and a perceptual encoding method for encoding an audio signal in accordance with human perceptual characteristics. The MPEG audio standard makes extensive use of the perceptual encoding method, which, for example, performs compression by removing audio signal components that are not audible to the human ear, either because of the masking effect or because they fall below the minimum audible threshold.
Such an encoding method comprises the steps of (1) inputting an audio signal consisting of a plurality of audio signal components, and (2) assigning a predetermined value to each of the audio signal components in accordance with the sampling frequency or frame length (long-length frame or short-length frame). An audio signal encoding method, for example, conforming to MPEG-2 Advanced Audio Coding (AAC) further comprises the step of assigning a predetermined value to each of the audio signal components in accordance with a scale factor band table shown in FIG. 18. The scale factor band table shown in FIG. 18 includes a plurality of maximum scale factor bands to be allocated to respective frequencies, i.e., audio signal components of the audio signal with respect to a short-length frame and a long-length frame.
One conventional audio signal encoding apparatus is shown in FIG. 19 as comprising inputting means a3, FFT analyzing means 300, psychoacoustic model analyzing means 330, frame length determining means 310, coded mode information inputting means 320, maximum scale factor band calculation means 340, maximum scale factor band table storage means 350, spectral processing means 360, and quantizing and encoding means 370. In the drawings, “maxSfb” is intended to mean “maximum scale factor band”, and “smr” is intended to mean “Signal-to-Mask ratio”.
The inputting means a3 is operative to input the audio signal therein. The FFT analyzing means 300 is operative to perform the fast Fourier transform on the audio signal inputted from the inputting means a3 to generate frequency information about the audio signal. The frame length determining means 310 is operative to judge whether the audio signal inputted from the inputting means a3 is transient or stationary. More specifically, the frame length determining means 310 is operative to determine a short-length frame for the audio signal when it is judged that the audio signal is transient, and a long-length frame for the audio signal when it is judged that the audio signal is stationary.
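The transient/stationary judgment described above can be sketched as follows. This is a minimal illustration only, not the method of the frame length determining means 310, whose criterion is not specified here. The energy-jump test, the function name choose_frame_length, and the parameters num_blocks and ratio_threshold are all assumptions made for the example.

```python
def choose_frame_length(samples, num_blocks=8, ratio_threshold=10.0):
    """Return "short" for a transient frame and "long" for a stationary one.

    Assumed criterion (not from the source): the frame is transient when
    the energy of one sub-block exceeds ratio_threshold times the energy
    of the preceding sub-block.
    """
    block_size = len(samples) // num_blocks
    if block_size == 0:
        return "long"  # too few samples to analyze; default to long

    # Energy of each consecutive sub-block of the frame.
    energies = [
        sum(x * x for x in samples[i * block_size:(i + 1) * block_size])
        for i in range(num_blocks)
    ]

    eps = 1e-12  # keeps the comparison meaningful after silent blocks
    for prev, cur in zip(energies, energies[1:]):
        if cur > ratio_threshold * (prev + eps):
            return "short"
    return "long"
```

With this sketch, a frame of constant amplitude yields "long", whereas a frame that is silent in its first half and loud in its second half yields "short".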
The coded mode information inputting means 320 is operative to input coded mode information. The psychoacoustic model analyzing means 330 is operative to calculate Signal-to-Mask ratio information for the audio signal on the basis of the frequency information about the audio signal generated by the FFT analyzing means 300, in accordance with a predetermined psychoacoustic model. The maximum scale factor band table storage means 350 is operative to store initial maximum scale factor band information. The initial maximum scale factor band information includes a plurality of predetermined maximum scale factor bands, each fixedly corresponding, in one-to-one relationship, to a combination of the frame length and the coded mode information, such as a bit rate and a sampling frequency.
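As a rough illustration of the Signal-to-Mask ratio information mentioned above, the ratio for each band is commonly expressed in decibels as ten times the base-10 logarithm of the signal energy divided by the masking threshold. The sketch below assumes that the per-band signal energies and masking thresholds have already been produced by some psychoacoustic model; the function name smr_per_band and its parameters are hypothetical and do not appear in the apparatus of FIG. 19.

```python
import math


def smr_per_band(band_energies, mask_thresholds, eps=1e-12):
    """Return the Signal-to-Mask ratio in dB for each band.

    band_energies and mask_thresholds are assumed inputs of equal length;
    eps guards against division by zero and the logarithm of zero.
    """
    if len(band_energies) != len(mask_thresholds):
        raise ValueError("one masking threshold is required per band")
    return [
        10.0 * math.log10((energy + eps) / (threshold + eps))
        for energy, threshold in zip(band_energies, mask_thresholds)
    ]
```

A band whose ratio is positive has signal energy above its masking threshold; a band whose ratio is negative is masked under the assumed model.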
The maximum scale factor band calculation means 340 is operative to calculate a maximum scale factor band for the audio signal on the basis of the determination made by the frame length determining means 310 and the coded mode information inputted from the coded mode information inputting means 320, with reference to the initial maximum scale factor band information stored in the maximum scale factor band table storage means 350.
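The table-driven calculation described above amounts to a fixed lookup keyed by the coded mode information and the frame length. The following sketch shows the idea only. The table entries are placeholder values chosen for the example and are not taken from FIG. 18, and the names MAX_SFB_TABLE and lookup_max_sfb are hypothetical.

```python
# Placeholder table: (bit rate in bit/s, sampling frequency in Hz,
# frame length) -> maximum scale factor band. The values are
# illustrative assumptions, not the contents of FIG. 18.
MAX_SFB_TABLE = {
    (64000, 44100, "long"): 40,
    (64000, 44100, "short"): 12,
    (128000, 44100, "long"): 46,
    (128000, 44100, "short"): 14,
}


def lookup_max_sfb(bit_rate, sampling_frequency, frame_length):
    """Select the predetermined maximum scale factor band.

    The result depends only on the coded mode information and the frame
    length; the content of the audio signal is never consulted.
    """
    key = (bit_rate, sampling_frequency, frame_length)
    if key not in MAX_SFB_TABLE:
        raise KeyError(f"no table entry for {key}")
    return MAX_SFB_TABLE[key]
```

Because the key contains no information derived from the audio signal itself, two very different signals encoded under the same mode and frame length receive the same maximum scale factor band.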
The spectral processing means 360 is operative to divide the audio signal inputted from the inputting means a3 into a plurality of audio signal components each corresponding to a scale factor band, and to perform spectral processing on the audio signal components up to an audio signal component corresponding to the maximum scale factor band calculated by the maximum scale factor band calculation means 340, on the basis of the Signal-to-Mask ratio information calculated by the psychoacoustic model analyzing means 330, to generate audio signal data. The spectral processing performed by the spectral processing means 360 includes Modified Discrete Cosine Transform (hereinafter referred to as “MDCT”) processing and Temporal Noise Shaping (hereinafter referred to as “TNS”) processing. The quantizing and encoding means 370 is operative to quantize and encode the audio signal data generated by the spectral processing means 360 to generate a coded audio signal to be outputted therethrough.
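The restriction of processing to the bands up to the maximum scale factor band can be sketched as below. The example assumes that the spectral coefficients are already available as a list and that the scale factor band boundaries are given as a list of start offsets with one trailing end offset. The function name limit_to_max_sfb and both parameter layouts are assumptions for illustration, and the MDCT, TNS, and quantization steps themselves are not modeled.

```python
def limit_to_max_sfb(coeffs, sfb_offsets, max_sfb):
    """Keep coefficients in bands 0 .. max_sfb - 1 and zero the rest.

    sfb_offsets[b] is the first coefficient index of band b, and
    sfb_offsets[-1] is one past the last coefficient of the last band
    (an assumed layout).
    """
    num_bands = len(sfb_offsets) - 1
    if not 0 <= max_sfb <= num_bands:
        raise ValueError("max_sfb is outside the band table")

    cutoff = sfb_offsets[max_sfb]  # first coefficient index that is dropped
    return list(coeffs[:cutoff]) + [0.0] * (len(coeffs) - cutoff)
```

For example, with eight coefficients, band offsets [0, 2, 4, 8], and a maximum scale factor band of 2, the first four coefficients are kept and the last four are set to zero.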
In the above conventional audio signal encoding apparatus, the maximum scale factor band calculation means 340 calculates a maximum scale factor band by selecting a maximum scale factor band for the audio signal from among the fixedly predetermined maximum scale factor bands stored in the maximum scale factor band table storage means 350, on the basis of the frame length and the coded mode information about the audio signal. The initial maximum scale factor band information includes a plurality of predetermined maximum scale factor bands, each fixedly corresponding, in one-to-one relationship, to a combination of the frame length and the coded mode information, such as a bit rate and a sampling frequency, while, on the other hand, the audio signals inputted therein differ from one another. This means that the maximum scale factor band calculation means 340 calculates a maximum scale factor band on the basis of the frame length and the coded mode information, regardless of the characteristics of the audio signal, for example, whether the audio signal is biased to any frequency range or not. The spectral processing means 360 and the quantizing and encoding means 370 then perform the spectral processing on, and quantize and encode, the audio signal up to an audio signal component corresponding to the maximum scale factor band thus calculated, regardless of whether the audio signal is biased to any frequency range or not.
As will be understood from the foregoing, the conventional audio signal encoding apparatus of this type has the drawback that it may unnecessarily perform the spectral processing on, and quantize and encode, all the audio signal components of the audio signal, including audio signal components not audible to the human ear, especially when the audio signal is biased to, for example, a low-frequency range. This makes it difficult to efficiently perform the spectral processing on, and quantize and encode, the audio signal and to enhance the quality of the audio signal.
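The drawback can be made concrete with a small sketch that counts how many scale factor bands below a fixed maximum scale factor band carry no significant energy. This is an illustration under stated assumptions only: a simple energy floor stands in for audibility, which a real encoder would judge with a psychoacoustic model, and the function name count_wasted_bands, the parameter energy_floor, and the band offset layout are hypothetical.

```python
def count_wasted_bands(coeffs, sfb_offsets, fixed_max_sfb, energy_floor=1e-9):
    """Count bands below fixed_max_sfb that lie above the highest active band.

    A band is treated as active when its energy exceeds energy_floor
    (an assumed stand-in for audibility). sfb_offsets[b] is the first
    coefficient index of band b, with one trailing end offset.
    """
    highest_active = 0  # number of bands up to and including the last active one
    for band in range(fixed_max_sfb):
        start, end = sfb_offsets[band], sfb_offsets[band + 1]
        energy = sum(c * c for c in coeffs[start:end])
        if energy > energy_floor:
            highest_active = band + 1
    return fixed_max_sfb - highest_active
```

For a signal whose energy lies only in the lowest band, with band offsets [0, 2, 4, 8] and a fixed maximum scale factor band of 3, the sketch reports two bands that would still be processed and encoded under the fixed table although they contain nothing significant.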
The present invention is made with a view to overcoming the previously mentioned drawback inherent to the conventional audio signal encoding apparatus.