The present invention generally relates to an audio coding scheme and, more specifically, to an improved audio coding scheme that is based on an excitation pattern.
Transmitting audio signals emanating from an audio source in their original form requires a significant amount of computing resources. Furthermore, portions of audio signals are beyond human detection, and thus their transmission is wasteful. Consequently, audio signals are typically compressed before they are transmitted. There are generally two approaches to compressing audio signals for use in applications such as communications, audio broadcasting and storage systems.
One approach exploits the redundant nature of audio signals in the time domain and the frequency domain. This approach is used in a number of schemes including, for example, linear prediction schemes and discrete Fourier transform based schemes.
Another approach uses perceptual coding, where signal processing characteristics of the auditory system are used to remove data that are irrelevant or inaudible to the auditory system. One common auditory phenomenon that is exploited in current perceptual audio technologies, such as the standard audio codecs AAC or AC3 used in DVD, HDTV and digital audio broadcasting, is the masking effect. The masking effect occurs when a fainter but otherwise distinctly audible signal becomes inaudible because a louder signal appears simultaneously. In other words, the fainter signal is masked by the louder signal. The fainter signal is called the maskee and the louder signal is called the masker. The masking effect depends on the spectral composition of both the masker and the maskee. One characteristic associated with the masking effect is the masked threshold. All signals under the masked threshold are in effect inaudible and hence can be neglected (or effectively considered to be zero) in audio codecs.
FIG. 1 illustrates a typical masking-effect-based audio encoder. This audio encoder includes a number of components which respectively perform the following functions: (1) window-processing; (2) transforming the signal to the frequency domain by performing a fast Fourier transform or some other orthogonal transform such as the discrete cosine transform or a wavelet transform; (3) calculating the masked threshold according to rules known from psychoacoustics and the spectrum obtained in (2); (4) performing bit-allocation processing to allocate different numbers of bits to different frequency bins according to their magnitudes and the masked threshold (for example, every frequency bin whose magnitude is less than the masked threshold is allocated zero bits); (5) coding all frequency bins with their respective numbers of bits based on the bit allocation calculation; and (6) performing bitstream packing to assemble the coded data together with some additional information, such as bit allocation information.
The foregoing functions of these various components in the masking-effect-based audio encoder are well understood by a person of ordinary skill in the art.
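By way of illustration only, functions (1) through (6) described above may be sketched in Python as follows. This is a minimal sketch under stated assumptions and is not the encoder of FIG. 1: the function names, the 4-bit allocation field, and in particular the `toy_masked_threshold` rule (each bin masks its neighbours 20 dB below its own level, falling by 6 dB per bin of distance) are hypothetical stand-ins for a real psychoacoustic model. For brevity only spectral magnitudes are coded; phase and the peak scale factor are not.

```python
import cmath
import math


def hann_window(frame):
    """Step (1): window one frame of samples (periodic Hann window)."""
    n = len(frame)
    return [x * 0.5 * (1.0 - math.cos(2.0 * math.pi * i / n))
            for i, x in enumerate(frame)]


def dft_magnitudes(frame):
    """Step (2): naive DFT, returning the magnitudes of bins 0..N/2."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def toy_masked_threshold(mags, drop_db=20.0, slope_db_per_bin=6.0):
    """Step (3): HYPOTHETICAL threshold, not a real psychoacoustic model.

    Every other bin j acts as a masker for bin k at its own level minus
    drop_db, minus slope_db_per_bin for each bin of distance |k - j|.
    Needs at least two bins.
    """
    levels = [20.0 * math.log10(m + 1e-12) for m in mags]
    thresholds = []
    for k in range(len(mags)):
        best_db = max(level - drop_db - slope_db_per_bin * abs(k - j)
                      for j, level in enumerate(levels) if j != k)
        thresholds.append(10.0 ** (best_db / 20.0))
    return thresholds


def allocate_bits(mags, thresholds, max_bits=8):
    """Step (4): zero bits at or below the threshold, else about one bit
    per 6.02 dB of signal-to-mask ratio, capped at max_bits."""
    bits = []
    for mag, thr in zip(mags, thresholds):
        if mag <= thr:
            bits.append(0)
        else:
            smr_db = 20.0 * math.log10(mag / thr)
            bits.append(min(max_bits, math.ceil(smr_db / 6.02)))
    return bits


def quantize(mags, bits):
    """Step (5): uniform quantization of each magnitude, relative to the
    frame peak, using that bin's own bit count."""
    peak = max(mags) or 1.0
    return [round(m / peak * (2 ** b - 1)) for m, b in zip(mags, bits)]


def pack_bitstream(bits, codes):
    """Step (6): per bin, a 4-bit allocation field (so max_bits <= 15)
    followed by the code itself when any bits were allocated."""
    return "".join(format(b, "04b") + (format(c, "0{}b".format(b)) if b else "")
                   for b, c in zip(bits, codes))


def encode_frame(samples, max_bits=8):
    """Run steps (1) to (6) on one frame; returns (bit allocation, bitstream)."""
    mags = dft_magnitudes(hann_window(samples))
    thresholds = toy_masked_threshold(mags)
    bits = allocate_bits(mags, thresholds, max_bits)
    return bits, pack_bitstream(bits, quantize(mags, bits))
```

For example, calling `encode_frame` on a 32-sample frame containing a single sinusoid centred on bin 4 allocates bits to that bin and its immediate neighbours, while bins far from the tone fall under the toy threshold and receive zero bits.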
In addition, the audio encoder shown in FIG. 1 can be simplified to create a transform-based encoder. FIG. 2 illustrates a typical transform-based encoder. The transform-based encoder uses a source coding scheme (specifically, a frequency-domain transform source coding scheme). The transform-based encoder is similar to the audio encoder shown in FIG. 1 except that it omits all components related to the masking effect.
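As a rough illustration of such a simplification, the following Python sketch codes a frame with an orthogonal transform and no masked-threshold step. It is a sketch under stated assumptions rather than the encoder of FIG. 2: the discrete cosine transform is chosen as the orthogonal transform, windowing is left out for brevity, every coefficient receives the same number of bits, and the function names `encode` and `decode` are hypothetical.

```python
import math


def dct2(frame):
    """Orthonormal DCT-II computed directly from its definition."""
    n = len(frame)
    out = []
    for k in range(n):
        s = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                for i, x in enumerate(frame))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct2(coeffs):
    """Inverse of dct2 (orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        s = coeffs[0] * math.sqrt(1.0 / n)
        s += sum(c * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
                 for k, c in enumerate(coeffs) if k > 0)
        out.append(s)
    return out


def encode(frame, bits=8):
    """Transform, then quantize every coefficient with the same bit count
    (ASSUMPTION: fixed allocation, no masked threshold)."""
    coeffs = dct2(frame)
    peak = max(abs(c) for c in coeffs) or 1.0
    levels = 2 ** (bits - 1) - 1  # signed codes in [-levels, levels]
    return peak, [round(c / peak * levels) for c in coeffs]


def decode(peak, codes, bits=8):
    """Rescale the integer codes and invert the transform."""
    levels = 2 ** (bits - 1) - 1
    return idct2([q * peak / levels for q in codes])
```

Because no coefficient is discarded as inaudible, every bin consumes bits regardless of whether a listener could perceive it, which is the cost that the masking-effect-based encoder avoids.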
Although these available coding techniques can satisfy the bit rate requirements in many applications, further audio compression is still highly desirable in very low bit rate applications. In fact, in addition to the masking effect, other characteristics of the human auditory system could be employed to further reduce the bit rate.
Hence, it would be desirable to have a method and system that are capable of providing audio compression in a more efficient manner.