Digital audio transmission generally requires a considerable amount of memory and bandwidth. To transmit efficiently, the signal must be compressed. An efficient coding system is one that optimally eliminates the irrelevant and redundant parts of an audio stream. Irrelevancy is reduced through psychoacoustic analysis. Redundancy is reduced by modeling the signal with a set of functions or with a prediction tool.
There are basically two coding approaches to compression: transform coding and parametric coding. Transform coders generally use a frequency domain representation of the signal and perform psychoacoustic analysis to keep the quantization noise below the level noticeable to the human auditory system. Parametric coders, on the other hand, decompose the signal into parameterized components, and only these parameters are subsequently coded. Transform coders generally operate at much higher bit rates and deliver higher quality than parametric coders. Examples of conventional transform coders include Moving Picture Experts Group (MPEG) Layer 1 to Layer 3 and MPEG Advanced Audio Coding (AAC), all of which require an operating rate of around 128 kbps for good stereo quality. Parametric coders typically operate at bit rates below 32 kbps. An example of a parametric coder is an MPEG-HILN coder.
Conventional high quality encoding efforts typically combine the two approaches above, which results in a hybrid coder. One example is enhanced AAC plus (eAAC+), which combines a transform coder (AAC) with parameterized high frequency components (known as Spectral Band Replication (SBR)) and a parametric stereo (PS) coder. A set of spatial parameters is first extracted from the stereo stream. A stereo to mono down-mix is then performed, and the mono stream is passed to the core transform coder. In the case of enhanced AAC plus, further parameterization is done to represent the high frequency component of this mono stream, and only the lower half of the bandwidth of the mono stream is processed by the core transform coder. Without the parametric stereo portion, the scheme is called AAC plus. MPEG Audio Layer III (MP3) pro uses a similar scheme with MP3 as the core transform coder.
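The parameter extraction and down-mix steps described above can be illustrated with a minimal sketch. It assumes a single full-band, time-domain frame and two illustrative spatial parameters (a level difference and a normalized correlation); an actual parametric stereo coder would typically compute such parameters per frequency band. The function name `downmix_with_parameters` and the parameter keys are hypothetical and are not taken from any standard.

```python
import math

def downmix_with_parameters(left, right, eps=1e-12):
    """Return (mono, params) for one frame of stereo samples.

    Hypothetical full-band sketch: the spatial parameters are extracted
    first, then the two channels are averaged into a mono stream.
    """
    if len(left) != len(right):
        raise ValueError("channels must have equal length")

    # Channel energies and cross term for the frame
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    cross = sum(a * b for a, b in zip(left, right))

    # Inter-channel level difference in dB (positive when left is louder)
    level_diff_db = 10.0 * math.log10((e_left + eps) / (e_right + eps))
    # Normalized inter-channel correlation, in the range [-1, 1]
    correlation = cross / math.sqrt((e_left + eps) * (e_right + eps))

    # Simple stereo to mono down-mix passed on to the core coder
    mono = [(a + b) / 2.0 for a, b in zip(left, right)]
    return mono, {"level_diff_db": level_diff_db, "correlation": correlation}
```

In this sketch only `mono` would be handed to the core transform coder, while the small parameter dictionary stands in for the side information that lets a decoder approximate the original stereo image.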
Transform coders rely on the fact that audio signals are stationary most of the time. When a transient is present, transform coding generally produces an inherent artifact called pre-echo, which refers to the spreading of quantization noise over the window length. To remedy this, most if not all transform coders include a transient detection mechanism that determines when a shorter window length is needed. Parametric coders need a similar detection mechanism to determine how often the parameters must be updated.
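As an illustration of the window switching decision described above, the following sketch flags a transient when the energy of one sub-block jumps sharply relative to the preceding sub-block, and then selects a short window. The energy-ratio criterion, the threshold values, and the function names are assumptions made for illustration only; practical coders commonly rely on more elaborate detectors. The window lengths of 2048 and 256 samples correspond to the long and short windows of AAC and are used here only as example values.

```python
def detect_transient(frame, num_blocks=8, ratio=8.0, floor=1e-9):
    """Return True if any sub-block energy exceeds `ratio` times the
    energy of the previous sub-block (hypothetical criterion)."""
    block_len = len(frame) // num_blocks
    if block_len == 0:
        raise ValueError("frame is too short for the requested block count")

    # Energy of each consecutive sub-block of the frame
    energies = [
        sum(x * x for x in frame[i * block_len:(i + 1) * block_len])
        for i in range(num_blocks)
    ]

    # A sudden rise in energy between neighbouring sub-blocks marks an attack;
    # `floor` keeps pure silence from triggering the test
    for previous, current in zip(energies, energies[1:]):
        if current > ratio * (previous + floor):
            return True
    return False

def choose_window_length(frame, long_len=2048, short_len=256):
    """Pick a short window for transient frames, a long one otherwise."""
    return short_len if detect_transient(frame) else long_len
```

With a shorter window, the quantization noise of a frame containing an attack is confined to a shorter time span, which limits how far ahead of the attack the noise can spread.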
Transform and parametric coders were developed independently. Even after their union in a hybrid coder, no information is passed between them other than the Pulse Code Modulation (PCM) input data. The preceding explanation suggests that a hybrid coder contains redundant transient detection mechanisms. This redundancy has been systematically exploited in conventional systems in which, inside an eAAC+ hybrid coder, the transient detection results from the parametric stereo portion are forwarded to the SBR and core AAC coders.
FIG. 1 illustrates the general structure of a conventional eAAC+ encoder 100 comprising an enhanced SBR encoder 102, an AAC encoder 104, and a bitstream payload formatter 106. The scheme works well because each of the modules operates on essentially the same signal. The difference is that the PS module works on the original stereo signal, the SBR module works on the down-mixed monaural signal, and the AAC module works on the band-limited monaural signal. The synchronization among the three modules makes it advantageous to place the transient detection inside the PS module, not only because the PS module operates first, but also because the signal analyzed at this module is the most complete version of the input. Furthermore, the detection is performed as part of the parameter extraction and therefore adds very little computational burden.
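The forwarding of a single transient decision to the downstream modules can be sketched as follows. This is a simplified illustration under stated assumptions, not the structure of FIG. 1: the detector, the function names, and the per-module settings (number of parameter updates, number of envelopes, window length) are hypothetical placeholders chosen to show one detection result being reused three times.

```python
def detect_transient_stereo(left, right, num_blocks=8, ratio=8.0, floor=1e-9):
    """Hypothetical detector run once on the original stereo frame."""
    block_len = len(left) // num_blocks
    energies = []
    for i in range(num_blocks):
        lo, hi = i * block_len, (i + 1) * block_len
        # Combined energy of both channels in this sub-block
        energies.append(
            sum(x * x for x in left[lo:hi]) + sum(x * x for x in right[lo:hi])
        )
    return any(
        current > ratio * (previous + floor)
        for previous, current in zip(energies, energies[1:])
    )

def plan_frame(left, right):
    """Run detection in the PS stage and forward the result downstream."""
    transient = detect_transient_stereo(left, right)  # computed only once

    # Illustrative settings; the values 4, 1, 256 and 2048 are assumptions
    ps_settings = {"parameter_updates": 4 if transient else 1}
    sbr_settings = {"envelopes": 4 if transient else 1}
    aac_settings = {"window_length": 256 if transient else 2048}

    return {
        "transient": transient,
        "ps": ps_settings,
        "sbr": sbr_settings,
        "aac": aac_settings,
    }
```

Because the decision is made once on the stereo input, the SBR and AAC stages in this sketch do not need detectors of their own.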
Encoders such as eAAC+ and MP3pro combine parameterization of the stereo component and of the high frequency portion of the signal with an advanced transform coder operating on only one channel at half bandwidth. Despite the good compression ratio achieved, these coders typically have very high complexity, which makes them unsuitable for applications running with limited computational power.