1. Field of the Invention
The system and method described herein relate to enhanced efficiency during audio encoding and transcoding.
2. Discussion of the Related Art
High quality audio compression is normally carried out using perceptual models of the human auditory system (i.e., psycho-acoustic models). The auditory system is often modeled as a filter bank that decomposes an audio signal into frequency bands referred to as critical bands. A critical band consists of one or more audio frequency components that are treated as a single entity. Some audio frequency components can mask other components within the same critical band (i.e., intra-masking) as well as components in other critical bands (i.e., inter-masking). Though the human auditory system is highly complex, models thereof have been successfully used to achieve high quality compression.
A perceptual audio encoder attempts to achieve transparent compression (i.e., decompressed audio that is perceptually indistinguishable from the original audio) by using a psycho-acoustic model, and by maintaining quantization noise just below the level at which it would become audible to a listener (FIG. 2). Perceptual audio coding is the basis for such compression algorithms as Moving Picture Experts Group (“MPEG”)-1 Layer 3 (“MP3”) and advanced audio coding (“AAC”).
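By way of illustration only, the relationship between quantization noise and an audibility threshold can be sketched as follows. The sketch assumes a uniform quantizer whose error power is approximated by the textbook value of step²/12; the function names, the `margin_db` parameter, and the threshold value used in any call are hypothetical and are not taken from any particular encoder.

```python
import math


def max_step_size(masking_threshold_power, margin_db=1.0):
    """Largest uniform-quantizer step whose modeled noise power
    (step**2 / 12) sits margin_db below the given threshold.

    Assumption: quantization error is uniformly distributed, which is
    only an approximation for real audio signals.
    """
    target_noise = masking_threshold_power * 10.0 ** (-margin_db / 10.0)
    return math.sqrt(12.0 * target_noise)


def quantize(value, step):
    """Round a value to the nearest multiple of step."""
    return step * round(value / step)


def modeled_noise_power(step):
    """Modeled noise power of a uniform quantizer with the given step."""
    return step * step / 12.0
```

A larger threshold permits a coarser step, and therefore fewer bits, while the modeled noise stays below the level at which it would become audible.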
Many algorithms that model the human auditory system have been proposed. By way of example, the MPEG standard specifies two different psycho-acoustic model versions, dubbed Versions 1 and 2. Though a number of algorithms are commonly implemented, the basic methodology generally remains the same: (1) transform an audio input signal into the spectral domain (Fast Fourier Transform, or “FFT,” being the most widely used tool for this operation); (2) group spectral components into critical bands (in MPEG algorithms, this entails mapping from FFT samples to M critical bands); (3) determine tonal and non-tonal (i.e., noise-like) components within the critical bands; (4) calculate an individual masking threshold for each of the critical band components by using the energy levels, tonality, and frequency positions; and (5) compute a distortion threshold (sometimes referred to as a masking threshold).
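The five steps above can be sketched, in greatly simplified form, as follows. This is a minimal sketch under stated assumptions and is not either MPEG psycho-acoustic model: a naive discrete Fourier transform stands in for the FFT, tonality is estimated with a spectral-flatness measure, the band edges are supplied by the caller, and the offset and spreading values (18 dB, 6 dB, 10 dB per band) are illustrative placeholders. All function and parameter names are hypothetical.

```python
import cmath
import math


def power_spectrum(frame):
    """Step 1: power spectrum via a naive DFT (an FFT is used in practice)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def group_into_bands(spectrum, band_edges):
    """Step 2: sum bin energies into bands given hypothetical bin edges."""
    return [sum(spectrum[lo:hi])
            for lo, hi in zip(band_edges[:-1], band_edges[1:])]


def band_tonality(spectrum, lo, hi):
    """Step 3: crude tonality estimate, 1 - spectral flatness.

    Values near 1 suggest a tonal band; values near 0 a noise-like band.
    """
    bins = [max(p, 1e-12) for p in spectrum[lo:hi]]
    geometric = math.exp(sum(math.log(p) for p in bins) / len(bins))
    arithmetic = sum(bins) / len(bins)
    return 1.0 - geometric / arithmetic


def distortion_threshold(energies, tonalities, tonal_offset_db=18.0,
                         noise_offset_db=6.0, spread_db_per_band=10.0):
    """Steps 4 and 5: per-band thresholds, then a combined threshold.

    Assumption: each band's threshold is its energy lowered by a
    tonality-dependent offset, and masking spreads to other bands with a
    fixed attenuation per band of distance. The strongest contribution
    is kept; real models combine contributions differently.
    """
    individual = []
    for energy, tone in zip(energies, tonalities):
        offset_db = tone * tonal_offset_db + (1.0 - tone) * noise_offset_db
        individual.append(energy * 10.0 ** (-offset_db / 10.0))
    combined = []
    for b in range(len(individual)):
        combined.append(max(
            thr * 10.0 ** (-spread_db_per_band * abs(b - j) / 10.0)
            for j, thr in enumerate(individual)))
    return combined


def analyze_frame(frame, band_edges):
    """Run steps 1 through 5 on one frame of samples."""
    spectrum = power_spectrum(frame)
    energies = group_into_bands(spectrum, band_edges)
    tonalities = [band_tonality(spectrum, lo, hi)
                  for lo, hi in zip(band_edges[:-1], band_edges[1:])]
    return distortion_threshold(energies, tonalities)
```

For a 64-sample frame, `analyze_frame(frame, [0, 8, 16, 33])` would return one threshold per band for three hypothetical bands. Even in this reduced form, the spectral transform in step 1 dominates the work performed per frame.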
Perceptual audio encoders, such as MP3 and AAC encoders, rely on complex mathematical models of the auditory system to implement the methodology described above, the complexity owing at least in part to efforts to minimize the perception of quantization errors in the signal. To that end, these encoders, as well as other conventional applications, generally employ FFT operations that are CPU-intensive and require numerous CPU cycles to complete. Because many CPU cycles must be dedicated to such operations, correspondingly fewer CPU cycles may be available to other applications or operations in a computing or similar system while a coding operation is performed on an audio stream. Such large system demands may decrease overall efficiency.
Accordingly, there is a need for a system and method for efficiently achieving perceptual audio coding and transcoding that does not require the utilization of complex psycho-acoustic models during an encoding operation.