There is currently no efficient scheme for coding the high-frequency range of audio signals at low bit rates. In existing audio coding schemes, such as MPEG-4 advanced audio coding (AAC), a full-band audio signal is encoded using a quantizing and coding method. However, when bandwidth is limited and a low bit-rate coding scheme is used, generally only some of the sub-band signals can be encoded because too few bits are available. As a result, the high-frequency (HF) sub-bands (or components) of the audio signal are often encoded with fewer bits or removed entirely to satisfy bit constraints. This shortage of bits, caused by the reduced available bandwidth, typically degrades the quality of the encoded audio signal.
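The effect of a tight bit budget on the HF sub-bands can be illustrated with a deliberately simplified allocator. This is a hypothetical sketch, not the allocation procedure of AAC or any other codec: `allocate_bits` and its per-band bit requests are illustrative assumptions, and real encoders drive allocation from a psychoacoustic model. Bands are served from lowest to highest frequency, so the band at the budget boundary receives fewer bits than requested and every band above it receives none.

```python
def allocate_bits(band_costs, budget):
    """Hypothetical toy allocator: grant each band's requested bits in order
    from low to high frequency until the budget is exhausted."""
    allocation = []
    remaining = budget
    for cost in band_costs:
        granted = min(cost, remaining)  # boundary band gets only what is left
        allocation.append(granted)
        remaining -= granted            # later (higher) bands may get 0 bits
    return allocation
```

For example, `allocate_bits([40, 40, 30, 30], 100)` returns `[40, 40, 20, 0]`: the third band is coded with fewer bits and the highest band is dropped.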
The HF component of an audio signal may be encoded by detecting the envelope of its spectrum rather than its fine structure. Accordingly, in the MPEG-4 AAC algorithm, an HF component with strong noise content is encoded using the perceptual noise substitution (PNS) tool. In PNS encoding, the encoder detects the noise envelope of the HF component, and the decoder restores the HF component by inserting random noise shaped to that envelope.
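The PNS idea of sending only an energy envelope and synthesizing noise at the decoder can be sketched as follows. This is a minimal illustration under stated assumptions, not the MPEG-4 PNS tool itself: the function names and the band layout are hypothetical, and the energies are kept as unquantized linear values, whereas the actual bitstream codes them in a quantized form per scalefactor band.

```python
import math
import random

def pns_encode(coeffs, band_edges):
    """Hypothetical encoder side: replace each band's coefficients with
    a single number, the band energy (the noise envelope)."""
    return [sum(c * c for c in coeffs[lo:hi])
            for lo, hi in zip(band_edges[:-1], band_edges[1:])]

def pns_decode(energies, band_edges, seed=0):
    """Hypothetical decoder side: fill each band with random noise whose
    energy is scaled to match the transmitted band energy."""
    rng = random.Random(seed)
    out = []
    for energy, lo, hi in zip(energies, band_edges[:-1], band_edges[1:]):
        noise = [rng.gauss(0.0, 1.0) for _ in range(hi - lo)]
        noise_energy = sum(n * n for n in noise)
        gain = math.sqrt(energy / noise_energy) if noise_energy > 0 else 0.0
        out.extend(gain * n for n in noise)
    return out
```

The decoded coefficients differ from the originals sample by sample, but each band carries the same energy, which is all PNS aims to preserve for noise-like content.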
An HF component containing stationary random noise can be encoded efficiently with the PNS tool. However, if an HF component containing transient noise is encoded with the PNS tool, metallic or buzzing artifacts occur. The MPEG-4 high efficiency AAC (HE-AAC) algorithm attempts to solve this problem by encoding the HF component with the spectral band replication (SBR) tool. SBR enhances audio and speech codecs, especially at low bit rates, by exploiting harmonic redundancy in the frequency domain, and it can be combined with any audio compression codec. The core codec transmits the low and mid frequencies of the spectrum, while SBR reconstructs the higher-frequency content at the decoder by transposing harmonics up from the low and mid frequencies.
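The transposition step can be pictured as copying a region of the decoded low band upward to fill the missing high band. The sketch below is a simplified, hypothetical illustration: `sbr_patch` and `source_start` are assumed names, and it works on a plain list of spectral values, whereas HE-AAC performs its patching on QMF subband samples using frequency tables defined in the standard. Note that this kind of copy-up shifts content by a fixed frequency offset.

```python
def sbr_patch(low_band, total_bins, source_start):
    """Hypothetical copy-up patch: fill bins len(low_band)..total_bins-1 by
    repeating the low-band region starting at source_start."""
    crossover = len(low_band)
    source = low_band[source_start:crossover]
    if not source:
        raise ValueError("source region of the low band is empty")
    # Each high bin takes the value of a source bin, cycling through the
    # source region if the high band is wider than the source.
    high = [source[i % len(source)] for i in range(total_bins - crossover)]
    return list(low_band) + high
```

For example, `sbr_patch([1, 2, 3, 4], 8, 2)` returns `[1, 2, 3, 4, 3, 4, 3, 4]`. The patched band still has the wrong level, which is why envelope side information is needed.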
Guidance information for reconstructing the high-frequency spectral envelope is transmitted as side information. Noise-like information is adaptively mixed into selected frequency bands in order to faithfully reproduce signals that originally contained few or no tonal components. The SBR technique relies on the psychoacoustic principle that human hearing analyzes higher frequencies with less accuracy. Thus, the harmonic content produced by the spectral band replication process need only be accurate in a perceptual sense, not technically or mathematically exact.
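Envelope adjustment with adaptive noise mixing can be sketched as scaling each band of the replicated high band to a transmitted target energy while assigning part of that energy to random noise. This is a hypothetical simplification, not the HE-AAC envelope adjuster: `adjust_envelope`, the linear `target_energies`, and `noise_ratios` (an assumed fraction of band energy given to noise, between 0 and 1) are illustrative, and the real SBR bitstream codes envelope and noise-floor data in its own quantized form on a time-frequency grid.

```python
import math
import random

def adjust_envelope(patched_high, band_edges, target_energies, noise_ratios, seed=0):
    """Hypothetical envelope adjuster: scale each band of the patched high
    band to a target energy, giving a fraction of that energy to noise."""
    rng = random.Random(seed)
    out = []
    for lo, hi, energy, ratio in zip(band_edges[:-1], band_edges[1:],
                                     target_energies, noise_ratios):
        patch = patched_high[lo:hi]
        noise = [rng.gauss(0.0, 1.0) for _ in patch]
        patch_energy = sum(x * x for x in patch)
        noise_energy = sum(x * x for x in noise)
        # Split the target energy between the replicated part and the noise.
        g_patch = math.sqrt((1.0 - ratio) * energy / patch_energy) if patch_energy > 0 else 0.0
        g_noise = math.sqrt(ratio * energy / noise_energy) if noise_energy > 0 else 0.0
        # For 0 < ratio < 1 the band energy matches the target only
        # approximately, because the patch/noise cross term is not removed.
        out.extend(g_patch * p + g_noise * n for p, n in zip(patch, noise))
    return out
```

With `noise_ratios` of 0 the band is purely the rescaled replica; with 1 it is purely noise, as for bands whose original content was not tonal.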
Because the SBR technique operates with a quadrature mirror filter (QMF) bank, the output of the modified discrete cosine transform (MDCT) based core must additionally be processed by the QMF bank to obtain the HF component. This additional filtering is computationally complex and demands considerable processing power. In addition, the low-frequency component of a specific band is replicated and encoded to match the original high-frequency signal using the envelope, noise floor, and time-frequency grid. This, too, requires additional side information (the envelope, noise floor, and time-frequency grid), which consumes a bit rate of several kilobits per second (kbps), along with a large amount of computation and processing power.
In certain low bit-rate bitstreams, masking effects are strong while the frequency resolution of the human auditory system is low. Therefore, the signal does not need to be represented with high precision. Nevertheless, existing coding methods store information with perceptually irrelevant precision, which leads to inefficient compression. Certain SBR schemes, such as the one described in U.S. Pat. No. 7,283,955, attempt to address this need.
However, such methods cannot represent HF signal content when no similar content is available in the low-frequency part. In particular, frequency deviations of tonal components are translated rather than scaled. Because the k-th harmonic of a tone deviates by k times the deviation of its fundamental, shifting a low band up by a fixed offset leaves the replicated harmonics with too small a deviation. As a result, some types of audio signals, such as voice content with vibrato, cannot be reproduced or are reproduced with poor quality. Moreover, additional complex-valued filter banks are inserted into the data flow, which increases computational requirements. Such methods, systems, and processes are therefore inefficient when deployed on computationally constrained devices.
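The difference between translating and scaling tonal components can be checked with a small numeric example. The sketch below is illustrative only and is not the method of the cited patent: the helper names are hypothetical, and the values (a 200 Hz fundamental with a vibrato swing of plus or minus 5 Hz) are arbitrary. The second harmonic is used as the source for the fourth harmonic's frequency range.

```python
def translate(freqs_hz, offset_hz):
    """Shift every component up by a fixed offset (copy-up patching)."""
    return [f + offset_hz for f in freqs_hz]

def scale(freqs_hz, factor):
    """Multiply every component's frequency (harmonic transposition)."""
    return [f * factor for f in freqs_hz]

def vibrato_extremes(f0_hz, harmonic, depth_hz):
    """Lowest and highest frequency of a harmonic while f0 swings by +/- depth_hz."""
    return [harmonic * (f0_hz - depth_hz), harmonic * (f0_hz + depth_hz)]

source = vibrato_extremes(200, 2, 5)   # 2nd harmonic: [390, 410]
target = vibrato_extremes(200, 4, 5)   # true 4th harmonic: [780, 820]
shifted = translate(source, 400)       # [790, 810]: swing stays 20 Hz
scaled = scale(source, 2)              # [780, 820]: swing grows to 40 Hz
```

Translation preserves the 20 Hz swing of the source harmonic, while the real fourth harmonic swings over 40 Hz; scaling reproduces it, which is why translated patches handle vibrato poorly.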