1. Field
The present disclosure is directed to a method and apparatus for processing audio frames to transition between different codecs. More particularly, the present disclosure is directed to state updating when switching between two coding modes for audio frames.
2. Introduction
Communication devices in use today include mobile phones, personal digital assistants, portable computers, desktop computers, gaming devices, tablets, and various other electronic communication devices. Many of these devices transmit audio signals to one another. Codecs are used to encode and decode the audio signals for transmission between the devices. Some audio signals are classified as speech signals, which have predominantly speech-like characteristics typical of the spoken word. Other audio signals are classified as generic audio signals, which have characteristics typical of music, tones, background noise, and reverberant speech.
Speech codecs based on source-filter models are suitable for processing speech signals but do not process generic audio signals effectively. Such speech codecs include Linear Predictive Coding (LPC) codecs, for example Code Excited Linear Prediction (CELP) codecs, and they tend to process speech signals well even at low bit rates. Conversely, generic audio codecs, such as frequency domain transform codecs, do not process speech signals as efficiently. To process both speech and generic audio signals, a classifier or discriminator determines, on a frame-by-frame basis, whether an audio signal is more or less speech-like and directs each frame to either a speech codec or a generic audio codec based on that classification. An audio signal processor capable of handling both speech and generic audio signals in this way is sometimes referred to as a hybrid codec. In some cases the hybrid codec may be a variable rate codec that codes different types of frames at different rates. For example, generic audio frames, which are coded in the transform domain, may be coded at higher rates than speech-like frames.
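The frame-by-frame routing described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the hybrid codec's actual logic: the speech/generic discriminator is passed in as a function, the rates in `MODE_RATES` are hypothetical placeholders that only preserve the stated ordering (generic audio frames coded at a higher rate than speech frames), and the `is_transition` flag simply marks frames at which the coding mode changes from the previous frame.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Frame = Sequence[float]

# Hypothetical bit rates in bits/s (assumption); the text only states that
# generic audio frames are coded at higher rates than speech-like frames.
MODE_RATES = {"speech": 8000, "generic": 16000}


@dataclass
class FrameDecision:
    index: int
    mode: str            # "speech" -> e.g. CELP, "generic" -> e.g. transform codec
    rate_bps: int
    is_transition: bool  # True when the mode differs from the previous frame


def route_frames(frames: List[Frame],
                 is_speech_like: Callable[[Frame], bool]) -> List[FrameDecision]:
    """Classify each frame independently and pick a coding mode and rate."""
    decisions: List[FrameDecision] = []
    prev_mode = None
    for i, frame in enumerate(frames):
        mode = "speech" if is_speech_like(frame) else "generic"
        decisions.append(FrameDecision(
            index=i,
            mode=mode,
            rate_bps=MODE_RATES[mode],
            is_transition=prev_mode is not None and mode != prev_mode,
        ))
        prev_mode = mode
    return decisions


if __name__ == "__main__":
    # Toy discriminator (assumption): few zero crossings is treated as
    # speech-like. Real speech/music classifiers use far richer features.
    def toy_classifier(frame: Frame) -> bool:
        crossings = sum((a >= 0) != (b >= 0) for a, b in zip(frame, frame[1:]))
        return crossings < len(frame) // 4

    frames = [[1.0, 0.9, 0.8, 0.7] * 4,   # no sign changes -> speech
              [1.0, -1.0] * 8,            # alternating signs -> generic
              [0.5, 0.4, 0.3, 0.2] * 4]   # no sign changes -> speech
    for d in route_frames(frames, toy_classifier):
        print(d)
```

Keeping the classifier as a parameter separates the routing and transition bookkeeping from the (much harder) classification problem; the transition flags are where a codec would have to deal with the state handling at a mode switch.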
Switching between a speech mode for speech frames and a generic audio mode for generic audio frames produces discontinuities. For example, a transition from a speech frame coded in the CELP domain to a generic audio frame coded in the transform domain has been shown to produce a discontinuity in the form of an audio gap. A transition from the transform domain to the CELP domain also produces audible discontinuities that degrade audio quality. A major cause of these discontinuities is improper initialization of the various states of the CELP codec. States whose improper initialization particularly affects quality include the LPC synthesis filter state and the Adaptive Codebook (ACB) excitation state.
To avoid this state-update problem, prior art codecs such as Extended Adaptive Multi-Rate Wideband (AMR-WB+) and Enhanced Variable Rate Codec-Wideband (EVRC-WB) perform LPC analysis even in the generic audio mode and code the LPC residual in the transform domain. The synthesized output is then generated by passing the time domain residual obtained from the inverse transform through an LPC synthesis filter. This process inherently generates the LPC synthesis filter state and the ACB excitation state. However, generic audio signals typically do not conform to the LPC model, so the bits spent on LPC quantization may reduce coding performance for those signals.
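The prior-art approach described above can be sketched as follows, assuming a direct-form all-pole synthesis filter 1/A(z) with A(z) = 1 + a_1 z^-1 + ... + a_p z^-p (sign conventions differ between codecs). The class `LpcResidualDecoder`, the ACB buffer length, and pushing the decoded residual directly into the ACB buffer are illustrative assumptions, not the AMR-WB+ or EVRC-WB implementations; the inverse transform that produces the residual is not shown. The sketch only shows why filtering the residual leaves behind both a synthesis filter memory and a past-excitation history.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple


def lpc_synthesis(residual: Sequence[float],
                  a: Sequence[float],
                  mem: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Filter `residual` through 1/A(z), A(z) = 1 + sum_k a[k-1] z^-k.

    `mem` holds the previous p = len(a) output samples, oldest first.
    Returns the synthesized samples and the updated filter memory.
    """
    p = len(a)
    hist = list(mem)                 # oldest first, most recent last
    out: List[float] = []
    for e in residual:
        # s[n] = e[n] - sum_{k=1..p} a_k * s[n-k]
        s = e - sum(a[k] * hist[-1 - k] for k in range(p))
        out.append(s)
        hist.append(s)
        hist.pop(0)                  # keep exactly p samples of memory
    return out, hist


class LpcResidualDecoder:
    """Hypothetical decoder state for an LPC-residual generic audio mode."""

    def __init__(self, order: int, acb_len: int) -> None:
        self.syn_mem: List[float] = [0.0] * order                         # LPC synthesis filter state
        self.acb: Deque[float] = deque([0.0] * acb_len, maxlen=acb_len)  # past excitation (ACB state)

    def decode_generic_frame(self, residual: Sequence[float],
                             a: Sequence[float]) -> List[float]:
        # `residual` is assumed to be the time-domain output of the inverse
        # transform (not shown). Filtering it updates the synthesis state ...
        out, self.syn_mem = lpc_synthesis(residual, a, self.syn_mem)
        # ... and, treating the residual as the excitation, it also refreshes
        # the adaptive codebook history that a following CELP frame would use.
        self.acb.extend(residual)
        return out


if __name__ == "__main__":
    dec = LpcResidualDecoder(order=1, acb_len=4)
    print(dec.decode_generic_frame([1.0, 0.0, 0.0], a=[-0.5]))  # [1.0, 0.5, 0.25]
    print(dec.syn_mem, list(dec.acb))                           # [0.25] [0.0, 1.0, 0.0, 0.0]
```

Because `syn_mem` and `acb` persist across calls, a CELP frame decoded after such a generic audio frame would start from states consistent with the preceding output rather than from zeros. The cost is that LPC coefficients must be quantized and transmitted even for signals that fit the LPC model poorly.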
Thus, there is an opportunity for a method and apparatus for processing audio frames to transition between different codecs.