1. Field of the Invention
The present invention relates to methods and apparatus for encoding and decoding audio signals, particularly audio signals that may include both speech-like and non-speech-like signal components simultaneously and/or sequentially in time. Audio encoders and decoders capable of varying their encoding and decoding characteristics in response to changes in speech-like and non-speech-like signal content are often referred to in the art as “multimode” “codecs” (where a “codec” may be an encoder and a decoder). The invention also relates to computer programs on a storage medium for implementing such methods for encoding and decoding audio signals.
2. Summary of the Invention                Throughout this document, a “speech-like signal” means                    a signal that comprises either a) a single, strong periodical component (a “voiced” speech-like signal), b) random noise with no periodicity (an “unvoiced” speech-like signal), or c) the transition between such signal types. Examples of a speech-like signal include speech from a single speaker and the music produced by certain single musical instruments;                        and, a “non-speech-like signal means                    a signal that does not have the characteristics of a speech-like signal. Examples of a non-speech-like signal include a music signal from multiple instruments and mixed speech from (human) speakers of different pitches.                        
According to a first aspect of the present invention, a method for code excited linear prediction (CELP) audio encoding employs an LPC synthesis filter controlled by LPC parameters, a plurality of codebooks each having codevectors, at least one codebook providing an excitation more appropriate for speech-like signals than for non-speech-like signals and at least one other codebook providing an excitation more appropriate for non-speech-like signals than for speech like signals, and a plurality of gain factors, each associated with a codebook. The method comprises applying linear predictive coding (LPC) analysis to an audio signal to produce LPC parameters, selecting, from at least two codebooks, codevectors and/or associated gain factors by minimizing a measure of the difference between the audio signal and a reconstruction of the audio signal derived from the codebook excitations, the codebooks including a codebook providing an excitation more appropriate for a non-speech like signal and a codebook providing an excitation more appropriate for a speech-like signal, and generating an output usable by a CELP audio decoder to reconstruct the audio signal, the output including LPC parameters, codevectors, and gain factors. The minimizing may minimize the difference between the reconstruction of the audio signal and the audio signal in a closed-loop manner. The measure of the difference may be a perceptually-weighted measure.
According to a variation, the signal or signals derived from codebooks whose excitation outputs are more appropriate for a non-speech-like signal than for a speech-like signal may not be filtered by the linear predictive coding synthesis filter.
The at least one codebook providing an excitation output more appropriate for a speech-like signal than for a non-speech-like signal may include a codebook that produces a noise-like excitation and a codebook that produces a periodic excitation and the at least one other codebook providing an excitation output more appropriate for a non-speech-like signal than for a speech-like signal may include a codebook that produces a sinusoidal excitation useful for emulating a perceptual audio encoder.
The method may further comprise applying a long-term prediction (LTP) analysis to the audio signal to produce LTP parameters, wherein the codebook that produces a periodic excitation is an adaptive codebook controlled by the LTP parameters and receiving as a signal input a time-delayed combination of at least the periodic and the noise-like excitation, and wherein the output further includes the LTP parameters.
The adaptive codebook may receive, selectively, as a signal input, either a time-delayed combination of the periodic excitation, the noise-like excitation, and the sinusoidal excitation or only a time-delayed combination of the periodic excitation and the noise-like excitation, and the output may further include information as to whether the adaptive codebook receives the sinusoidal excitation in the combination of excitations.
The method may further comprise classifying the audio signal into one of a plurality of signal classes, selecting a mode of operation in response to the classifying, and selecting, in an open-loop manner, one or more codebooks exclusively to contribute excitation outputs.
The method may further comprise determining a confidence level to the selecting a mode of operation, wherein there are at least two confidence levels including a high confidence level, and selecting, in an open-loop manner, one or more codebooks exclusively to contribute excitation outputs only when the confidence level is high.
According to another aspect of the present invention, a method for code excited linear prediction (CELP) audio encoding employs an LPC synthesis filter controlled by LPC parameters, a plurality of codebooks each having codevectors, at least one codebook providing an excitation more appropriate for speech-like signals than for non-speech-like signals and at least one other codebook providing an excitation more appropriate for non-speech-like signals than for speech-like signals, and a plurality of gain factors, each associated with a codebook. The method comprises separating an audio signal into speech-like and non-speech-like signal components, applying linear predictive coding (LPC) analysis to the speech-like signal components of the audio signal to produce LPC parameters, minimizing the difference between the LPC synthesis filter output and the speech-like signal components of the audio signal by varying codevector selections and/or gain factors associated with the or each codebook providing an excitation output more appropriate for a speech-like signal than for a non-speech-like signal, varying codevector selections and/or gain factors associated with the or each codebook providing an excitation output more appropriate for a non-speech-like signal than for a speech-like signal, and providing an output usable by a CELP audio decoder to reproduce an approximation of the audio signal, the output including codevector selections and/or gains associated with each codebook, and the LPC parameters. The separating may separate the audio signal into speech-like signal components and non-speech-like signal components.
According to two variations of an alternative, the separating may separate the speech-like signal components from the audio signal and derive an approximation of the non-speech-like signal components by subtracting a reconstruction of the speech-like signal components from the audio signal, or the separating may separate the non-speech-like signal components from the audio signal and derive an approximation of the speech-like signal components by subtracting a reconstruction of the non-speech-like signal components from the audio signal.
A second linear predictive coding (LPC) synthesis filter may be provided and the reconstruction of the non-speech-like signal components may be filtered by such a second linear predictive coding synthesis filter.
The at least one codebook providing an excitation output more appropriate for a speech-like signal than for a non-speech-like signal may include a codebook that produces a noise-like excitation and a codebook that produces a periodic excitation and the at least one other codebook providing an excitation output more appropriate for a non-speech-like signal than for a speech-like signal may include a codebook that produces a sinusoidal excitation useful for emulating a perceptual audio encoder.
The method may further comprise applying a long-term prediction (LTP) analysis to the speech-like signal components of the audio signal to produce LTP parameters, in which case the codebook that produces a periodic excitation may be an adaptive codebook controlled by the LTP parameters and it may receive as a signal input a time-delayed combination of the periodic excitation and the noise-like excitation.
The codebook vector selections and/or gain factors associated with the or each codebook providing an excitation output more appropriate for a non-speech-like signal than for a speech-like signal may be varied in response the speech-like signal.
The codebook vector selections and/or gain factors associated with the or each codebook providing an excitation output more appropriate for a non-speech-like signal than for a speech-like signal may be varied to reduce the difference between the non-speech-like signal and a signal reconstructed from the or each such codebook.
According to a third aspect of the present invention, a method for code excited linear prediction (CELP) audio decoding employs an LPC synthesis filter controlled by LPC parameters, a plurality of codebooks each having codevectors, at least one codebook providing an excitation more appropriate for non-speech-like signals than for speech-like signals and at least one other codebook providing an excitation more appropriate for non-speech-like signals than for speech-like signals, and a plurality of gain factors, each associated with a codebook. The method comprises receiving the parameters, codevectors, and gain factors, deriving an excitation signal for the LPC synthesis filter from at least one codebook excitation output, and deriving an audio output signal from the output of the LPC filter or from the combination of the output of the LPC synthesis filter and the excitation of one or more ones of the codebooks, the combination being controlled by codevectors and/or gain factors associated with each of the codebooks.
The at least one codebook providing an excitation output more appropriate for a speech-like signal than for a non-speech-like signal may include a codebook that produces a noise-like excitation and a codebook that produces a periodic excitation and the at least one other codebook providing an excitation output more appropriate for a non-speech-like signal than for a speech-like signal may include a codebook that produces a sinusoidal excitation useful for emulating a perceptual audio encoder.
The codebook that produces periodic excitation may be an adaptive codebook controlled by the LTP parameters and may receive as a signal input a time-delayed combination of at least the periodic and noise-like excitation, and the method may further comprise receiving LTP parameters.
The excitation of all of the codebooks may be applied to the LPC filter and the adaptive codebook may receive, selectively, as a signal input, either a time-delayed combination of the periodic excitation, the noise-like excitation, and the sinusoidal excitation or only a time-delayed combination of the periodic and the noise-like excitation, and wherein the method may further comprise receiving information as to whether the adaptive codebook receives the sinusoidal excitation in the combination of excitations.
Deriving an audio output signal from the output of the LPC filter may include a postfiltering.