The sampling frequency at which the CELP coding algorithm operates is generally predetermined and identical in each encoded frame; examples of sampling frequencies are:                8 kHz in the CELP coders defined in ITU-T G.729, G.723.1, G.729.1        12.8 kHz for the CELP part of 3GPP AMR-WB, ITU-T G.722.2, G.718 coders        16 kHz in the coders described, for example, in the articles by G. Roy, P. Kabal, “Wideband CELP speech coding at 16 kbits/sec”, ICASSP 1991, and by C. Laflamme et al., “16 kbps wideband speech coding technique based on algebraic CELP”, ICASSP 1991.        
It will further be noted that in the case of a codec as described in ITU-T Recommendation G.718, a processing module is present for improving the decoded signal by low-frequency noise reduction. It is termed “bass post-filter” in English (BPF) or “low-frequency post-filtering”. It applies at the same sampling frequency as CELP decoding. The purpose of this post-processing is to eliminate the low-frequency noise between the first harmonics of a voiced speech signal. This post-processing is especially important for high-pitched women's voices where the distance between the harmonics is greater and the noise less masked.
Despite the fact that the common term for this post-processing in the field of coding is “low-frequency post-filtering”, it is not, in fact, a simple filtering but rather a fairly complex post-processing that generally contains “Pitch Tracking”, “Pitch Enhancer”, “Low-pass filtering” or “LP-filtering” modules and addition modules. This type of post-processing is described in detail, for example, in Recommendation G.718 (06/2008) “Frame error robust narrowband and wideband embedded variable bit-rate coding of speech and audio from 8-32 kbits/s”, chapter 7.14.1. The block diagram of this post-processing is illustrated in FIG. 29 of the same document.
Here we recall only the principles and elements necessary for understanding the present document. The technique described uses a breakdown into two frequency bands, low band and high band. An adaptive filtering is applied on the low band, determined for covering the lower frequencies at the first harmonics of the synthesized signal. This adaptive filtering is thus parameterized by the period T of the speech signal, termed “pitch”. Indeed, the equations of the operations performed by the “pitch enhancer” module are the following: the signal with the enhanced pitch ŝf(n) is obtained asŝf(n)=(1−α){circumflex over (s)}(n)+αsp(n)wheresp(n)=0.5ŝ(n−T)+0.5ŝ(n+T)
and ŝ(n) is the decoded signal.
This processing requires a memory of the past signal the size of which must cover the various possible values of pitch T (for finding the value ŝ(n−T)). The value of the pitch T is not known for the next frame, thus, generally, for covering the worst possible case, MAXPITCH+1 samples of the past decoded signal are stored for post-processing. MAXPITCH gives the maximum length of the pitch at the given sampling frequency, e.g. generally this value is 289 at 16 kHz or 231 at 12.8 kHz. An additional sample is often stored for subsequently performing an order 1 de-emphasis filtering. This de-emphasis filtering will not be described here in detail as it does not form the subject of the present invention.
When the sampling frequency of the signal at the input or output of the codec is not identical to the CELP coding internal frequency, a resampling is implemented. For example:                In 3GPP AMR-WB, ITU-T G.722.2 codecs, the input and output signal in wideband is sampled at 16 kHz but CELP coding operates at the frequency of 12.8 kHz. It will be noted that the codecs of ITU-T G.718 and G.718 Annex C also operate with input/output frequencies of 8 and/or 32 kHz, with a CELP core at 12.8 kHz.        In the ITU-T G.729.1 codec, the input signal is normally in wideband (at 16 kHz) and the low band (0-4 kHz) is obtained by a bank of QMF filters for obtaining a signal sampled at 8 kHz before coding by a CELP algorithm derived from the codecs of ITU-T G.729 and G.729 Annex A.        
Interest is focused here on a category of codecs supporting at least two internal sampling frequencies, the sampling frequency being able to be selected adaptively in time and variable from one frame to the other. Generally, for a range of “low” bitrates, the CELP coder will operate at a lesser sampling frequency, e.g. fs1=12.8 kHz and for a higher range of bitrates, the coder will operate at a higher frequency, e.g. fs2=16 kHz. A change of bitrate over time, from one frame to another, may in this case cause switching between these two frequencies (fs1 and fs2) according to the range of bitrates covered. This switching of frequencies between two frames may cause audible and troublesome artifacts, for several reasons.
One of the reasons causing these artifacts is that switching internal decoding frequencies prevents low-frequency post-filtering from operating at least in the first frame after switching, since the memory of the post-processing (i.e. the past synthesized signal) is found at a sampling frequency different from the newly synthesized signal.
To remedy this problem, one option consists in deactivating the post-processing over the duration of the transition frame (the frame after the change in internal sampling frequency). This option does not produce a desirable result generally, since the noise which was post-filtered reappears abruptly on the transition frame.
Another option is to leave the post-processing active but setting the memories to zero. With this method, the quality obtained is very mediocre.
Another possibility is also to consider a memory at 16 kHz as if it were at 12.8 kHz by only keeping the latest 4/5 samples of this memory or conversely, to consider a memory at 12.8 kHz as if it were at 16 kHz, either by adding 1/5 zeros at the start (toward the past) of this memory in order to have the correct length, or by storing 20% more samples at 12.8 kHz in order to have enough of them in case of a change in internal sampling frequency. The listening tests show that these solutions do not give a satisfactory quality.
There is therefore a need to find a better quality solution for avoiding a break in the post-processing in case of a change in sampling frequency from one frame to the other.