A current problem with speech and audio codecs is that they are used in adverse environments where the acoustic input signal is distorted by background noise and other artifacts. This causes several problems. Since the codec now has to encode both the desired signal and the undesired distortions, the coding problem becomes more complicated: the signal now consists of two sources, which decreases encoding quality. But even if the combination of the two sources could be encoded with the same quality as a single clean signal, the speech part would still be of lower quality than the clean signal. The lost encoding quality is not only perceptually annoying but, importantly, it also increases listening effort and, in the worst case, decreases the intelligibility of the decoded signal.
WO 2005/031709 A1 shows a speech coding method applying noise reduction by modifying the codebook gain. In detail, an acoustic signal containing a speech component and a noise component is encoded by using an analysis-by-synthesis method, wherein, for encoding the acoustic signal, a synthesized signal is compared with the acoustic signal for a time interval, said synthesized signal being described by using a fixed codebook and an associated fixed gain.
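The closed-loop comparison of a synthesized signal with the input can be illustrated with a minimal sketch of a fixed-codebook search. It is not the method of WO 2005/031709 A1: the all-pole convention A(z) = 1 + Σ a_i z^-i, the toy codebook, and the Wiener-style attenuation in the hypothetical helper `attenuate_gain` are assumptions made only for the example.

```python
from typing import List, Optional, Sequence, Tuple


def synthesize(excitation: Sequence[float], lpc: Sequence[float]) -> List[float]:
    """Zero-state all-pole filter 1/A(z), with A(z) = 1 + sum_i a_i z^-i."""
    out: List[float] = []
    for n, e in enumerate(excitation):
        acc = e
        for i, a in enumerate(lpc, start=1):
            if n - i >= 0:
                acc -= a * out[n - i]
        out.append(acc)
    return out


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def search_fixed_codebook(
    target: Sequence[float], codebook: Sequence[Sequence[float]], lpc: Sequence[float]
) -> Tuple[int, float, float]:
    """Return (index, optimal fixed gain, squared error) of the best entry."""
    best: Optional[Tuple[int, float, float]] = None
    target_energy = dot(target, target)
    for k, code in enumerate(codebook):
        y = synthesize(code, lpc)  # candidate synthesized signal
        energy = dot(y, y)
        if energy == 0.0:
            continue
        corr = dot(target, y)
        gain = corr / energy  # least-squares optimal gain for this entry
        err = target_energy - corr * corr / energy
        if best is None or err < best[2]:
            best = (k, gain, err)
    if best is None:
        raise ValueError("codebook contains only all-zero entries")
    return best


def attenuate_gain(gain: float, snr: float) -> float:
    """Hypothetical noise reduction: scale the fixed gain by SNR / (1 + SNR)."""
    return gain * snr / (1.0 + snr)
```

For a target equal to twice the synthesized pulse at position 1, the search returns index 1 with gain 2.0 and zero error; the gain would then be reduced by `attenuate_gain` when the estimated SNR is low.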
US 2011/076968 A1 shows a communication device with reduced noise speech coding. The communication device includes a memory, an input interface, a processing module, and a transmitter. The processing module receives a digital signal from the input interface, wherein the digital signal includes a desired digital signal component and an undesired digital signal component. The processing module identifies one of a plurality of codebooks based on the undesired digital signal component. The processing module then identifies a codebook entry from the one of the plurality of codebooks based on the desired digital signal component to produce a selected codebook entry. The processing module then generates a coded signal based on the selected codebook entry, wherein the coded signal includes a substantially unattenuated representation of the desired digital signal component and an attenuated representation of the undesired digital signal component.
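The two-stage selection can be illustrated with a toy sketch. Every concrete rule here is an assumption, not taken from US 2011/076968 A1: the undesired component is classified by its mean power in dB against hypothetical thresholds, each class owns one codebook, and the entry is chosen by squared Euclidean distance to the desired component.

```python
import bisect
import math
from typing import Sequence, Tuple

Codebook = Sequence[Sequence[float]]


def noise_level_db(frame: Sequence[float]) -> float:
    """Mean power of a frame in dB (a small floor avoids log of zero)."""
    power = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(power + 1e-12)


def select_codebook(
    noise_frame: Sequence[float], thresholds_db: Sequence[float], codebooks: Sequence[Codebook]
) -> int:
    """Pick one of len(thresholds_db) + 1 codebooks from the undesired component."""
    if len(codebooks) != len(thresholds_db) + 1:
        raise ValueError("need exactly one more codebook than thresholds")
    return bisect.bisect_right(list(thresholds_db), noise_level_db(noise_frame))


def select_entry(desired_frame: Sequence[float], codebook: Codebook) -> int:
    """Index of the entry closest (squared error) to the desired component."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((d - c) ** 2 for d, c in zip(desired_frame, codebook[k])),
    )


def encode_frame(
    desired: Sequence[float],
    noise: Sequence[float],
    thresholds_db: Sequence[float],
    codebooks: Sequence[Codebook],
) -> Tuple[int, int]:
    """Return (codebook index, entry index) for one frame."""
    cb = select_codebook(noise, thresholds_db, codebooks)
    return cb, select_entry(desired, codebooks[cb])
```

Using the undesired component only to choose the codebook lets the entry search itself be driven by the desired component alone.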
US 2001/001140 A1 shows a modular approach to speech enhancement with an application to speech coding. A speech coder separates input digitized speech into component parts on an interval-by-interval basis. The component parts include gain components, spectrum components and excitation signal components. A set of speech enhancement systems within the speech coder processes the component parts such that each component part has its own individual speech enhancement process. For example, one speech enhancement process can be applied to the spectrum components and another speech enhancement process can be applied to the excitation signal components.
U.S. Pat. No. 5,680,508 A discloses an enhancement of speech coding in background noise for a low-rate speech coder. A speech coding system employs measurements of robust features of speech frames, whose distributions are not strongly affected by noise levels, to make voicing decisions for input speech occurring in a noisy environment. Linear programming analysis of the robust features and respective weights is used to determine an optimum linear combination of these features. The input speech vectors are matched to a vocabulary of codewords in order to select the corresponding, optimally matching codeword. Adaptive vector quantization is used, in which a vocabulary of words obtained in a quiet environment is updated based upon a noise estimate of the noisy environment in which the input speech occurs, and the “noisy” vocabulary is then searched for the best match with an input speech vector. The corresponding clean codeword index is then selected for transmission and for synthesis at the receiver end.
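The noise-adapted vocabulary search can be sketched as follows. As an assumption for illustration only, codewords are power-spectrum vectors and speech and noise are taken to be uncorrelated, so a “noisy” codeword is formed by adding the noise power estimate to each clean codeword; the feature set and update rule of U.S. Pat. No. 5,680,508 A may differ. The transmitted index always addresses the clean vocabulary.

```python
from typing import List, Sequence

Vocabulary = Sequence[Sequence[float]]


def adapt_vocabulary(clean: Vocabulary, noise_psd: Sequence[float]) -> List[List[float]]:
    """Assumed additive-power model: noisy codeword = clean codeword + noise PSD."""
    return [[c + n for c, n in zip(word, noise_psd)] for word in clean]


def best_match(vector: Sequence[float], vocabulary: Vocabulary) -> int:
    """Index of the codeword with the smallest squared Euclidean distance."""
    dists = [sum((x - y) ** 2 for x, y in zip(vector, word)) for word in vocabulary]
    return min(range(len(dists)), key=dists.__getitem__)


def encode(noisy_input: Sequence[float], clean: Vocabulary, noise_psd: Sequence[float]) -> int:
    """Search the noise-adapted vocabulary; the index is shared with the clean one."""
    return best_match(noisy_input, adapt_vocabulary(clean, noise_psd))


def decode(index: int, clean: Vocabulary) -> List[float]:
    """The receiver synthesizes from the clean codeword."""
    return list(clean[index])
```

With clean codewords [1, 0] and [0, 1] and a noise estimate [0, 2], the input [1, 2] (codeword 0 plus noise) is matched to codeword 1 by a search of the clean vocabulary, whereas the adapted search correctly returns index 0.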
US 2006/116874 A1 shows noise-dependent postfiltering. The method involves providing a filter suited for reducing distortion caused by speech coding, estimating acoustic noise in the speech signal, adapting the filter in response to the estimated acoustic noise to obtain an adapted filter, and applying the adapted filter to the speech signal so as to reduce both the acoustic noise and the distortion caused by speech coding in the speech signal.
U.S. Pat. No. 6,385,573 B1 shows adaptive tilt compensation for a synthesized speech residual. A multi-rate speech codec supports a plurality of encoding bit rate modes by adaptively selecting encoding bit rate modes to match communication channel restrictions. In higher bit rate encoding modes, an accurate representation of speech is generated through CELP (code excited linear prediction) together with other associated modeling parameters, for higher quality decoding and reproduction. To achieve high quality in lower bit rate encoding modes, the speech encoder departs from the strict waveform matching criteria of regular CELP coders and strives to identify significant perceptual features of the input signal.
U.S. Pat. No. 5,845,244 A relates to adapting the noise masking level in analysis-by-synthesis coding employing perceptual weighting. In an analysis-by-synthesis speech coder employing a short-term perceptual weighting filter, the values of the spectral expansion coefficients are adapted dynamically on the basis of spectral parameters obtained during short-term linear prediction analysis. The spectral parameters serving in this adaptation may in particular comprise parameters representative of the overall slope of the spectrum of the speech signal, and parameters representative of the resonant character of the short-term synthesis filter.
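Assuming the weighting filter has the form W(z) = A(z/γ1)/A(z/γ2) common in CELP coders, with γ1 and γ2 as the spectral expansion coefficients, per-frame adaptation can be sketched as below. The tilt measure (normalized first autocorrelation R(1)/R(0)) and the linear mapping in the hypothetical helper `adapt_gammas` are illustrative choices, not the rule of U.S. Pat. No. 5,845,244 A, which is only characterized above as driven by spectral-slope and resonance parameters.

```python
from typing import List, Sequence, Tuple


def expand(lpc: Sequence[float], gamma: float) -> List[float]:
    """Coefficients of A(z/gamma): a_i -> a_i * gamma**i (bandwidth expansion)."""
    return [a * gamma ** i for i, a in enumerate(lpc, start=1)]


def weight(signal: Sequence[float], lpc: Sequence[float], gamma1: float, gamma2: float) -> List[float]:
    """Apply W(z) = A(z/gamma1) / A(z/gamma2), with A(z) = 1 + sum_i a_i z^-i."""
    num = expand(lpc, gamma1)
    den = expand(lpc, gamma2)
    out: List[float] = []
    for n, x in enumerate(signal):
        acc = x
        for i, (b, a) in enumerate(zip(num, den), start=1):
            if n - i >= 0:
                acc += b * signal[n - i] - a * out[n - i]
        out.append(acc)
    return out


def spectral_tilt(frame: Sequence[float]) -> float:
    """Normalized first autocorrelation R(1)/R(0); close to +1 for low-pass frames."""
    r0 = sum(x * x for x in frame)
    r1 = sum(frame[n] * frame[n - 1] for n in range(1, len(frame)))
    return r1 / r0 if r0 > 0.0 else 0.0


def adapt_gammas(tilt: float) -> Tuple[float, float]:
    """Hypothetical linear mapping from tilt to (gamma1, gamma2), with gamma1 > gamma2."""
    t = max(-1.0, min(1.0, tilt))
    return 0.94 + 0.04 * t, 0.60 - 0.20 * t
```

When γ1 equals γ2 the filter reduces to the identity; the gap between the two coefficients controls how strongly the coding noise is allowed to follow the formant structure.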
U.S. Pat. No. 4,133,976 A shows predictive speech signal coding with reduced noise effects. A predictive speech signal processor features an adaptive filter in a feedback network around the quantizer. The adaptive filter essentially combines the quantizing error signal, the formant-related prediction parameter signals and the difference signal to concentrate the quantizing error noise in spectral peaks corresponding to the time-varying formant portions of the speech spectrum, so that the quantizing noise is masked by the speech signal formants.
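The principle of shaping quantizing noise by feedback around the quantizer can be sketched as follows, assuming a uniform scalar quantizer with a hypothetical step size `step` and feedback coefficients built from the formant predictor so that 1 - C(z) = A(z/γ). Feeding the past reconstruction error r[n] back into the quantizer input gives R(z) = Q(z)/A(z/γ), where q is the roughly white quantizer error, so the error spectrum follows the bandwidth-expanded formant envelope. The filter structure of U.S. Pat. No. 4,133,976 A may differ in detail.

```python
from typing import List, Sequence, Tuple


def quantize(v: float, step: float) -> float:
    """Uniform mid-tread scalar quantizer."""
    return step * round(v / step)


def noise_feedback_quantize(
    d: Sequence[float], lpc: Sequence[float], gamma: float, step: float
) -> Tuple[List[float], List[float]]:
    """Quantize the difference signal d with error feedback.

    With c_i = -a_i * gamma**i we have 1 - C(z) = A(z/gamma), so the
    reconstruction error r = u - d obeys A(z/gamma) r = q, i.e. the white
    quantizer error q is shaped by 1/A(z/gamma) and peaks at the formants.
    Returns (quantized signal u, reconstruction error r).
    """
    c = [-a * gamma ** i for i, a in enumerate(lpc, start=1)]
    u: List[float] = []
    r: List[float] = []
    for n, dn in enumerate(d):
        # Quantizer input = difference signal + filtered past reconstruction error.
        v = dn + sum(ci * r[n - i] for i, ci in enumerate(c, start=1) if n - i >= 0)
        un = quantize(v, step)
        u.append(un)
        r.append(un - dn)
    return u, r
```

Filtering the returned error r through A(z/γ) recovers the quantizer error, whose magnitude never exceeds step/2; this is the property that lets the total noise be concentrated under the formant peaks rather than spread flat across the spectrum.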
WO 9425959 A1 shows the use of an auditory model to improve the quality or lower the bit rate of speech synthesis systems. The weighting filter is replaced with an auditory model, which enables the search for the optimum stochastic code vector in the psychoacoustic domain. An algorithm termed PERCELP (Perceptually Enhanced Random Codebook Excited Linear Prediction) is disclosed, which produces speech of considerably better quality than that obtained with a weighting filter.
US 2008/312916 A1 shows a receiver intelligibility enhancement system, which processes an input speech signal to generate an enhanced, more intelligible signal. In the frequency domain, the FFT spectrum of the speech received from the far end is modified in accordance with the LPC spectrum of the local background noise. In the time domain, the speech is modified in accordance with the LPC coefficients of the noise.
US 2013/030800 A1 shows an adaptive voice intelligibility processor, which adaptively identifies and tracks formant locations, thereby enabling formants to be emphasized as they change. As a result, near-end intelligibility can be improved, even in noisy environments.
In [Atal, Bishnu S., and Manfred R. Schroeder. “Predictive coding of speech signals and subjective error criteria”. Acoustics, Speech and Signal Processing, IEEE Transactions on 27.3 (1979): 247-254], methods for reducing the subjective distortion in predictive coders for speech signals are described and evaluated. Improved speech quality is obtained 1) by efficient removal of the formant- and pitch-related redundant structure of speech before quantizing, and 2) by effective masking of the quantizer noise by the speech signal.
In [Chen, Juin-Hwey and Allen Gersho. “Real-time vector APC speech coding at 4800 bps with adaptive postfiltering”. Acoustics, Speech and Signal Processing, IEEE International Conference on ICASSP'87. Vol. 12, IEEE, 1987], an improved Vector APC (VAPC) speech coder is presented, which combines APC with vector quantization and incorporates analysis-by-synthesis, perceptual noise weighting, and adaptive postfiltering.