In general, audio coding, and speech coding in particular, maps an analog input audio or speech signal to a digital representation in a coding domain and back to an analog output audio or speech signal. The digital representation entails quantization or discretization of the values or parameters representing the audio or speech, which can be regarded as perturbing the true values or parameters with coding noise. The art of audio or speech coding is to perform the encoding such that, at a given bit rate, the effect of the coding noise in the decoded speech is as small as possible. The bit rate at which the speech is encoded, however, sets a theoretical lower limit on the coding noise. The goal is therefore at least to make the coding noise as inaudible as possible.
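The view of quantization as added coding noise can be illustrated with a minimal sketch. It assumes a plain uniform scalar quantizer on values in [-1, 1], which is far simpler than the parametric quantization used in real speech codecs; the function names `quantize` and `snr_db` and the test tone are hypothetical and serve only as illustration.

```python
import math

def quantize(x, bits):
    """Uniform quantizer for x in [-1, 1] (illustrative assumption, not a codec)."""
    levels = 2 ** (bits - 1) - 1      # number of steps on the positive side
    return round(x * levels) / levels

def snr_db(signal, bits):
    """Signal-to-noise ratio in dB, treating (quantized - original) as coding noise."""
    noise = [quantize(s, bits) - s for s in signal]
    p_signal = sum(s * s for s in signal)
    p_noise = sum(e * e for e in noise)
    return 10 * math.log10(p_signal / p_noise)

# Hypothetical test tone; amplitude and frequency are arbitrary choices.
signal = [0.9 * math.sin(2 * math.pi * 0.0123 * n) for n in range(2000)]

snr_4 = snr_db(signal, 4)   # coarse representation, few bits per sample
snr_8 = snr_db(signal, 8)   # finer representation, more bits per sample
```

Under these assumptions `snr_8` comes out well above `snr_4`, which mirrors the statement that the available bit rate bounds how far the coding noise can be reduced.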
A suitable view of the coding noise is to model it as additive white or colored noise. There is a class of enhancement methods which, after the audio or speech signal has been decoded, modify the coding noise such that it becomes less audible, thereby improving the audio or speech quality. Such technology is usually called ‘postfiltering’, meaning that the enhanced audio or speech signal is derived in a post-processing step after the actual decoder. There are many publications on speech enhancement with postfilters; some of the most fundamental papers are [1]-[4].
The basic working principle of pitch postfilters is to remove at least part of the coding noise that floods the spectral valleys between the harmonics of voiced speech. This is generally achieved by a weighted superposition of the decoded speech signal with time-shifted versions of itself, where the time shift corresponds to the pitch lag or period of the speech. Because voiced speech is strongly correlated at this lag while the coding noise is not, the superposition attenuates the uncorrelated coding noise relative to the desired speech signal, especially between the speech harmonics. The effect can be obtained with both non-recursive and recursive filter structures; in practice, non-recursive filter structures are preferred.
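The weighted superposition can be sketched as a simple non-recursive comb filter. This is a minimal illustration under stated assumptions, not the filter of any cited reference: the single past tap, the fixed `gain` of 0.5, the normalization by `1 + gain`, the pitch lag of 40 samples and the pure-tone test signals are all hypothetical.

```python
import math

def pitch_postfilter(x, lag, gain=0.5):
    """Non-recursive comb: y[n] = (x[n] + gain * x[n - lag]) / (1 + gain)."""
    y = []
    for n, sample in enumerate(x):
        # Before one pitch period of history exists, the sample passes unchanged.
        past = x[n - lag] if n >= lag else sample
        y.append((sample + gain * past) / (1 + gain))
    return y

def power(sig):
    """Mean power of a sequence of samples."""
    return sum(s * s for s in sig) / len(sig)

lag = 40            # hypothetical pitch lag in samples
n_samples = 4000

# A tone on a pitch harmonic and a tone halfway between two harmonics
# (standing in for noise that fills a spectral valley).
harmonic = [math.sin(2 * math.pi * n / lag) for n in range(n_samples)]
valley = [math.sin(2 * math.pi * 1.5 * n / lag) for n in range(n_samples)]

harmonic_gain = power(pitch_postfilter(harmonic, lag)) / power(harmonic)
valley_gain = power(pitch_postfilter(valley, lag)) / power(valley)
```

With these assumed signals, `harmonic_gain` stays at about 1 because the harmonic repeats exactly after one pitch lag, whereas `valley_gain` drops to roughly 0.12 because the valley component arrives in opposite phase and partly cancels.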
Relevant in the context of the invention are pitch or fine-structure postfilters, which operate on the principle described above. Preferably, the superposition also includes versions of the decoded signal that are time-shifted into the future, i.e. future speech signal samples. A more recent non-recursive pitch postfilter method is described in [5], in which pitch parameters from the signal coding are reused in the postfiltering of the corresponding signal sample. The method of [5] is also applied in the 3GPP AMR-WB+ audio and speech coding standard [3GPP TS 26.290, “Audio codec processing functions; Extended Adaptive Multi-Rate-Wideband (AMR-WB+) codec; Transcoding functions”] and in 3GPP2 VMR-WB [3GPP2 C.S0052-A, “Source-Controlled Variable-Rate Multimode Wideband Speech Codec (VMR-WB), Service Options 62 and 63 for Spread Spectrum Systems”]. Another pitch postfilter method is specified in [6], a patent that describes the use of past and future synthesized speech within one and the same frame.
One problem with pitch postfilters that evaluate future speech signals is that they require access to one future pitch period of the decoded audio or speech signal. This future signal can generally be made available to the postfilter by buffering the decoded audio or speech signal. In conversational applications of the audio or speech codec, however, this is undesirable, since it increases the algorithmic delay of the codec and would hence degrade the communication quality, particularly the interactivity.
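The delay cost of the look-ahead can be made concrete with a small sketch. It assumes a symmetric two-tap superposition and uses hypothetical values (a pitch lag of 160 samples at a 16000 Hz sampling rate, a buffer of 640 decoded samples); the function names are illustrative and not taken from any codec specification.

```python
def lookahead_delay_ms(pitch_lag, sample_rate_hz):
    """Extra algorithmic delay from buffering one future pitch period."""
    return 1000.0 * pitch_lag / sample_rate_hz

def symmetric_postfilter(decoded, pitch_lag, gain=0.5):
    """Filter only samples whose past AND future pitch period are both available."""
    out = []
    for n in range(pitch_lag, len(decoded) - pitch_lag):
        neighbours = 0.5 * (decoded[n - pitch_lag] + decoded[n + pitch_lag])
        out.append((decoded[n] + gain * neighbours) / (1 + gain))
    return out

pitch_lag = 160                                        # hypothetical lag in samples
decoded = [float(n % pitch_lag) for n in range(640)]   # toy signal, periodic at the lag

filtered = symmetric_postfilter(decoded, pitch_lag)

# Decoded samples at the end of the buffer that cannot be output yet,
# because their future pitch period has not been decoded.
waiting = len(decoded) - (pitch_lag + len(filtered))

delay = lookahead_delay_ms(pitch_lag, 16000)
```

In this toy setting the last 160 decoded samples stay in the buffer, which corresponds to 10 ms of additional delay at the assumed sampling rate; longer pitch periods would increase the delay proportionally.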