The present invention relates to information signal encoding, such as audio or video encoding.
The use of digital audio encoding in new communication networks, as well as in professional audio productions for bi-directional real-time communication, requires an algorithmically very inexpensive encoding as well as a very short encoding delay. A typical scenario in which the delay of digital audio encoding becomes critical arises when direct, i.e. unencoded, signals and transmitted, i.e. encoded and decoded, signals are used simultaneously. Examples are live productions using cordless microphones and simultaneous (in-ear) monitoring, or “scattered” productions where artists play simultaneously in different studios. The tolerable overall delay in these applications is less than 10 ms. If, for example, asymmetric subscriber lines are used for communication, the bit rate is an additional limiting factor.
The algorithmic delay of standard audio encoders, such as MPEG-1 Layer 3 (MP3), MPEG-2 AAC and MPEG-2/4 Low Delay, ranges from 20 ms to several hundred ms; see, for example, the article M. Lutzky, G. Schuller, M. Gayer, U. Kraemer, S. Wabnik: “A guideline to audio codec delay”, presented at the 116th AES Convention, Berlin, May 2004. Voice encoders operate at lower bit rates and with less algorithmic delay, but provide only limited audio quality.
The gap outlined above between the standard audio encoders on the one hand and the voice encoders on the other hand is closed, for example, by a type of encoding scheme described in the article B. Edler, C. Faller and G. Schuller, “Perceptual Audio Coding Using a Time-Varying Linear Pre- and Postfilter”, presented at the 109th AES Convention, Los Angeles, September 2000. According to this scheme, the signal to be encoded is filtered on the encoder side with the inverse of the masking threshold and is subsequently quantized to perform irrelevance reduction, and the quantized signal is supplied to entropy encoding to perform redundancy reduction separately from the irrelevance reduction. On the decoder side, the quantized prefiltered signal is reconstructed and filtered in a postfilter having the masking threshold as its transfer function. Such an encoding scheme, referred to below as the ULD (Ultra Low Delay) encoding scheme, results in a perceptual quality comparable to that of standard audio encoders, such as MP3, for bit rates of approximately 80 kBit/s per channel and higher. An encoder of this type is, for example, also described in WO 2005/078703 A1.
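The pre- and postfilter principle described above can be illustrated by the following minimal sketch. It is not the implementation of the cited references: the fixed filter coefficient `a`, the step size `step` and the test signal are purely hypothetical stand-ins for the time-varying, psychoacoustically controlled filters of a ULD encoder, and the entropy encoding stage is omitted.

```python
import math


def prefilter(x, a):
    """FIR prefilter A(z) = 1 + a[0]*z^-1 + a[1]*z^-2 + ... (encoder side)."""
    y = []
    for n in range(len(x)):
        acc = x[n]
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * x[n - k]
        y.append(acc)
    return y


def postfilter(y, a):
    """All-pole postfilter 1/A(z), the exact inverse of prefilter (decoder side)."""
    x = []
    for n in range(len(y)):
        acc = y[n]
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * x[n - k]
        x.append(acc)
    return x


def quantize(y, step):
    """Uniform quantizer: maps each prefiltered sample to an integer index."""
    return [round(v / step) for v in y]


def dequantize(indices, step):
    """Reconstructs the quantized prefiltered samples from the indices."""
    return [i * step for i in indices]


# Hypothetical parameters, chosen only for illustration.
a = [-0.9]      # assumed stand-in for the inverse masking threshold filter
step = 0.25     # assumed quantizing step size
signal = [math.sin(2 * math.pi * 0.05 * n) for n in range(64)]

filtered = prefilter(signal, a)                      # encoder: prefiltering
indices = quantize(filtered, step)                   # encoder: irrelevance reduction
decoded = postfilter(dequantize(indices, step), a)   # decoder: reconstruction
```

In this sketch the quantizing error is spectrally flat in the prefiltered domain and bounded by half a step size; the postfilter then shapes it with 1/A(z), which corresponds to the masking threshold in the scheme described above.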
In particular, the ULD encoders described there use psychoacoustically controlled linear filters for shaping the quantizing noise. Due to their structure, the quantizing noise is at the given threshold, even when no signal is present in a given frequency range. The noise remains inaudible as long as it corresponds to the psychoacoustic masking threshold. To obtain a bit rate that is even lower than the bit rate predetermined by this threshold, the quantizing noise has to be increased, which makes the noise audible. In particular, the noise becomes audible in ranges without signal portions. Examples are very low and very high audio frequencies. Normally, there are only very small signal portions in these ranges, while the masking threshold is high. If the masking threshold is increased uniformly across the whole frequency range, the quantizing noise is at the increased threshold, even when there is no signal, so that the quantizing noise becomes audible as a signal that sounds spurious. Subband-based encoders do not have this problem, since they simply quantize subbands having signals smaller than the threshold to zero.
The above-mentioned problem, which occurs when the allowed bit rate falls below the minimum bit rate that is determined by the masking threshold and at which no spurious quantizing noise arises, is not the only one. Further, the ULD encoders described in the above references suffer from a complex procedure for obtaining a constant data rate, particularly since an iteration loop is used, which has to be passed through in order to determine, per sampling block, an amplification factor value adjusting a dequantizing step size.
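The cost of such an iteration loop can be made concrete with the following rough sketch. It does not reproduce the procedure of the cited references: the function names, the Exp-Golomb style bit estimate, the multiplicative gain update and all numeric values are assumptions made only for illustration. The point is merely that every sampling block has to be re-quantized and re-measured until its bit count fits the budget.

```python
def estimated_bits(indices):
    """Hypothetical cost model: order-0 Exp-Golomb code length per index."""
    total = 0
    for i in indices:
        u = 2 * i - 1 if i > 0 else -2 * i      # map signed index to unsigned
        total += 2 * ((u + 1).bit_length() - 1) + 1
    return total


def fit_block(block, bit_budget, base_step=1.0, gain_factor=1.25, max_iter=64):
    """Raise the gain (and thus the step size) until the block fits the budget.

    Returns the chosen gain, the quantizing indices and the number of
    loop passes that were needed.
    """
    gain = 1.0
    for passes in range(1, max_iter + 1):
        step = base_step * gain
        indices = [round(v / step) for v in block]   # re-quantize the whole block
        if estimated_bits(indices) <= bit_budget:    # re-measure the bit demand
            return gain, indices, passes
        gain *= gain_factor                          # coarser quantization next pass
    raise ValueError("bit budget not reachable within max_iter passes")


# Hypothetical block of prefiltered samples and bit budget.
block = [10.0, -7.5, 3.2, 0.4, -12.1, 5.5, 0.0, 8.8]
gain, indices, passes = fit_block(block, bit_budget=24)
```

For this assumed block the loop needs several passes before the budget is met, and the number of passes varies from block to block, which illustrates why such a rate control is comparatively complex.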