When transmitting broadband signals, e.g. audio signals such as speech, compression or encoding techniques are used to reduce the bandwidth or bit rate of the signal.
FIG. 1 shows a known parametric encoding scheme, in particular a sinusoidal encoder, which is used in the present invention, and which is described in WO 01/69593 and European Patent Application 02080002.5 (PHNL021216). In this encoder, an input audio signal x(t) is split into several (possibly overlapping) time segments or frames, typically having a duration of 20 ms each. Each segment is decomposed into transient, sinusoidal and noise components. It is also possible to derive other components of the input audio signal such as harmonic complexes, although these are not relevant for the purposes of the present invention.
In the sinusoidal analyser 130 of FIG. 1, the signal x2 for each segment is modeled by using a number of sinusoids represented by amplitude, frequency and phase parameters. This information is usually extracted for an analysis time interval by performing a Fourier transform (FT) which provides a spectral representation of the interval including: frequencies, amplitudes for each frequency, and phases for each frequency, where each phase is “wrapped”, i.e. in the range {−π;π}. Once the sinusoidal information for a segment is estimated, a tracking algorithm is initiated. This algorithm uses a cost function to link sinusoids in different segments with each other on a segment-to-segment basis to obtain so-called tracks. The tracking algorithm thus results in sinusoidal codes CS comprising sinusoidal tracks that start at a specific time instance, evolve for a certain period of time over a plurality of time segments and then stop.
In such sinusoidal encoding, it is usual to transmit frequency information for the tracks formed in the encoder. This can be done in a simple manner and with relatively low costs, because tracks only have a slowly varying frequency. Frequency information can therefore be transmitted efficiently by time-differential encoding. In general, amplitude can also be encoded differentially over time.
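As a minimal sketch of time-differential encoding of a track's frequency, the following Python fragment quantizes frame-to-frame differences with a uniform step. The step size, the track values and the function names `diff_encode` and `diff_decode` are assumptions made for illustration only; they are not taken from the cited applications.

```python
def diff_encode(freqs, step):
    """Quantize frame-to-frame frequency differences with a uniform step."""
    codes = []
    prev = 0.0  # encoder's copy of the value the decoder will reconstruct
    for f in freqs:
        q = round((f - prev) / step)  # integer code for the difference
        codes.append(q)
        prev += q * step              # follow the decoder-side reconstruction
    return codes


def diff_decode(codes, step):
    """Rebuild the frequencies by accumulating the quantized differences."""
    out = []
    prev = 0.0
    for q in codes:
        prev += q * step
        out.append(prev)
    return out


# Hypothetical slowly varying track, in Hz (assumed values).
track = [440.0, 440.6, 441.1, 441.0, 440.2]
codes = diff_encode(track, step=0.5)
decoded = diff_decode(codes, step=0.5)
```

Only the first code (the birth of the track) is large; the subsequent codes are small integers, which is what makes the differential representation cheap for slowly varying frequencies.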
In contrast to frequency, phase changes more rapidly with time. If the frequency is (substantially) constant, the phase will change (substantially) linearly with time, and frequency changes will result in corresponding phase deviations from the linear course. As a function of the track segment index, phase will have an approximately linear behavior. Transmission of encoded phase is therefore more complicated. However, when transmitted, phase is limited to the range {−π;π}, i.e. the phase is “wrapped”, as provided by the Fourier transform. Because of this modulo 2π representation, the structural inter-frame relation of the phase is lost and the phase appears, at first sight, to be a random variable.
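The effect of wrapping can be illustrated with a short Python sketch. The track frequency, the frame update interval and the helper name `wrap` are assumptions chosen for the example.

```python
import math


def wrap(phase):
    """Map a phase in radians onto the interval from -pi to pi."""
    return (phase + math.pi) % (2 * math.pi) - math.pi


omega = 2 * math.pi * 440.0  # rad/s, assumed constant track frequency
hop = 0.010                  # s, assumed frame update interval

# Constant frequency: the phase grows linearly with the frame index k.
unwrapped = [omega * hop * k for k in range(6)]
# After wrapping, the linear structure is no longer visible.
wrapped = [wrap(p) for p in unwrapped]
```

The unwrapped values increase by the same amount per frame, whereas the wrapped values jump between positive and negative values although they differ from the unwrapped ones only by multiples of 2π.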
However, since the phase is the integral of the frequency, the phase is redundant and, in principle, does not need to be transmitted. This reduces the bit rate significantly. In the decoder, the phase is recovered by a process which is called phase continuation.
In phase continuation, only the encoded frequency is transmitted, and the phase is recovered at the decoder from the frequency data by exploiting the integral relation between phase and frequency. It is known, however, that when phase continuation is used, the phase cannot be perfectly recovered. If frequency errors occur, e.g. due to measurement errors in the frequency or due to quantization noise, the phase, which is being reconstructed by using the integral relation, will typically show an error having the character of drift. This is because frequency errors have an approximately random character. Low-frequency errors are amplified by integration, and consequently the recovered phase will tend to drift away from the actually measured phase. This leads to audible artifacts.
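The drift can be made concrete with a small Python sketch. It assumes a constant track frequency, a uniform frequency quantizer and a simple rectangular integration rule; all numerical values and the name `continue_phase` are hypothetical. A constant frequency error is used instead of a random one so that the result is easy to follow.

```python
import math

HOP = 0.010   # s, assumed frame update interval
STEP = 0.5    # Hz, assumed frequency quantization step


def continue_phase(phi0, freqs_hz, hop=HOP):
    """Rebuild per-frame phases by accumulating frequency over time."""
    phases = [phi0]
    for f in freqs_hz[:-1]:
        phases.append(phases[-1] + 2 * math.pi * f * hop)
    return phases


true_freqs = [440.2] * 101                                 # Hz, held constant
sent_freqs = [STEP * round(f / STEP) for f in true_freqs]  # 440.0 after quantizing

measured = continue_phase(0.3, true_freqs)   # stands in for the measured phase
recovered = continue_phase(0.3, sent_freqs)  # what the decoder reconstructs
drift = [abs(m - r) for m, r in zip(measured, recovered)]
```

With these assumed values, a frequency error of only 0.2 Hz accumulates to a phase error of 2π·0.2 ≈ 1.26 rad after 100 frames, i.e. after one second.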
This is illustrated in FIG. 2a where Ω and ψ are the real frequency and real phase, respectively, for a track. In both the encoder and decoder, frequency and phase have an integral relationship as represented by the letter “I”. The quantization process in the encoder is modeled as added noise n. In the decoder, the recovered phase ψ̂ thus includes two components: the real phase ψ and a noise component ε2, where both the spectrum of the recovered phase and the power spectral density function of the noise ε2 have a pronounced low-frequency character.
Thus, it can be seen that in phase continuation, the recovered phase is a low-frequency signal itself because the recovered phase is the integral of a low-frequency signal. However, the noise introduced in the reconstruction process is also dominant in this low-frequency range. It is therefore difficult to separate these sources with a view to filtering the noise n introduced during encoding.
Furthermore, in phase continuation, only the phase of the first sinusoid of each track is transmitted in order to save bit rate. Each subsequent phase is calculated from the initial phase and the frequencies of the track. Since the frequencies are quantized and not always estimated very accurately, the continued phase will deviate from the measured phase. Experiments show that phase continuation degrades the quality of an audio signal.
European Patent Application 02080002.5 (PHNL021216) addresses these problems by proposing a joint frequency/phase quantizer, where the measured phases of a sinusoidal track, which have values between −π and π, are unwrapped by using the measured frequencies and linking information, resulting in monotonic increasing unwrapped phases along a track. In the encoder, the unwrapped phases are quantized by using an Adaptive Differential Pulse Code Modulation (ADPCM) quantizer and transmitted to the decoder. The decoder derives the frequencies and the phases of a sinusoidal track from the unwrapped phase trajectory.
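One possible way of unwrapping the measured phases along a track is sketched below in Python. This is an illustrative assumption rather than the procedure of the cited application: the next phase is predicted from the previous unwrapped phase and the average of the two measured frequencies, and the multiple of 2π that brings the wrapped measurement closest to this prediction is added. The names `unwrap_track` and `wrap`, as well as all numerical values, are hypothetical.

```python
import math


def wrap(phase):
    """Map a phase in radians onto the interval from -pi to pi."""
    return (phase + math.pi) % (2 * math.pi) - math.pi


def unwrap_track(wrapped, freqs_rad, hop):
    """Unwrap per-frame phases of one track using its measured frequencies."""
    out = [wrapped[0]]
    for k in range(1, len(wrapped)):
        # Predict the phase advance from the mean frequency of frames k-1 and k.
        predicted = out[-1] + 0.5 * (freqs_rad[k - 1] + freqs_rad[k]) * hop
        # Choose the number of full turns that best matches the prediction.
        m = round((predicted - wrapped[k]) / (2 * math.pi))
        out.append(wrapped[k] + 2 * math.pi * m)
    return out


hop = 0.010                                                     # s, assumed
true_phase = [0.3 + 2 * math.pi * 440.0 * hop * k for k in range(10)]
measured_phase = [wrap(p) for p in true_phase]                  # wrapped input
measured_freq = [2 * math.pi * 441.0] * 10                      # rad/s, 1 Hz off
unwrapped = unwrap_track(measured_phase, measured_freq, hop)
```

The frequency only has to be accurate enough to keep the prediction error below π per frame; the exact phase value still comes from the wrapped measurement itself.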
As an example, the ADPCM quantizer can be configured as described below. For the first continuation of a track, the unwrapped phase is quantized in accordance with Table 1.
TABLE 1
Representation table R used for first continuation.

Representation level r    Representation table R    Level type
0                         −3.0                      Outer level
1                         −0.75                     Inner level
2                         0.75                      Inner level
3                         3.0                       Outer level
The quantization boundaries are defined in accordance with this table by: {−∞; 2·T (r=1), 0, 2·T (r=2), ∞}. For each consecutive continuation, the tables are scaled. If the representation level is in the outer level, the tables are multiplied by 2^(1/2), making the quantization accuracy coarser. Otherwise, the representation levels are in the inner level and the tables are scaled by 2^(−1/4), making the quantization accuracy finer. Furthermore, there is an upper and lower boundary to the inner level, namely 3π/4 and π/64.
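The adaptation of the tables can be sketched in Python as follows. The sketch covers only the scaling rule and the bounds on the inner level; the selection of the representation level from the quantization boundaries is left out. Expressing the adaptation as a single scale factor applied to Table 1, and enforcing the bounds by clamping that factor, are assumptions of this sketch, as are the names `adapt_scale` and `scaled_table`.

```python
import math

BASE_TABLE = [-3.0, -0.75, 0.75, 3.0]  # representation table R of Table 1
INNER_MIN = math.pi / 64               # lower bound on the inner level
INNER_MAX = 3 * math.pi / 4            # upper bound on the inner level


def adapt_scale(scale, r):
    """Return the table scale factor after representation level r was used."""
    # Outer levels (r = 0 or 3) coarsen the table, inner levels refine it.
    factor = 2 ** 0.5 if r in (0, 3) else 2 ** -0.25
    inner = BASE_TABLE[2] * scale * factor
    # Assumption: the bounds are enforced by clamping the inner level.
    inner = min(max(inner, INNER_MIN), INNER_MAX)
    return inner / BASE_TABLE[2]


def scaled_table(scale):
    """Representation table for the current scale factor."""
    return [v * scale for v in BASE_TABLE]


fine = 1.0
for _ in range(50):            # a long run of inner levels
    fine = adapt_scale(fine, 1)

coarse = 1.0
for _ in range(50):            # a long run of outer levels
    coarse = adapt_scale(coarse, 3)
```

After a long run of inner levels the inner level settles at π/64, and after a long run of outer levels it settles at 3π/4, so the quantization accuracy stays within a fixed range.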
The quantization of the unwrapped phase trajectory is a continuous process in the above methods, where the quantization accuracy is adapted along the track. Therefore, in order to decode a track, the decoding process has to start from the birth or starting point of a track, i.e. the decoder can only de-quantize a complete track and it is not possible to decode a part of the track. Therefore, special methods enabling random-access have to be added to the encoder and decoder. Random-access may e.g. be used to ‘skip’ or ‘fast forward’ in an audio signal.
A first straightforward way of performing random access is to define random-access frames (or refresh points) in the encoder/quantizer and re-start the ADPCM quantizer in the decoder at these random-access frames. For the random-access frame, the initial tables are used, so that refreshes are as expensive in bits as normal births. A drawback of this approach is that the quantization tables, and thus the quantization accuracy, have to be adapted again from the random-access frame onwards. Initially, therefore, the quantization accuracy might be too coarse, resulting in a discontinuity in the track, or too fine, resulting in large quantization errors. This leads to a degradation of the audio quality compared to signals decoded without the use of random-access frames.
A second straightforward way is to transmit all states of the ADPCM quantizer, that is, the quantization accuracy and the memories in the predictor, as mentioned in European Patent Application 02080002.5 (PHNL021216). The quantizer will then have similar output with or without random-access frames. In this way, the sound quality will hardly suffer. However, the additional bit rate needed to transmit all this information will be considerable, especially since the contents of the memories of the predictor have to be quantized according to the quantization accuracy of the ADPCM quantizer.
The present invention addresses these problems.