Frame-based encoders, such as speech encoders, use audio signal processing techniques to model a speech signal and generic data compression algorithms to represent the resulting modelled speech signal in a compact bitstream, which is then transmitted in sequential frames to a decoder. Each of the sequential frames thus includes the coded speech signal together with parameters associated with the speech signal; the decoder decodes these parameters and uses them to enhance the rendering of the decoded speech signal.
In the case of stereo recording, such as in audio and video conferencing as well as broadcasting applications, a stereo signal may be recorded using two microphones. When the two microphones are spaced apart, the signal from a speaker located closer to one microphone than the other reaches the farther microphone with a delay relative to the nearer microphone. In order to take account of this delay of the speech signal between the microphones, a parameter known as the stereo delay parameter or inter-channel time difference (ITD) parameter may be determined from the recorded stereo signal, then encoded and transmitted in the frames together with the encoded speech signal and other parameters that describe aspects of the stereo speech signal. These transmitted parameters are used in the decoder to recreate the stereo signal. The ITD parameter may significantly improve the quality of the recreated stereo perspective, since ITD is known to be the dominant perceptual influence on stereo location for frequencies below approximately 1 kHz.
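By way of illustration only, one common way to determine an inter-channel delay is to search for the lag that maximises the cross-correlation between the two channels. The sketch below assumes this approach; the text above does not state how the ITD parameter is actually determined. The function names, the lag search range, and the 16 kHz sampling rate are hypothetical choices made for the example.

```python
def estimate_itd_samples(left, right, max_lag):
    """Return the lag, in samples, that maximises the cross-correlation.

    A positive result means `right` is a delayed copy of `left`.
    Illustrative brute-force search; not taken from any particular codec.
    """
    n = min(len(left), len(right))
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        # Strict comparison keeps the first lag on ties.
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def itd_seconds(lag_samples, sample_rate_hz):
    """Convert a lag in samples to a time difference in seconds."""
    return lag_samples / sample_rate_hz


# Usage: a short pulse that reaches the right channel 5 samples late.
left = [0.0] * 64
left[10], left[11], left[12] = 1.0, 0.5, -0.3
right = [0.0] * 5 + left[:-5]

lag = estimate_itd_samples(left, right, max_lag=10)
print(lag)                      # 5
print(itd_seconds(lag, 16000))  # delay in seconds at an assumed 16 kHz rate
```

The brute-force search is quadratic in the frame length and is shown only for clarity; a practical estimator would typically also smooth the estimate across frames.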
Typically, speech encoders employ a frame duration of 20 ms, that is, a frame rate of 50 frames per second. Each bit within a speech frame therefore consumes 50 bits/s of channel capacity, and the synchronous frame structure lends itself to updating parameters at multiples of 50 Hz. Such update rates are commensurate with the rates of change experienced within the human vocal tract. For example, it is well known that the human vocal tract shape may be adequately represented by parameters (such as Linear Predictive Coding (LPC) parameters) at an update rate of approximately 50 Hz, whereas the speech excitation energy and shape are best modelled at approximately 200 Hz (i.e., the excitation parameters are updated at 200 Hz).
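The arithmetic behind these figures can be sketched as follows. The 20 ms frame and the 50 Hz and 200 Hz update rates come from the text above; the helper names and the 10-bit parameter in the usage lines are hypothetical and serve only as an example.

```python
FRAME_DURATION_MS = 20  # frame duration stated in the text


def frame_rate_hz(frame_duration_ms=FRAME_DURATION_MS):
    """Number of frames transmitted per second."""
    return 1000 / frame_duration_ms


def bitrate_bps(bits_per_frame, frame_duration_ms=FRAME_DURATION_MS):
    """Channel rate consumed by a field of `bits_per_frame` bits in every frame."""
    return bits_per_frame * 1000 / frame_duration_ms


def updates_per_frame(update_rate_hz, frame_duration_ms=FRAME_DURATION_MS):
    """How many times a parameter is updated within one frame."""
    return update_rate_hz * frame_duration_ms / 1000


print(frame_rate_hz())         # 50.0 frames per second
print(bitrate_bps(1))          # 50.0 bits/s for a single bit per frame
print(bitrate_bps(10))         # 500.0 bits/s for a hypothetical 10-bit parameter
print(updates_per_frame(50))   # 1.0 update per frame (e.g. vocal tract shape)
print(updates_per_frame(200))  # 4.0 updates per frame (e.g. excitation)
```

A parameter sent in every frame thus costs 50 bits/s for each bit of precision, regardless of how often its value actually changes.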
However, as speech encoder functionality is augmented to provide music and stereo coding, such as in the speech encoder known as the Embedded Variable Bit-Rate (EV-VBR) codec which is currently being standardised by the International Telecommunication Union (ITU), additional parameters need to be coded which do not relate to the human vocal tract. Some of these parameters vary at a rate slower than the frame rate, so sending the same parameter every frame, irrespective of whether the parameter has changed, wastes channel bandwidth resources. Some of these parameters may also require high precision, in terms of number of bits, while evolving slowly over time. One classical way to achieve the required high precision is over-sampling combined with a reduction in the number of quantization levels, but this method has several drawbacks due to the filtering it requires. Error propagation can occur, and the practical realisation of the filter can cause jitter in the output value. The filter can also delay the effect of instantaneous parameter changes and make it difficult to maintain encoder and decoder synchronization in analysis-by-synthesis encoder structures.
Thus, it would be advantageous to provide an improved method for encoding and transmitting parameters in a frame-based encoding scheme.