In a conventional speech codec, speech signals are often classified to improve coding efficiency. At the decoder side, different types of signal processing tools are used depending on the transmitted classification of the speech signal.
One such classification distinguishes between normal speech signals and transient speech signals. Transient signals are short-duration signals characterized by a fast change in signal power and amplitude. They are distinguished from “normal” or non-transient signals, e.g. signals with a longer duration and/or only minor changes in signal power and amplitude. This kind of classification is not limited to speech signals but is applicable to audio signals in general.
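By way of illustration only, the distinction between transient and normal signals can be sketched as a test on the frame-to-frame change in signal energy. The function names, the frame length of 64 samples and the energy-ratio threshold of 8 are assumptions chosen for this sketch and are not taken from any particular codec. The sketch detects only sharp rises in energy (onsets).

```python
def frame_energies(samples, frame_len):
    """Return the energy of each consecutive, non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def is_transient(samples, frame_len=64, ratio_threshold=8.0, floor=1e-9):
    """Classify a block as transient if the energy rises sharply
    between two adjacent frames (hypothetical threshold values)."""
    energies = frame_energies(samples, frame_len)
    for prev, cur in zip(energies, energies[1:]):
        # floor avoids division by zero for silent frames
        if cur / (prev + floor) > ratio_threshold:
            return True
    return False


steady = [0.1] * 256                 # constant power: non-transient
attack = [0.0] * 128 + [1.0] * 128   # sudden onset: transient
print(is_transient(steady), is_transient(attack))
```

A practical classifier would typically add smoothing and hysteresis; the sketch shows only the underlying idea of a fast change in signal power.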
For transient signals, a common method is to extract the time envelope of the input signal in the encoder, transmit it as side information to the decoder and apply it in the decoder as a post-processing step.
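The envelope-based post-processing described above can be sketched as follows. This is a minimal illustration under stated assumptions: the time envelope is represented as one RMS value per sub-frame, it is transmitted unquantized, and the decoder rescales each sub-frame of the decoded signal to match it. The function names and the sub-frame length are hypothetical.

```python
import math


def time_envelope(samples, sub_len):
    """Encoder side: RMS of each non-overlapping sub-frame."""
    return [
        math.sqrt(sum(x * x for x in samples[i:i + sub_len]) / sub_len)
        for i in range(0, len(samples) - sub_len + 1, sub_len)
    ]


def apply_envelope(decoded, envelope, sub_len, floor=1e-9):
    """Decoder side: scale each sub-frame so its RMS matches the
    transmitted envelope value. Assumes `decoded` holds at least
    len(envelope) * sub_len samples."""
    out = list(decoded)
    for k, target in enumerate(envelope):
        start = k * sub_len
        segment = decoded[start:start + sub_len]
        rms = math.sqrt(sum(x * x for x in segment) / sub_len)
        gain = target / max(rms, floor)  # floor avoids division by zero
        for n in range(start, start + sub_len):
            out[n] = decoded[n] * gain
    return out


original = [0.0] * 4 + [1.0] * 4      # silence followed by an onset
decoded = [0.2] * 4 + [0.8] * 4       # energy smeared ahead of the onset
envelope = time_envelope(original, 4)  # side information
shaped = apply_envelope(decoded, envelope, 4)
```

In this toy example the energy that the codec spread ahead of the onset is scaled back to zero, which is the effect the post-processing aims for with pre-echo.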
For stereo signals, this kind of post-processing is often necessary, but there are conventionally not enough bits to encode the time envelope of both channels.
In the prior art (E. Schuijers, W. Oomen, B. den Brinker, and J. Breebaart, “Advances in parametric coding for high-quality audio,” in Preprint 114th Conv. Aud. Eng. Soc., March 2003), low-bit-rate stereo coding is based on the extraction and quantization of a parametric representation of the stereo image. The parameters are then transmitted as side information together with a mono downmix signal encoded by a core coder. At the decoder, the stereo signal can be reconstructed based on the mono downmix signal and the side information, i.e. the stereo parameters containing the spatial (left and right) information of the stereo signal.
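The principle of a mono downmix plus stereo parameters can be illustrated with a deliberately simplified sketch. It assumes a single inter-channel level difference per frame as the only stereo parameter, omits the core coder and quantization, and uses hypothetical function names. It is not the scheme of the cited reference.

```python
import math


def encode_parametric(left, right, floor=1e-12):
    """Encoder: mono downmix plus one level-difference parameter (dB)."""
    mono = [(l + r) / 2 for l, r in zip(left, right)]
    e_left = sum(l * l for l in left) + floor
    e_right = sum(r * r for r in right) + floor
    level_diff_db = 10 * math.log10(e_left / e_right)
    return mono, level_diff_db


def decode_parametric(mono, level_diff_db):
    """Decoder: redistribute the mono signal to both channels so that
    their amplitude ratio matches the transmitted level difference
    and their average equals the downmix."""
    ratio = 10 ** (level_diff_db / 20)   # amplitude ratio left/right
    gain_right = 2 / (1 + ratio)
    gain_left = ratio * gain_right
    return ([gain_left * m for m in mono],
            [gain_right * m for m in mono])


left = [1.0, -1.0, 0.5]
right = [0.5, -0.5, 0.25]                # same waveform, half the level
mono, level_diff_db = encode_parametric(left, right)
left_rec, right_rec = decode_parametric(mono, level_diff_db)
```

Reconstruction is exact here only because the two channels are in-phase scaled copies of each other. For general stereo content a level difference alone recovers the level balance but not the individual waveforms or their temporal fine structure.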
For a stereo codec, if the mono downmix signal is classified as transient, there may be pre-echo artefacts in the reconstructed stereo signal. Post-processing may be applied to improve the quality of such signals, whether both channels are transient or only one channel is transient. For a parametric stereo codec, however, there are conventionally not enough bits to encode the time envelope of both channels.
In other prior art (WO 02/093560 A1; “Improved Transient Pre-Noise Performance of Low Bit Rate Audio Coders Using Time Scaling Synthesis,” AES 117, October 2004), the input mono signal is classified into transient and normal categories in the encoder. Then, at the decoder side, based on the transmitted classification information, a time scaling synthesis algorithm is used to improve the quality. All of these algorithms are applied to the mono downmix signal.
The limitation of the bandwidth available for transmitting signals is encountered not only in the transmission of stereo speech or audio signals but is a general problem for multi-channel audio signal transmission, stereo audio coding being a specific case of multi-channel audio coding.