In a communication system, a communication network is provided that can link two communication terminals together so that the terminals can send information to each other in a call or other communication event. The information may include speech, text, images or video.
Modern communication systems are based on the transmission of digital signals. Analogue information such as speech is input into an analogue to digital converter at the transmitter of one terminal and converted into a digital signal. The digital signal is then encoded and placed in data packets for transmission over a channel to the receiver of a destination terminal.
The encoding of speech signals is performed by a speech coder. The speech coder compresses the speech for transmission as digital information, and a corresponding decoder at the destination terminal decodes the encoded information to produce a decoded speech signal. The encoder and decoder are designed so that, as perceived by the user of the destination terminal, the decoded speech signal closely resembles the original speech.
Many different types of speech coding are known and optimised for different scenarios and applications. For example, some speech coding techniques are implemented particularly for encoding speech for transmission over low bit-rate channels. Low bit-rate speech coders are useful in many applications, such as voice over internet protocol (“VoIP”) systems and mobile/wireless telecommunications.
An example of a low-rate speech coder is a model-based speech coder that produces a sparse signal representation of the original speech. One particular example of such a model-based speech coder is a speech coder that represents the speech signal as a set of sinusoids. A low-rate sinusoidal speech coder can, for example, encode the linear prediction residual of speech frames classified as voiced using only sinusoids. Many other types of low-rate sparse-signal representation speech coders are also known. These types of low-rate coder form a very compact signal representation. However, the sparse representation in the encoded signal does not fully capture the structure of the speech.
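The idea of a sparse sinusoidal representation can be illustrated with a minimal sketch. It is not the coder described here: as a simplifying assumption it operates directly on a short synthetic frame rather than on a linear prediction residual, it picks sinusoids by taking the strongest bins of a naive DFT, and it performs no quantisation. All function names (`dft`, `sparse_sinusoidal_model`, `synthesize`, `rms_error`) and the test frame are hypothetical.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of a real-valued frame."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]


def sparse_sinusoidal_model(frame, num_sinusoids):
    """Describe a frame by its strongest sinusoids as (bin, amplitude, phase)."""
    n = len(frame)
    spectrum = dft(frame)
    # Consider only bins strictly between DC and Nyquist (n assumed even).
    candidates = range(1, n // 2)
    strongest = sorted(candidates, key=lambda k: abs(spectrum[k]), reverse=True)
    kept = sorted(strongest[:num_sinusoids])
    # For a real cosine A*cos(2*pi*k*t/n + p), the DFT bin k equals (A*n/2)*e^{jp}.
    return [(k, 2.0 * abs(spectrum[k]) / n, cmath.phase(spectrum[k])) for k in kept]


def synthesize(params, n):
    """Rebuild a frame of length n from (bin, amplitude, phase) triples."""
    return [
        sum(a * math.cos(2 * math.pi * k * t / n + p) for k, a, p in params)
        for t in range(n)
    ]


def rms_error(a, b):
    """Root-mean-square difference between two equal-length frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Hypothetical test frame: two strong components and one weak component.
N = 64
frame = [
    1.0 * math.cos(2 * math.pi * 3 * t / N + 0.3)
    + 0.5 * math.cos(2 * math.pi * 7 * t / N - 1.0)
    + 0.1 * math.cos(2 * math.pi * 20 * t / N)
    for t in range(N)
]

model_2 = sparse_sinusoidal_model(frame, 2)  # tighter "bit-budget"
model_3 = sparse_sinusoidal_model(frame, 3)  # larger "bit-budget"
error_2 = rms_error(frame, synthesize(model_2, N))
error_3 = rms_error(frame, synthesize(model_3, N))
```

With only two sinusoids the weak third component is missing from the reconstruction, so `error_2` stays well above `error_3`. This is a toy analogue of the point above: a very compact description leaves part of the signal structure unrepresented.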
A problem with low-rate model-based speech coders, such as the sinusoidal coder, is that the sparse representation tends to result in metallic-sounding artifacts when the signal is transmitted at a low bit-rate. The metallic artifacts can arise from the inability of the underlying sparse model to capture the structure of some of the speech sounds given a limited bit-budget.
If the bit-budget (ultimately related to the bandwidth capabilities of the channel) increases, then more information describing the missing parts of the original speech structure can be added to the transmitted information. This additional description alleviates and eventually removes the artifacts, and thus improves the overall quality and naturalness of the decoded speech signal as perceived by the user of the destination terminal. However, this is only possible if the channel is capable of supporting a higher bit-rate.
In addition, the decoding system can compress or expand/stretch a speech signal in time, and/or insert or skip whole speech frames in order to compensate for jitter. Jitter is a variation in the packet latency in the received signal. The decoding system can also insert one or more concealment frames into the speech signal, in order to replace one or more frames that have been lost or delayed in the transmission. The stretching of the speech signal and insertion of the concealment frames into the speech signal can, in particular, give rise to metallic artifacts. These problems are, in general, not mitigated by the use of a higher bit rate.
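The relationship between packet latency, jitter, and the need for concealment frames can be sketched as follows. This is a minimal illustration under stated assumptions, not the decoding system's actual logic: jitter is measured here as the peak-to-peak spread of latency (one of several possible measures), a frame is treated as needing concealment if it is lost or arrives after a fixed playout delay, and all names and timing values are hypothetical.

```python
def packet_latencies(send_ms, arrival_ms):
    """Latency of each received packet; lost packets (arrival None) are skipped."""
    return [a - s for s, a in zip(send_ms, arrival_ms) if a is not None]


def peak_to_peak_jitter(latencies):
    """Spread between the largest and smallest observed latency."""
    return max(latencies) - min(latencies)


def frames_to_conceal(send_ms, arrival_ms, playout_delay_ms):
    """Indices of frames that are lost or arrive after their playout deadline."""
    return [
        i
        for i, (s, a) in enumerate(zip(send_ms, arrival_ms))
        if a is None or a - s > playout_delay_ms
    ]


# Hypothetical trace: one frame sent every 20 ms; frame 3 is lost in transit.
send_ms = [0, 20, 40, 60, 80]
arrival_ms = [50, 72, 125, None, 131]

latencies = packet_latencies(send_ms, arrival_ms)
jitter_ms = peak_to_peak_jitter(latencies)
concealed = frames_to_conceal(send_ms, arrival_ms, playout_delay_ms=60)
```

In this trace frame 2 arrives too late for a 60 ms playout delay and frame 3 never arrives, so both would have to be replaced by concealment frames regardless of the bit-rate at which the speech was encoded.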
There is therefore a need for a technique to address the aforementioned problems with low bit-rate coders, and with coders in general when loss, delay, and/or jitter may occur in the transmission, in order to improve the perceived quality of the signal at the destination.