Communication of data such as speech, audio or video data between terminals is typically performed via encoded data streams sent over a communication network. To communicate an encoded data stream from a sending terminal to a receiving terminal, the data stream is first encoded according to a certain encoding scheme by an encoder of the sending terminal. The encoding is usually performed in order to compress the data and to adapt it to further requirements for communication. The encoded data stream is sent via the communication network to the receiving terminal, where it is decoded by a decoder for further processing by the receiving terminal. This end-to-end communication relies on the encoder of the sending terminal and the decoder of the receiving terminal being compatible.
A transcoder is a device that converts a first data stream encoded according to a first encoding scheme into a second data stream that corresponds to said first data stream but is encoded according to a second encoding scheme. Thus, in the case of incompatible encoder/decoder pairs in the sending and receiving terminals, one or more transcoders can be installed in the communication network, so that the encoded data stream reaches the receiving terminal in a form that the receiving terminal is capable of decoding.
Transcoders are required at different places in a communication network. In some communication networks, transmission modes with differing transmission bit rates are available in order to overcome, e.g., capability problems or link quality problems. Such differing bit rates can be used over an entire end-to-end communication or only over certain parts of it. Terminals are sometimes not prepared for all alternative bit rates, which means that one or more transcoders in the communication network must be employed to convert the encoded data stream to a suitable encoding scheme.
Transcoding typically entails decoding of a speech stream encoded according to a first encoding scheme, followed by encoding of the decoded speech stream according to a second encoding scheme. Such tandeming typically uses standardized decoders and encoders. Thus, full transcoding typically requires a complete decoder and a complete encoder. However, existing tandeming solutions, in which all encoding parameters are newly computed, consume a lot of computational power, since full transcoding is quite complex in terms of cycles and memory, such as program ROM, static RAM, and dynamic RAM. Furthermore, the re-encoding degrades the speech representation, which reduces the final speech quality. Moreover, delay is introduced due to processing time and possibly a look-ahead speech sample buffer in the second codec. Such delay is detrimental in particular for real-time or quasi-real-time communications such as speech, video, audio play-outs or combinations thereof.
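The tandeming described above can be illustrated with a minimal sketch. The two "codecs" below are hypothetical toy quantizers with step sizes 4 and 6, chosen only for illustration; they are not ACELP coders and are not taken from any standard. The sketch merely shows the decode-then-re-encode structure and how a second quantization can add error compared with encoding the original samples directly.

```python
from typing import Callable, List

Samples = List[int]                      # decoded samples, toy range 0..255
Encoder = Callable[[Samples], bytes]
Decoder = Callable[[bytes], Samples]

# Hypothetical toy schemes: uniform quantizers with different step sizes.
def encode_a(samples: Samples) -> bytes:
    return bytes(s // 4 for s in samples)

def decode_a(frame: bytes) -> Samples:
    return [b * 4 + 2 for b in frame]    # reconstruct at the cell midpoint

def encode_b(samples: Samples) -> bytes:
    return bytes(s // 6 for s in samples)

def decode_b(frame: bytes) -> Samples:
    return [b * 6 + 3 for b in frame]

def tandem_transcode(frame: bytes, decode_first: Decoder,
                     encode_second: Encoder) -> bytes:
    """Full transcoding: complete decode, then complete re-encode."""
    decoded = decode_first(frame)        # back to the sample domain
    return encode_second(decoded)        # every parameter is recomputed

def max_error(reference: Samples, other: Samples) -> int:
    return max(abs(r - o) for r, o in zip(reference, other))

original = [10, 100, 200]
frame_a = encode_a(original)
frame_b = tandem_transcode(frame_a, decode_a, encode_b)

tandem_error = max_error(original, decode_b(frame_b))
direct_error = max_error(original, decode_b(encode_b(original)))
print("tandem:", tandem_error, "direct:", direct_error)
```

For this particular input the tandem path ends up further from the original samples than a direct encoding with the second scheme, which mirrors the quality loss attributed to re-encoding above.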
Efforts have been made to transcode the encoding parameters that represent the encoded data stream according to pre-defined algorithms, so as to directly form a completely new set of encoding parameters that represent the data stream according to the second encoding scheme, without passing through the synthesized speech state. However, such tasks are complex and many kinds of artifacts are created.
In 3G (UTRAN) networks, the Adaptive Multi-Rate (AMR) encoding scheme will be the dominant voice codec for a long time. The “AMR-12.2” codec (according to 3GPP/TS-26.071) is an Algebraic Code Excited Linear Prediction (ACELP) coder operating at a bit rate of 12.2 kbit/s. The frame size is 20 ms with 4 subframes of 5 ms. A look-ahead of 5 ms is used. Discontinuous transmission (DTX) functionality is employed for the AMR-12.2 voice codec.
For 2.xG (GERAN) networks, the GSM-EFR voice codec will instead be dominant in the network nodes for a considerable period of time, even though handsets capable of AMR encoding will very likely be introduced. The GSM-EFR codec (according to 3GPP/TS-06.51) is also based on a 12.2 kbit/s ACELP coder having 20 ms speech frames divided into 4 subframes. However, no look-ahead is used. Discontinuous transmission (DTX) functionality is also employed for the GSM-EFR voice codec, although in a different manner than for AMR-12.2.
For communication between the two types of networks, either decoding into the PCM domain (64 kbit/s) or a direct transcoding in the parameter domain (12.2 kbit/s) to and from AMR-12.2 and GSM-EFR, respectively, will thus be necessary.
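The bit rates and frame sizes quoted above can be related by simple arithmetic. The following sketch assumes 8 kHz sampling, which is not stated in the text but is consistent with a 64 kbit/s PCM stream at 8 bits per sample; all constant and function names are illustrative.

```python
# Values taken from the text above.
FRAME_MS = 20                 # speech frame length in milliseconds
SUBFRAMES_PER_FRAME = 4
CODEC_BIT_RATE = 12_200       # bit/s, AMR-12.2 and GSM-EFR
PCM_BIT_RATE = 64_000         # bit/s, PCM domain

# Assumption, not stated in the text: narrowband speech sampled at 8 kHz.
SAMPLE_RATE_HZ = 8_000

def bits_per_frame(bit_rate: int, frame_ms: int = FRAME_MS) -> int:
    """Number of bits carried by one frame at the given bit rate."""
    return bit_rate * frame_ms // 1000

def samples_per_frame(sample_rate: int = SAMPLE_RATE_HZ,
                      frame_ms: int = FRAME_MS) -> int:
    """Number of speech samples covered by one frame."""
    return sample_rate * frame_ms // 1000

codec_bits = bits_per_frame(CODEC_BIT_RATE)     # parameter-domain frame
pcm_bits = bits_per_frame(PCM_BIT_RATE)         # same 20 ms in PCM
frame_samples = samples_per_frame()
subframe_samples = frame_samples // SUBFRAMES_PER_FRAME

print(codec_bits, pcm_bits, frame_samples, subframe_samples)
```

Under these assumptions a 20 ms frame occupies 244 bits in the parameter domain and 1280 bits in the PCM domain, so staying in the parameter domain across the network border uses roughly one fifth of the PCM bit rate.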
A full transcoding (tandeming) in the GSM-EFR-to-AMR-12.2 direction will add at least 5 ms of additional delay due to the look-ahead buffer used for Voice Activity Detection (VAD) in the AMR algorithm. The actual processing delay for full transcoding will also increase the total delay somewhat.
Since the AMR-12.2 and GSM-EFR codecs share the same core compression scheme (a 12.2 kbit/s ACELP coder having 20 ms speech frames divided into 4 subframes), it may be envisioned that a low-complexity direct conversion scheme could be designed. This would enable full 12.2 kbit/s communication across the network border as well, compared with the 64 kbit/s communication in the case of full transcoding. One possible approach would be to let the decoder of one coding scheme directly use the speech frames created by the other coding scheme. However, tests of this approach have revealed severe speech artifacts, in particular the appearance of distracting noise bursts.
In the published U.S. patent application 2003/0177004, a method for transcoding a CELP based compressed voice bitstream from a source codec to a destination codec is disclosed. One or more source CELP parameters from the input CELP bitstream are unpacked and interpolated to a destination codec format to overcome differences in frame size, sampling rate etc.
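As a rough illustration of what interpolating parameters between differing frame structures can mean, the sketch below linearly resamples a per-subframe parameter track onto a different number of subframes. This is a generic technique under stated assumptions, not the method of the cited application; the function name and the example values are hypothetical.

```python
from typing import List

def resample_track(values: List[float], n_out: int) -> List[float]:
    """Linearly interpolate a parameter track onto n_out evenly spaced points.

    Illustrative only: real codec parameters (e.g. spectral or pitch
    parameters) usually need domain-specific handling beyond this.
    """
    if not values:
        raise ValueError("values must not be empty")
    if n_out <= 0:
        raise ValueError("n_out must be positive")
    n_in = len(values)
    if n_in == 1 or n_out == 1:
        return [values[0]] * n_out
    out = []
    for i in range(n_out):
        pos = i * (n_in - 1) / (n_out - 1)   # position on the source grid
        lo = int(pos)                        # left neighbour index
        hi = min(lo + 1, n_in - 1)           # right neighbour index
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out

# Hypothetical track of 3 source values mapped onto 5 destination points.
source_track = [0.0, 2.0, 4.0]
destination_track = resample_track(source_track, 5)
print(destination_track)
```

The first and last source values are preserved and the intermediate points are filled in linearly; a practical transcoder would also have to handle quantization and the differing semantics of the parameters.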
In U.S. Pat. No. 6,260,009, a method and apparatus for CELP-based to CELP-based vocoder packet translation is disclosed. The apparatus includes a formant parameter translator and an excitation parameter translator. Formant filter coefficients and output codebook and pitch parameters are provided.
None of these prior art systems discusses any remaining interoperability problems for codec systems having similar core compression schemes.