Speech coding based on code-excited linear prediction (CELP) is one of the core technologies used in current voice over Internet protocol (VoIP) applications and mobile communication systems. In order to achieve compatibility and interoperability between communication apparatuses from different suppliers, transcoding needs to be performed between different CELP speech coding standards.
At present, methods for solving this problem include decode then encode (DTE). In the DTE method, a decoder for the sending end's format decodes the transmitted bit stream to reconstruct the speech, an encoder for the receiving end's format encodes the reconstructed speech into a bit stream that the decoder at the receiving end can decode, and that bit stream is then transferred to the receiving end. During the process of implementing the disclosure, the inventor found that the DTE method in the prior art is disadvantageous in that it degrades the quality of the synthesized speech and increases both the computational complexity and the overall delay.
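The DTE flow described above can be sketched as follows. This is a minimal illustration, not an implementation of any real codec: `dte_transcode`, `toy_decode_source`, and `toy_encode_target` are hypothetical names, and the toy codecs use simple scalar quantization in place of CELP so that the second, lossy encoding step is visible.

```python
from typing import Callable, List

Pcm = List[int]  # reconstructed speech samples (assumed representation)


def dte_transcode(
    source_frames: List[bytes],
    decode_source: Callable[[bytes], Pcm],
    encode_target: Callable[[Pcm], bytes],
) -> List[bytes]:
    """Tandem transcoding: fully decode each source frame, then re-encode it."""
    target_frames = []
    for frame in source_frames:
        pcm = decode_source(frame)                # restore reconstructed speech
        target_frames.append(encode_target(pcm))  # second lossy encoding pass
    return target_frames


# Toy stand-ins for the two codecs (assumptions, for illustration only).
def toy_decode_source(frame: bytes) -> Pcm:
    # Source format: one byte per sample, quantization step of 4.
    return [b * 4 for b in frame]


def toy_encode_target(pcm: Pcm) -> bytes:
    # Target format: one byte per sample, coarser quantization step of 8.
    return bytes(sample // 8 for sample in pcm)


transcoded = dte_transcode([bytes([1, 2, 3])], toy_decode_source, toy_encode_target)
```

Every frame passes through a complete decode and a complete encode, which is where the extra computation and delay come from, and the re-quantization in the target encoder is a second source of distortion on top of the first.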
For discontinuous transmission (DTX) transcoding algorithms, during the process of implementing the present disclosure, the inventor found that the DTX transcoding algorithms in the prior art are disadvantageous in that a synthesized speech needs to be restored in a media gateway or a base station, and, at the target end, all non-speech parameters need to be calculated using the DTE method. As a result, the computational complexity, overall delay, and cost of the transcoding operation are increased, and the efficiency is decreased.
In addition, existing DTX transcoding algorithms only provide technical solutions for the situation where both the sending end and the target end turn on DTX. They are not applicable when only the sending end or only the target end turns on DTX, or when it is unknown whether the sending end turns on DTX. When the sending end does not turn on DTX and the target end does, the type information of every frame in the source bit stream indicates a speech frame type, so the type of the target frame cannot be determined from it. When the sending end turns on DTX and the target end does not, the target frame type does not need to be determined, because the type information of every target frame indicates a speech frame type. In that case, however, the prior art provides no method for transcoding a silence insertion descriptor (SID) frame or a NO_DATA frame into a speech frame.
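The DTX combinations discussed above can be summarized in a small sketch. The names `FrameType` and `target_frame_type` are hypothetical, and the pass-through behavior for the case where both ends turn on DTX is a simplifying assumption, since the text does not describe how existing algorithms choose the target type in that case.

```python
from enum import Enum
from typing import Optional


class FrameType(Enum):
    SPEECH = "SPEECH"
    SID = "SID"          # silence insertion descriptor
    NO_DATA = "NO_DATA"


def target_frame_type(
    source_type: FrameType, source_dtx_on: bool, target_dtx_on: bool
) -> Optional[FrameType]:
    """Return the target frame type, or None when the source frame type
    alone does not determine it."""
    if not target_dtx_on:
        # Target DTX off: every target frame is a speech frame. If the
        # source frame is SID or NO_DATA, its content still has to be
        # turned into a speech frame somehow.
        return FrameType.SPEECH
    if not source_dtx_on:
        # Source DTX off: every source frame is marked as speech, so the
        # marker says nothing about whether the target should emit
        # SPEECH, SID, or NO_DATA.
        return None
    # Both ends use DTX: assumed pass-through of the source frame type.
    return source_type


sid_to_non_dtx_target = target_frame_type(FrameType.SID, True, False)
speech_to_dtx_target = target_frame_type(FrameType.SPEECH, False, True)
```

Here `sid_to_non_dtx_target` is `FrameType.SPEECH` and `speech_to_dtx_target` is `None`, which correspond to the two cases that the existing algorithms leave open.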