1. Field of the Invention
The present invention relates to encoding and decoding technology for transmitting or storing speech signals at low bit rates. In particular, the present invention relates to code conversion (transcoding) technology for converting a first code sequence obtained by encoding a speech signal with a first speech coding scheme into a second code sequence that is decodable with another speech coding scheme.
2. Description of the Related Art
Code Excited Linear Prediction (CELP) is well known as one of the speech coding schemes that encode a speech signal efficiently at medium and low bit rates. The CELP scheme is described in:
[1] M. R. Schroeder and B. S. Atal, “Code excited linear prediction: high quality speech at very low bit rates,” Proc. of IEEE Int. Conf. on Acoustics, Speech and Signal Processing, pp. 937-940, 1985.
In the CELP scheme, the encoder extracts from the input speech signal the Linear Prediction (LP) coefficients, which characterize a linear prediction filter, and an excitation signal, which drives this LP filter. The encoder encodes the LP coefficients and the excitation signal and transmits the resulting codes to the decoder. The decoder sets the received LP coefficients in its LP filter and excites this filter with the received excitation signal to reproduce a high-quality speech signal.
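The decoder-side filtering described above can be sketched as an all-pole synthesis filter 1/A(z). This is a minimal illustration, not the filter of any particular standard. It assumes the sign convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p (some texts use the opposite sign), unquantized floating-point coefficients, and a hypothetical function name `lp_synthesis`. The `memory` argument carries the last p output samples from one frame to the next, because the filter state must persist across frame boundaries.

```python
from typing import List, Optional, Sequence


def lp_synthesis(excitation: Sequence[float],
                 a: Sequence[float],
                 memory: Optional[List[float]] = None) -> List[float]:
    """All-pole LP synthesis filter 1/A(z) (hypothetical helper).

    Assumes A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p, hence
    s[n] = e[n] - sum_{k=1..p} a[k-1] * s[n-k].
    `memory` holds the previous frame's last p outputs (oldest first)
    and is updated in place so the filter state carries over.
    """
    p = len(a)
    if memory is None:
        memory = [0.0] * p
    if len(memory) != p:
        raise ValueError("memory must hold exactly p past samples")
    history = list(memory)               # past outputs, most recent last
    out: List[float] = []
    for e in excitation:
        acc = float(e)
        for k in range(1, p + 1):
            acc -= a[k - 1] * history[-k]  # feedback from s[n-k]
        out.append(acc)
        history.append(acc)
    if p:
        memory[:] = history[-p:]         # keep the last p outputs
    return out


# Example: first-order filter with a_1 = -0.5, i.e. s[n] = e[n] + 0.5 s[n-1].
state = [0.0]
print(lp_synthesis([1.0, 0.0], [-0.5], state))  # [1.0, 0.5]
print(lp_synthesis([0.0], [-0.5], state))       # [0.25]
```

Passing the same `state` list to consecutive calls is what lets the second frame continue the decay started in the first.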
The excitation signal is expressed as a weighted sum of an Adaptive Codebook (ACB) vector and a Fixed Codebook (FCB) vector. The ACB represents the pitch periodicity of the input speech signal, whereas the FCB consists of random numbers and pulses. The excitation signal is obtained by multiplying the ACB and FCB vectors by their respective gains (the ACB gain and the FCB gain) and adding the products.
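The weighted sum above can be illustrated with a short sketch. It assumes an integer pitch lag, with the ACB vector formed by repeating the past excitation at that lag; standard codecs such as AMR also use fractional lags with interpolation, which are omitted here. The FCB vector and both gains are treated as already decoded. The function names `adaptive_codebook_vector` and `celp_excitation` are hypothetical.

```python
from typing import List, Sequence


def adaptive_codebook_vector(past_exc: Sequence[float], lag: int,
                             n: int) -> List[float]:
    """ACB vector of length n: the past excitation repeated at an integer
    pitch lag (hypothetical helper; fractional lags are not modelled)."""
    if not 1 <= lag <= len(past_exc):
        raise ValueError("lag must be between 1 and len(past_exc)")
    buf = list(past_exc)
    start = len(buf) - lag
    for i in range(n):
        # When lag < n, samples generated in this loop are reused.
        buf.append(buf[start + i])
    return buf[len(past_exc):]


def celp_excitation(v: Sequence[float], c: Sequence[float],
                    g_acb: float, g_fcb: float) -> List[float]:
    """Excitation u[n] = g_acb * v[n] + g_fcb * c[n]."""
    if len(v) != len(c):
        raise ValueError("ACB and FCB vectors must have equal length")
    return [g_acb * vi + g_fcb * ci for vi, ci in zip(v, c)]


v = adaptive_codebook_vector([0.0, 0.0, 1.0, 0.0], lag=2, n=4)
c = [0.0, 1.0, 0.0, 0.0]          # a single-pulse FCB vector
print(v)                           # [1.0, 0.0, 1.0, 0.0]
print(celp_excitation(v, c, g_acb=0.5, g_fcb=2.0))  # [0.5, 2.0, 0.5, 0.0]
```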
When, for example, a 3G (third-generation) mobile network and a wired packet network are to be interconnected, the standard speech coding schemes used in the two networks may differ. Direct connection of these networks therefore requires code conversion between the different speech coding schemes (i.e., transcoding). Tandem connection is one known transcoding technique for speech coding.
FIG. 1 shows a code conversion apparatus based on the conventional tandem connection. This apparatus converts a first code sequence produced with a first speech coding scheme into a second code sequence that can be decoded with a second speech coding scheme.
The conventional code conversion apparatus is described below with reference to FIG. 1. Code sequences are input and output at a frame period (e.g., 20 msec), the frame being the processing unit of speech coding and decoding. As will be described later, each frame consists of a header and a payload.
In FIG. 1, a code sequence conversion circuit 1100 consists of a speech decoding circuit 1050 and a speech encoding circuit 1060. The speech decoding circuit 1050 decodes, according to the first speech coding scheme, the first code sequence supplied to an input terminal 10. The speech encoding circuit 1060 re-encodes the decoded speech signal output by the speech decoding circuit 1050 according to the second speech coding scheme to generate the second code sequence.
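The per-frame data flow of the tandem connection can be sketched as follows. The decoder and encoder are passed in as plain callables; `tandem_transcode`, `toy_decode`, and `toy_encode` are hypothetical names, and the toy codecs are stand-ins for illustration only, not AMR, G.729, or any real codec. A real codec keeps state across frames, which a closure or object method passed as the callable could hold.

```python
from typing import Callable, Iterable, Iterator, List

Frame = bytes          # one coded frame (e.g. one 20 msec frame)
Pcm = List[float]      # decoded speech samples for one frame


def tandem_transcode(frames_in: Iterable[Frame],
                     decode_first: Callable[[Frame], Pcm],
                     encode_second: Callable[[Pcm], Frame]) -> Iterator[Frame]:
    """Tandem connection: fully decode each frame of the first scheme,
    then re-encode the decoded speech with the second scheme."""
    for frame in frames_in:
        pcm = decode_first(frame)     # role of speech decoding circuit 1050
        yield encode_second(pcm)      # role of speech encoding circuit 1060


# Toy stand-in codecs, for illustration only (not AMR or G.729).
def toy_decode(frame: Frame) -> Pcm:
    return [b / 2 for b in frame]


def toy_encode(pcm: Pcm) -> Frame:
    return bytes(int(x) for x in pcm)


print(list(tandem_transcode([b"\x02\x04", b"\x06"], toy_decode, toy_encode)))
# [b'\x01\x02', b'\x03']
```

Because both callables run in full for every frame, the cost of this arrangement is roughly that of a complete decoder plus a complete encoder.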
Details of the speech encoding and decoding schemes are found in reference [1] above and in
[2] 3GPP TS 26.090, “AMR Speech Codec; Transcoding Functions.”
However, the code conversion apparatus in FIG. 1 requires a large amount of processing to achieve the code conversion. This is because the speech decoding circuit must fully decode the first code sequence, and the speech encoding circuit must then re-encode the decoded speech signal.
US2003/0065508A (reference [3]) discloses a code conversion apparatus that converts the first input code sequence into a code sequence of the second speech coding scheme without decoding the non-speech part of the first code sequence.
In this apparatus, a code separation part separates a non-speech code in the first code sequence into a plurality of element codes, and a non-speech code conversion part converts these element codes into a plurality of element codes for the second speech coding scheme. The apparatus multiplexes the converted element codes to output a second non-speech code sequence. It then multiplexes this second non-speech code sequence with a second speech code sequence produced by a speech code conversion part, and outputs the result as the second code sequence.
This code conversion apparatus requires a non-speech code conversion circuit that converts a first non-speech code sequence into a second non-speech code sequence, and this non-speech code conversion requires a large amount of processing. Consider, for example, converting a non-speech code sequence conforming to the AMR scheme into one conforming to ITU-T Recommendation G.729. Each of these code sequences carries, as comfort noise (CN) information, LP coefficient information indicating the spectral envelope and power information for every frame.
However, the AMR encoder transmits, once every 8 frames, the LP coefficients and power information averaged over those 8 frames. The G.729 encoder, on the other hand, transmits LP coefficient information non-periodically, either as the average over the previous 6 frames or as the values for the current frame. It likewise transmits power information either as the average over the previous 3 frames or as the value for the current frame.
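The difference in averaging and transmission intervals can be made concrete with a simplified sketch. It treats each frame's LP and power parameters as plain floating-point vectors and averages them arithmetically; actual AMR and G.729 encoders average in specific parameter domains and quantize the results, which is omitted. Whether the G.729 averaging window includes the current frame is not stated above, so the sketch assumes it covers frames t-6..t-1 for LP information (t-3..t-1 for power). All function names are hypothetical, and the `use_average` flag stands in for the G.729 encoder's decision rule, which is not modelled.

```python
from typing import List, Sequence

Vector = List[float]   # e.g. an LP parameter vector or a one-element power value


def mean_of(frames: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    if not frames:
        raise ValueError("need at least one frame")
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def amr_style_updates(params: Sequence[Vector], period: int = 8) -> List[Vector]:
    """One update per complete block of `period` frames, each being the
    average over that block (AMR-style, as described above)."""
    return [mean_of(params[i:i + period])
            for i in range(0, len(params) - period + 1, period)]


def g729_style_value(params: Sequence[Vector], t: int, window: int,
                     use_average: bool) -> Vector:
    """Value sent at frame t: either the average over the previous `window`
    frames (t-window .. t-1, an assumption) or the current frame's value.
    `use_average` stands in for the encoder's actual decision rule."""
    if not use_average:
        return list(params[t])
    if t < 1:
        raise ValueError("no previous frames to average")
    return mean_of(params[max(0, t - window):t])


frames = [[float(i)] for i in range(16)]  # synthetic per-frame parameters
print(amr_style_updates(frames))                                  # [[3.5], [11.5]]
print(g729_style_value(frames, 10, window=6, use_average=True))   # LP: [6.5]
print(g729_style_value(frames, 10, window=3, use_average=True))   # power: [8.0]
```

In this sketch, AMR-style updates exist only at 8-frame block boundaries, whereas a G.729-style value can be requested at any frame and with a different window for each element, which illustrates why the element codes do not map one to one between the two schemes.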
Thus, the two speech coding schemes differ not only in the actual codes used for the CN information but also in the transmission interval of each element code. Consequently, the non-speech code conversion circuit of reference [3] requires a large amount of processing to convert the element codes.