1. Field of the Invention
The present invention generally relates to an apparatus for converting voice packets between communication systems. More particularly, the present invention relates to an apparatus and a method for converting LSP (Line Spectrum Pair) parameters for voice packet conversion, which are capable of outputting a desired voice packet through mutual conversion of voice packets with different formats and their relevant LSP parameters between communication systems using different voice encoders (i.e., vocoders).
2. Background of the Related Art
The evolution of the information and communications industry has led to extensive research on voice processing, as this technology is expected to be an integral part of future communications systems. Research on voice processing can be divided into three types: voice encoding, voice recognition, and voice conversion. Among these, voice encoding technology is most widely used in current multimedia applications.
More specifically, thanks to the development of multimedia and mobile communications, services that used to be available only to particular organizations or individuals are now accessible to the public, and the number of services is expected to continue increasing. Unfortunately, current transmission rates cannot satisfy the increasing number of users. There was an attempt to increase the number of users by decreasing the transmission rate and allowing more users on the same channel, but this unavoidably deteriorated speech quality. In lieu of changing the transmission rate, voice encoders, also known as vocoders (coder/decoder), have been proposed.
Voice communication services over mobile telecommunications and data networks use different kinds of vocoders depending on the application. More specifically, IS-96 QCELP, EVRC, GSM-EFR, or GSM-AMR are used in mobile telecommunication systems, G.723.1 or G.729 are used over data networks, and G.711 is used in the PSTN (Public Switched Telephone Network). Because of these different standards, an apparatus for converting voice packets which adhere to different formats is necessary for allowing communications to take place between networks that use different kinds of vocoders. This task is accomplished by a media gateway.
FIG. 1 is a schematic diagram of a known wire/wireless communication network. In the drawing, a media gateway (hereinafter, a “packet converter”) 107 converts voice packets that were transferred from different vocoders (EVRC/AMR, G.711, G.723.1/G.729) 101, 102 and 103 through different networks (Mobile Network, PSTN, IP Network) 104, 105, and 106 to voice packets of an object encoder.
In general, standard vocoders currently in use in the wire/wireless communication network are based on the CELP (Code Excited Linear Prediction) type encoding scheme as shown in FIG. 2, although there are minor differences in their specific implementations. The CELP encoder usually extracts a particular parameter of a voice signal.
FIG. 3 is a schematic diagram of a packet converting system of a known voice encoder. As shown, the system includes a first vocoder 110, networks 120 and 140, a second vocoder 150, and a packet converter 130. The first vocoder includes a first encoder (Encoder A) 111 for encoding a voice signal to a voice packet A and a first decoder (Decoder A) 112 for decoding the voice packet A to a voice signal. Networks 120 and 140 transfer the packet to different encoders. The second vocoder 150 includes a second encoder (Encoder B) 151 for encoding a voice signal to a voice packet B and a second decoder (Decoder B) 152 for decoding the voice packet B to a voice signal. And, a packet converter 130 converts the packets that go back and forth between the first vocoder 110 and the second vocoder 150.
The packet converter includes a third decoder (Decoder A) 131 for decoding the voice packet A using the same coding scheme and a third encoder (Encoder B) 132 for encoding the voice signal decoded by the third decoder 131 by using a destination coding scheme and then outputting a packet B. The converter also includes a fourth decoder (Decoder B) 133 for decoding the voice packet B by using the same coding scheme and a fourth encoder (Encoder A) 134 for encoding the voice signal decoded by the fourth decoder 133 by using the destination coding scheme and then outputting a packet A.
Further description on the packet converting apparatus between communication systems now follows with reference to FIG. 3. An input voice signal (PCM) is converted to a voice packet A (Packet A) by the first encoder (Encoder A) 111, and the voice packet A is sent to the packet converter 130 via the connected network 120. The packet converter 130 decodes the voice packet A by the third decoder 131 and then generates a voice signal (PCM) to convert the voice packet A to a destination packet. The decoded voice signal is then encoded by the third encoder 132 and the encoded voice signal is converted to a voice packet B of an object encoder. Finally, the voice packet B is output to the network.
Further, the voice packet B (Packet B) having been converted by the packet converter 130 is transferred to the second decoder 152, the destination, through the connected network 140. The second decoder 152 then decodes the voice packet B, and outputs it as a PCM voice signal.
A voice signal (PCM) input to the second vocoder 150 is converted to a voice packet B (Packet B) by the second encoder 151, and the voice packet B is sent to the packet converter 130 via the connected network 140. The packet converter 130 decodes the voice packet B by the fourth decoder 133 and then generates a voice signal (PCM) to convert the voice packet B to a destination packet. The decoded voice signal is then encoded by the fourth encoder 134 and the encoded voice signal is converted to a voice packet A of an object encoder. Finally, the voice packet A is output to the network.
Voice packet A (Packet A) having been converted by the packet converter 130 is transferred to the first decoder 112, the destination, through the connected network 120. The first decoder 112 then decodes the voice packet A and outputs it as a PCM voice signal.
The above-described packet-converting scheme is based on the Tandem encoding scheme, in which the PCM signal decoded from a packet goes through a complicated analytical process for packet conversion. Encoding parameters are then obtained therefrom. These parameters are quantized, packetized, and transmitted to a receiving end over the network. In short, the packet is converted by converting parameters indirectly, by way of a PCM signal.
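As a purely illustrative sketch of the Tandem structure described above, the following Python fragment models one direction of the packet converter 130 as a full decode to PCM followed by a full re-encode. The function names and the two toy "vocoders" (which merely store 16-bit samples in different byte orders) are hypothetical stand-ins introduced here for illustration; actual CELP encoders and decoders would take their place.

```python
import struct


def tandem_transcode(packet, decode_source, encode_target):
    """Convert a packet by decoding it to PCM and re-encoding the PCM."""
    pcm = decode_source(packet)   # e.g. Decoder A (131): packet A -> PCM
    return encode_target(pcm)     # e.g. Encoder B (132): PCM -> packet B


# Hypothetical toy codecs, NOT real vocoders: format "A" stores 16-bit
# samples big-endian, format "B" stores them little-endian.
def encode_a(pcm):
    return struct.pack(">%dh" % len(pcm), *pcm)


def decode_a(packet):
    return list(struct.unpack(">%dh" % (len(packet) // 2), packet))


def encode_b(pcm):
    return struct.pack("<%dh" % len(pcm), *pcm)


def decode_b(packet):
    return list(struct.unpack("<%dh" % (len(packet) // 2), packet))


if __name__ == "__main__":
    packet_a = encode_a([0, 1000, -1000, 32767])
    packet_b = tandem_transcode(packet_a, decode_a, encode_b)   # A -> B
    packet_a2 = tandem_transcode(packet_b, decode_b, encode_a)  # B -> A
    print(packet_a == packet_a2)
```

The two calls correspond to the two decoder/encoder pairs (131/132 and 133/134) of the packet converter; every conversion pays for one complete decode and one complete encode.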
CELP encoders are broadly used in voice communication over data networks such as VoIP (Voice over IP), and particularly G.723.1 is used for transcoding (packet conversion). FIGS. 4 and 5 are flow charts showing how packet conversion is performed in a packet converting apparatus between a first encoder and a second encoder, G.723.1.
FIG. 4 involves conversion of a packet encoded by another encoder X (110 in FIG. 3), namely the first encoder, to a packet of G.723.1, namely the second encoder. When an encoded packet X is input, the decoder X performs bit unpacking (S211) on the data, and by dequantizing the bit-unpacked data obtains an LSP (Line Spectrum Pair) parameter (LSPx) (S212). A PCM formatted voice signal is then synthesized using the LSP voice parameter as well as other parameters (S213). Here, the LSP are an equivalent representation into which the LPC (Linear Predictive Coefficients) are converted for transmission. That is, the LSP are observed in the frequency domain.
Encoder G.723.1 220 receives the PCM voice signal and, using the ACR (Auto Correlation Method), obtains linear predictive coefficients (LPCG.723.1(i), 0≦i≦9) (S221) from the PCM voice signal. Then, the encoder G.723.1 220 converts the LPCG.723.1(i) to LSP parameters based on polynomial evaluation and a cosine table having 512 values for compensating the LSP scale difference found between the second encoder, G.723.1, and another voice coder (S222). The encoder G.723.1 quantizes the LSP parameters to the LSP parameters (LSPG.723.1(i), 0≦i≦9) of the encoder G.723.1 (S223), performs bit packing on the quantized LSP together with the other quantized data, and outputs the data as a voice packet of the encoder G.723.1 (S224).
The ACR method measures the similarity (correlation) between an input signal and a delayed copy of that input signal.
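For illustration only, the autocorrelation measurement and one common way of deriving LPC from it can be sketched in Python as follows. The use of the Levinson-Durbin recursion as the solver, the function names, and the sign convention A(z) = 1 + a1·z^-1 + ... + ap·z^-p are assumptions made for this sketch; windowing, lag weighting, silent-frame handling, and the fixed-point details of any actual vocoder are omitted.

```python
def autocorrelation(signal, max_lag):
    """Return [r(0), ..., r(max_lag)], r(k) = sum over n of s[n] * s[n + k]."""
    length = len(signal)
    return [
        sum(signal[n] * signal[n + k] for n in range(length - k))
        for k in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations for LPC of a given order.

    Returns (a, err): a[0] == 1.0, a[1..order] are the coefficients, and
    err is the remaining prediction error energy. Assumes r[0] > 0.
    """
    a = [1.0] + [0.0] * order
    err = float(r[0])
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                     # reflection coefficient
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= 1.0 - k * k
    return a, err
```

Under these assumptions, a tenth-order analysis matching the index range 0≦i≦9 would call `levinson_durbin(autocorrelation(frame, 10), 10)` and take `a[1]` through `a[10]` as the ten coefficients.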
The procedure of converting LPC, a vocal tract transfer function, to LSP includes the following steps:
1. Obtain the roots of a polynomial composed of the LPC.
2. Use a cosine table, since the roots of the polynomial are expressed by trigonometric function values.
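The two steps above can be sketched as follows. This is a simplified floating-point illustration, not the procedure of any particular vocoder: it forms the usual sum and difference polynomials from A(z), removes their trivial roots at z = -1 and z = +1, scans a grid of 256 steps over 0 to π (half of a 512-step circle) for sign changes, and refines each root by bisection. The function name, the direct use of `math.cos` in place of a stored table, and the bisection refinement are assumptions of this sketch.

```python
import math


def lpc_to_lsf(a, grid_steps=256):
    """Convert LPC a = [1, a1, ..., ap] (p even) to p line spectral
    frequencies in radians, sorted in ascending order."""
    p = len(a) - 1
    if p % 2 != 0:
        raise ValueError("this sketch handles even prediction orders only")
    ext = list(a) + [0.0]                       # treat a[p+1] as 0
    sum_poly = [ext[k] + ext[p + 1 - k] for k in range(p + 2)]
    dif_poly = [ext[k] - ext[p + 1 - k] for k in range(p + 2)]
    # Divide out the trivial roots: (1 + z^-1) and (1 - z^-1) respectively.
    f1, f2 = [sum_poly[0]], [dif_poly[0]]
    for k in range(1, p + 1):
        f1.append(sum_poly[k] - f1[k - 1])
        f2.append(dif_poly[k] + f2[k - 1])
    half = p // 2

    def make_eval(f):
        # Real-valued form of the symmetric polynomial on the unit circle.
        def g(w):
            return f[half] + 2.0 * sum(
                f[half - m] * math.cos(m * w) for m in range(1, half + 1))
        return g

    roots = []
    for g in (make_eval(f1), make_eval(f2)):
        prev_w, prev_pos = 0.0, g(0.0) >= 0
        for n in range(1, grid_steps + 1):
            w = math.pi * n / grid_steps
            pos = g(w) >= 0
            if pos != prev_pos:                 # sign change: one root here
                lo, hi = prev_w, w
                for _ in range(40):             # bisection refinement
                    mid = 0.5 * (lo + hi)
                    if (g(mid) >= 0) == prev_pos:
                        lo = mid
                    else:
                        hi = mid
                roots.append(0.5 * (lo + hi))
            prev_w, prev_pos = w, pos
    return sorted(roots)
```

For a stable (minimum-phase) A(z), the roots of the two polynomials interleave on the unit circle, which is why a simple sign-change scan over the grid is sufficient to locate them.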
The CELP vocoder for voice packet conversion extracts particular parameters from a voice signal, and encodes parameters such as LSP parameters, pitch, ACB (Adaptive CodeBook) gain, ACB index, FCB (Fixed CodeBook) gain, and FCB index values.
LSP parameters indicate the spectrum envelope of a voice signal, and the pitch and ACB index represent the fundamental frequency. The ACB gain indicates the energy of the pitch element, and the FCB gain and index represent the remaining elements. Although there might be slight differences depending on the unit or range of expression, the quantization method, and the transmission rate, such encoding parameters carry the same meaning from one vocoder to another. The voice parameters are obtained from a packet or a PCM signal and are then used in the course of producing the desired packet.
FIG. 5 depicts packet conversion from the G.723.1 encoder (150 in FIG. 3) to another encoder. G.723.1 decoder 230 performs bit unpacking on a packet encoded by the G.723.1 encoder, using the same coding scheme (i.e., G.723.1) (S231), and obtains the LSP voice parameter of the G.723.1 encoder by dequantizing the unpacked data (S232). The PCM formatted voice signal is then synthesized by using the voice parameter (S233).
Another encoder X 240 receives the PCM voice signal at its input, obtains linear predictive coefficients (LPCx(i), 0≦i≦9) from the PCM input signal by using the ACR (Auto Correlation Method) (S241), converts the LPC parameters to LSP parameters (LSPx(i)) based on polynomial evaluation and a cosine table that divides 2π into 512 values (S242), and quantizes the LSP parameters for the packet of the other encoder (S243). Finally, the LSP parameters are output by bit-packing them together with the other parameters (S244).
In other words, when transcoding between G.723.1 and another encoder is involved, a PCM signal is obtained from the G.723.1 packet through the bit-unpacking and dequantization processes (namely, decoding), and an LPC parameter for the receiving party is obtained by using the ACR. Here, the LPC is converted to LSP through Chebyshev polynomial evaluation and cosine table search. Particularly, the cosine table divides 360 degrees (2π) into 512 steps to compensate for scale differences among different vocoders, and it holds a cosine value for every step, namely the values of cos(360/512*n) (n = 0 to 511).
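A table of this kind can be sketched in a few lines of Python. The names `TABLE_SIZE`, `COS_TABLE`, and `table_cos` are hypothetical, and floating-point entries are used for clarity; an actual fixed-point vocoder would store scaled integers instead.

```python
import math

TABLE_SIZE = 512  # one full circle (360 degrees, 2*pi) split into 512 steps

# Entry n holds cos(2*pi*n/512), i.e. the cosine of n * (360/512) degrees.
COS_TABLE = [math.cos(2.0 * math.pi * n / TABLE_SIZE) for n in range(TABLE_SIZE)]


def table_cos(index):
    """Look up the cosine for a table index, wrapping around the circle."""
    return COS_TABLE[index % TABLE_SIZE]


if __name__ == "__main__":
    print(360.0 / TABLE_SIZE)              # angular step in degrees
    print(table_cos(0), table_cos(256))    # cosine at 0 and at 180 degrees
```

Each step therefore spans 360/512 = 0.703125 degrees, and a root search over 0 to π only needs indices 0 through 256.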
To summarize, transcoding between G.723.1 and another encoder was realized through the decoding process to obtain a PCM signal, the LPC analytical process based on the ACR, and then the LSP converting process through Chebyshev polynomial evaluation and cosine table search. These steps resulted in converting the PCM signal to an encoded packet that the receiving party can decode before outputting the signal.
The conventional method has at least one drawback: too many calculations. These calculations include bit-unpacking to obtain a voice parameter, synthesizing a PCM formatted voice signal by using the voice parameter, and analyzing the PCM signal again to calculate the LSP. In particular, many calculations have to be performed in the decoding process to obtain a PCM signal, the LPC analytical process based on the ACR, and the LSP converting process performed through Chebyshev polynomial evaluation and cosine table search.
Considering that 90% of the calculations are for encoding and the remaining 10% are for decoding, a large amount of calculation is spent on such encoding and decoding in the course of LSP conversion.
The conventional method has further drawbacks. For example, an additional delay (7.5 ms) could be generated for the LPC analysis. In addition, on top of the search through the cosine table having 512 values during the course of LSP conversion based on polynomial evaluation and cosine table search, a memory is required to store the cosine table.
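To give a rough sense of scale for these two costs, the short calculation below converts the 7.5 ms delay into samples and estimates the table storage. The 8 kHz sampling rate (typical of narrowband telephony vocoders such as G.723.1) and the 16-bit width of each table entry are assumptions of this sketch, not figures stated above.

```python
SAMPLE_RATE_HZ = 8000      # assumed narrowband telephony sampling rate
EXTRA_DELAY_MS = 7.5       # additional delay for the LPC analysis
TABLE_ENTRIES = 512        # cosine table size
BYTES_PER_ENTRY = 2        # assumed 16-bit fixed-point table entries

delay_samples = int(SAMPLE_RATE_HZ * EXTRA_DELAY_MS / 1000)
table_bytes = TABLE_ENTRIES * BYTES_PER_ENTRY

if __name__ == "__main__":
    print(delay_samples, "samples of additional delay")
    print(table_bytes, "bytes of table memory")
```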