This application claims the priority of Korean Patent Application No. 2003-47455, filed on Jul. 11, 2003, in the Korean Intellectual Property Office, the disclosure of which is incorporated herein in its entirety by reference.
1. Field of the Invention
The present invention relates to code-excited linear prediction (CELP) speech coding technology, and more particularly, to a transcoder between speech codecs of different CELP types and a method therefor.
2. Description of the Related Art
Technologies for transferring digitized speech signals are widely used not only in wired telecommunication networks, including ordinary telephone networks, but also in wireless telecommunication networks and voice over internet protocol (VoIP) networks. When a speech signal is sampled at 8 kHz and then coded with 8 bits per sample, a bit rate of 64 kbps is required. However, if speech analysis and an adequate coding method are adopted, speech can be transferred with high quality at a much lower bit rate.
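The 64 kbps figure is simply the sampling rate multiplied by the number of bits per sample. The short Python sketch below reproduces that arithmetic. The 8 kbps coded rate used for the compression ratio is an assumed, illustrative value and is not taken from this description; the function names are hypothetical.

```python
def pcm_bit_rate_kbps(sample_rate_hz: int, bits_per_sample: int) -> float:
    """Bit rate of uncompressed PCM speech, in kbit/s."""
    return sample_rate_hz * bits_per_sample / 1000.0


def compression_ratio(pcm_kbps: float, coded_kbps: float) -> float:
    """How many times smaller the coded stream is than the PCM stream."""
    return pcm_kbps / coded_kbps


pcm = pcm_bit_rate_kbps(8000, 8)      # 8 kHz x 8 bits per sample
print(pcm)                            # 64.0
print(compression_ratio(pcm, 8.0))    # 8.0 for an assumed 8 kbit/s coder
```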
A vocoder is an apparatus that compresses speech by extracting parameters from a speech generation model. The vocoder includes an encoder, which analyzes input speech to extract the parameters, and a decoder, which synthesizes speech at the receiver from the parameters transmitted through a communication channel. Until recently, time-domain vocoders based on linear prediction have been widely used. A time-domain vocoder predicts the present speech sample from previous speech samples, calculates the prediction filter coefficients that minimize the prediction error with respect to the original samples, and models the error signal passing through the prediction filter using an adaptive codebook and a fixed codebook.
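The description does not say how the prediction filter coefficients are computed. One common approach, assumed here purely for illustration, is the autocorrelation method solved with the Levinson-Durbin recursion. The sketch predicts x̂[n] = Σ_{i=1}^{p} a_i x[n-i], so the residual is the input filtered by 1 - Σ a_i z^{-i}. Windowing, lag windowing, and the adaptive and fixed codebook searches are omitted, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """r[k] = sum_n x[n] x[n-k] for k = 0..max_lag (no analysis window)."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r: Sequence[float], p: int) -> Tuple[List[float], float]:
    """Solve the normal equations for a_1..a_p so that
    x_hat[n] = sum_i a_i x[n-i] minimizes the squared prediction error.
    Assumes r[0] > 0. Returns (a_1..a_p, final prediction error energy)."""
    a = [0.0] * (p + 1)          # a[0] is unused; a[i] holds a_i
    err = r[0]
    for m in range(1, p + 1):
        acc = r[m] - sum(a[i] * r[m - i] for i in range(1, m))
        k = acc / err            # reflection coefficient of stage m
        prev = a[:]
        a[m] = k
        for i in range(1, m):
            a[i] = prev[i] - k * prev[m - i]
        err *= 1.0 - k * k       # error energy shrinks at every stage
    return a[1:], err


def prediction_residual(x: Sequence[float], a: Sequence[float]) -> List[float]:
    """e[n] = x[n] - sum_i a_i x[n-i]: the error signal after prediction."""
    p = len(a)
    return [x[n] - sum(a[i - 1] * x[n - i] for i in range(1, p + 1) if n - i >= 0)
            for n in range(len(x))]


x = [0.9 ** n for n in range(200)]          # a perfectly predictable signal
coeffs, energy = levinson_durbin(autocorrelation(x, 2), 2)
print(coeffs, energy)
```

For this decaying exponential, an order-2 analysis gives a_1 ≈ 0.9 and a_2 ≈ 0, and the residual after the first sample is essentially zero, which is the sense in which the prediction filter removes short-term redundancy.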
The vocoder compresses speech signals to a low bit rate by removing speech redundancy. In general, speech signals have short-term redundancy due to the filtering action of the lips and tongue and long-term redundancy due to the vibration of the vocal cords. A CELP vocoder models the short-term redundancy and the long-term redundancy using a short-term formant filter and a long-term pitch filter, respectively. The residual signal that remains after the two filters remove these redundancies may be encoded using white Gaussian noise or multi-pulse modeling, depending on the type of CELP used by the vocoder. The basis of this speech technology is the calculation of the coefficients of the two filters: a formant filter, or linear predictive coding (LPC) filter, performs short-term speech prediction, and a pitch filter performs long-term speech prediction. Finally, the residual signal is modeled optimally using analysis-by-synthesis techniques. The parameters obtained through this analysis and transmitted over the channel include formant, pitch, and residual signal information.
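The long-term (pitch) prediction step can be illustrated with a one-tap predictor ê[n] = β·e[n-T] whose lag T is chosen by an open-loop search. This is a simplified sketch under stated assumptions: actual CELP codecs typically refine the lag in closed loop through the adaptive codebook and may use fractional lags. The lag range, the test signal, and the function names below are hypothetical.

```python
from typing import List, Sequence, Tuple


def open_loop_pitch(e: Sequence[float], min_lag: int, max_lag: int) -> Tuple[int, float]:
    """Pick the lag T and gain beta of a one-tap predictor beta * e[n-T].

    For a given T the optimal gain is beta = <e[n], e[n-T]> / <e[n-T], e[n-T]>,
    which lowers the error energy by <e[n], e[n-T]>^2 / <e[n-T], e[n-T]>.
    The lag with the largest reduction wins. A fixed window n = max_lag..N-1
    keeps the comparison fair across lags. Assumes len(e) > max_lag."""
    n0 = max_lag
    best_lag, best_gain, best_score = min_lag, 0.0, -1.0
    for lag in range(min_lag, max_lag + 1):
        num = sum(e[n] * e[n - lag] for n in range(n0, len(e)))
        den = sum(e[n - lag] ** 2 for n in range(n0, len(e)))
        if den <= 0.0:
            continue
        score = num * num / den
        if score > best_score:          # strict: ties keep the shorter lag
            best_lag, best_gain, best_score = lag, num / den, score
    return best_lag, best_gain


def remove_long_term(e: Sequence[float], lag: int, gain: float) -> List[float]:
    """Residual left after the pitch predictor: e[n] - beta * e[n-T]."""
    return [e[n] - gain * e[n - lag] if n >= lag else e[n] for n in range(len(e))]


# Hypothetical residual: a pulse train with a period of 40 samples.
pulses = [1.0 if n % 40 == 0 else 0.0 for n in range(320)]
T, beta = open_loop_pitch(pulses, 20, 147)
print(T, beta)  # 40 1.0
```

The strict comparison favors the shortest lag when multiples of the true period score equally, a simple guard against pitch doubling in this toy search.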
There are various networks for speech transmission. Because each network adopts its own codec suited to its characteristics, a format conversion procedure between different codecs is needed for inter-networking. This procedure is called transcoding, and an apparatus performing it is called a transcoder. Generally, a tandem method, which simply connects the decoder of one codec to the encoder of another codec, has been used for transcoding. However, the tandem method performs speech encoding and decoding twice, which degrades speech quality and, because of the heavy computational load, causes long delay. To overcome these drawbacks, a bitstream mapping method is used, in which the encoded bitstream is converted directly without passing through the decoding procedure of the tandem method.
FIG. 1 is a drawing comparing the transcoding procedures of the tandem method and the bitstream mapping method. Referring to FIG. 1, in the tandem method, an input speech signal is encoded into a bitstream A by an encoder 102, and the bitstream A is transmitted over a first channel 104. The bitstream A received through the first channel 104 is decoded by a decoder 106 of a transcoder 114 and converted into a pulse code modulation (PCM) signal. The decoded PCM signal is encoded into a bitstream B by an encoder 108 of the transcoder 114 and then transmitted to a decoder 112 through a second channel 110. An output speech signal is obtained from the decoder 112. The transcoder 114 used in the tandem method is thus composed of the decoder 106 and the encoder 108. In the bitstream mapping method of FIG. 1, by contrast, an input speech signal is encoded into a bitstream A by an encoder 152 and transmitted to a transcoder 156 through a first channel 154. The transcoder 156 directly converts the received bitstream A into a bitstream B by bitstream mapping and transmits the bitstream B to a second channel 158. A decoder 160 decodes the bitstream B received through the second channel 158 and generates an output speech signal.
FIG. 2 shows, for each codec, the processing involved in the tandem transcoding procedure of FIG. 1. Referring to FIG. 2, a codec A 205 includes a perceptual weighting filter 210, an encoding unit 211, a decoding unit 212, and a post-filter 213. A codec B 215 includes a perceptual weighting filter 223, an encoding unit 222, a decoding unit 221, and a post-filter 220. A transcoder 114 converts a bitstream A in the format of the codec A 205 into a bitstream B in the format of the codec B 215 using the decoding unit 212, the post-filter 213, the perceptual weighting filter 223, and the encoding unit 222. The encoder of an ordinary CELP codec includes a perceptual weighting filter, which exploits the fact that auditory perception differs according to the spectral pattern of the speech signal, and the decoder includes a post-filter, which improves perceived quality by compensating for the spectral distortion introduced by the perceptual weighting filter applied in the encoder.
Referring to FIG. 2, an input speech A passes through the perceptual weighting filter 210, which accounts for the characteristics of human hearing, is converted into the bitstream A of the codec A format, and is transmitted to the transcoder 114. In the transcoder 114, the bitstream A passes through the decoding unit 212 and then through the post-filter 213, which compensates for the effect of the perceptual weighting filter 210 applied in the encoder 102. The speech output by the post-filter 213 is filtered by the perceptual weighting filter 223, encoded into the bitstream B of the codec B format by the encoding unit 222, and then transmitted to the decoder 112. In the decoding unit 221, the received bitstream B is decoded and then filtered by the post-filter 220, which compensates for the effect of the perceptual weighting filter 223, and an output speech signal is obtained. The post-filter and the perceptual weighting filter used in the CELP codecs described above are given by the following equations.
post-filter: $H_{pf}(z) = \dfrac{A(z/\gamma_n)}{A(z/\gamma_d)} \cdot \left(1 - \mu z^{-1}\right)$   [Equation 1]

perceptual weighting filter: $H_{pwf}(z) = \dfrac{A(z/\gamma_1)}{A(z/\gamma_2)}$   [Equation 2]

where $A(z) = 1 - \sum_{i=1}^{p} a_i z^{-i}$, p is the linear predictive coding (LPC) order, μ is a tilt factor, γn and γd are the weights of the post-filter, and γ1 and γ2 are the weights of the perceptual weighting filter. In the transcoder 114, the post-filter 213 and the perceptual weighting filter 223 are connected in cascade, so filtering a signal through the two filters requires (2p+1)+2p = 4p+1 multiply-and-accumulate (MAC) operations and (2p+1)+2p = 4p+1 memory locations for each speech sample. Because the transcoder 114 includes both the post-filter 213 of the codec A 205 and the perceptual weighting filter 223 of the codec B 215, the speech signal, as seen from the receiving end that obtains output speech B, has passed through perceptual weighting filtering twice and post-filtering twice. Thus, the computational load increases, and the repeated filtering causes spectral distortion of the speech.
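The per-sample cost quoted above can be checked by implementing Equations 1 and 2 as direct-form I difference equations and counting operations. In this Python sketch, the post-filter of codec A is realized as a pole-zero section A(z/γn)/A(z/γd) (2p MACs, 2p state words) followed by the tilt term 1 - μz^{-1} (1 MAC, 1 state word), and the perceptual weighting filter of codec B as a second pole-zero section (2p MACs, 2p state words). The class names, the flat LPC coefficients, and the γ and μ values are illustrative assumptions, not values from any particular codec; in an actual tandem transcoder the post-filter would use codec A's decoded LPC coefficients and the weighting filter would use codec B's own LPC analysis.

```python
from typing import List, Sequence, Tuple


class PoleZeroFilter:
    """Direct-form I realization of A(z/g_num) / A(z/g_den),
    with A(z) = 1 - sum_{i=1}^{p} a_i z^{-i}."""

    def __init__(self, a: Sequence[float], g_num: float, g_den: float):
        self.p = len(a)
        # Bandwidth-expanded coefficients a_i * g^i for i = 1..p.
        self.c = [a[i] * g_num ** (i + 1) for i in range(self.p)]  # zeros
        self.d = [a[i] * g_den ** (i + 1) for i in range(self.p)]  # poles
        self.x_hist: List[float] = [0.0] * self.p  # x[n-1] .. x[n-p]
        self.y_hist: List[float] = [0.0] * self.p  # y[n-1] .. y[n-p]
        self.macs = 0

    def memory_words(self) -> int:
        return 2 * self.p

    def step(self, x: float) -> float:
        # y[n] = x[n] - sum_i c_i x[n-i] + sum_i d_i y[n-i]
        y = x
        for ci, xi in zip(self.c, self.x_hist):
            y -= ci * xi
        for di, yi in zip(self.d, self.y_hist):
            y += di * yi
        self.macs += 2 * self.p
        self.x_hist = ([x] + self.x_hist)[: self.p]
        self.y_hist = ([y] + self.y_hist)[: self.p]
        return y


class TiltFilter:
    """First-order tilt term (1 - mu z^{-1}) of Equation 1."""

    def __init__(self, mu: float):
        self.mu = mu
        self.prev = 0.0  # previous input sample
        self.macs = 0

    def memory_words(self) -> int:
        return 1

    def step(self, x: float) -> float:
        y = x - self.mu * self.prev
        self.prev = x
        self.macs += 1
        return y


def run_cascade(stages, samples: Sequence[float]) -> List[float]:
    """Pass every sample through the stages in order."""
    out = []
    for s in samples:
        for stage in stages:
            s = stage.step(s)
        out.append(s)
    return out


def cost_per_sample(p: int, n_samples: int) -> Tuple[int, int]:
    """MACs per sample and state words of post-filter (A) + PWF (B)."""
    a = [0.1] * p  # hypothetical flat LPC coefficients
    chain = [PoleZeroFilter(a, 0.55, 0.7), TiltFilter(0.2),   # Equation 1
             PoleZeroFilter(a, 0.94, 0.6)]                    # Equation 2
    run_cascade(chain, [1.0] + [0.0] * (n_samples - 1))
    macs = sum(st.macs for st in chain) // n_samples
    mem = sum(st.memory_words() for st in chain)
    return macs, mem


print(cost_per_sample(10, 80))  # (41, 41)
```

With p = 10 this gives 41 MACs and 41 state words per sample, that is 4p+1, matching (2p+1)+2p. The direct-form I structure was chosen because it makes the operation and memory counts explicit; other realizations, such as direct-form II, need fewer state words for the same transfer function.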