The present invention relates generally to processing telecommunication signals. More particularly, the invention provides a method and apparatus for translating digital speech packets from one code-excited linear prediction (CELP) format to another CELP format. More specifically, it relates to a method and to an apparatus for interpolating an adaptive codebook pitch lag obtained by a first CELP coder as input into another adaptive codebook pitch lag of a second CELP coder. Merely by way of example, the invention has been applied to voice transcoding, but it would be recognized that the invention may also include other applications.
Telecommunication techniques have developed over the years. As merely an example, coding techniques package signals for transmission over telecommunication media. Coding often includes a process of converting a raw signal (voice, image, video, etc.) into a format amenable to transmission or storage. The coding usually results in a large amount of compression, but generally involves significant signal processing to achieve it. The outcome of the coding is a bitstream (sequence of frames) of encoded parameters according to a given compression format. The compression is achieved by removing statistically and perceptually redundant information using various techniques for modeling the signal. Hence the encoded format is referred to as a “compression format” or “parameter space”. The decoder takes the compressed bitstream and regenerates an approximation of the original signal; in the case of speech coding, compression typically leads to information loss.
Coding can be performed using a codec device. As an example, a code-excited linear prediction (CELP) based codec can be thought of as an algorithm that maps between sampled speech and some parameter space using a model of speech production, i.e., it encodes and decodes the digital speech. Generally, all CELP-based algorithms operate on frames of speech, which are further divided into several subframes. The frame parameters used in CELP-based models include linear-predictive coefficients (LPC) used for short-term prediction of the speech signal (and physically relating to the vocal tract, mouth and nasal cavity, and lips), as well as an excitation signal composed from adaptive and fixed codebooks. The adaptive codebook is used to model long-term pitch information in the speech. Most of the computational effort in analyzing the speech frame is in determining the LPC coefficients and finding the pitch lag (or, equivalently, the adaptive codeword index).
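As merely an illustration of how the adaptive and fixed codebooks combine into an excitation signal, the following minimal sketch builds one subframe of excitation. The function names, the gain names `pitch_gain` and `code_gain`, and the restriction to integer pitch lags are assumptions made for this example and are not taken from any particular standard.

```python
def adaptive_codevector(past_excitation, lag, subframe_len):
    """Build the adaptive-codebook vector for one subframe.

    Copies the excitation from `lag` samples back. When `lag` is shorter
    than the subframe, the copied segment is repeated periodically.
    Integer lags only (an assumption of this sketch).
    """
    if lag <= 0 or lag > len(past_excitation):
        raise ValueError("lag must lie within the stored excitation history")
    history = list(past_excitation)
    vector = []
    for _ in range(subframe_len):
        sample = history[-lag]      # excitation `lag` samples in the past
        vector.append(sample)
        history.append(sample)      # lets short lags reuse the new samples
    return vector


def excitation(adaptive, fixed, pitch_gain, code_gain):
    """Sum the gain-scaled adaptive and fixed codebook contributions."""
    return [pitch_gain * a + code_gain * c for a, c in zip(adaptive, fixed)]


if __name__ == "__main__":
    past = [1, 2, 3, 4]
    adaptive = adaptive_codevector(past, lag=2, subframe_len=5)
    print(adaptive)
    print(excitation(adaptive[:3], [1, 0, -1], pitch_gain=0.5, code_gain=2.0))
```

The pitch lag is the only quantity that selects the adaptive codebook entry, which is why it is described above as equivalent to the adaptive codeword index.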
There exists a large number of diverse networks connected to multiple diverse terminals that each support one (or more) of the many CELP-based voice coding standards. A lack of inherent interoperability between voice compression standards often means that translation is needed when an end-to-end call traverses network boundaries. Interconnecting these diverse networks and terminals generally requires voice transcoding from one voice standard into another. A need for such transcoding is typically addressed in mobile switching centers, media gateways, multimedia messaging systems, and on the edge of networks.
As merely an example, heterogeneous wireless, mobile and wireline networks run on different voice coding standards. A wide variety of voice compression and coding standards is used for terminals in different networks: G.729 and G.723.1 for Voice over IP (VoIP), and GSM, GSM-AMR, EVRC and a range of other standards used (or emerging) on different wireless networks. FIGS. 1A, 1B and 1C illustrate this diversity of CELP-based voice compression standards in a simplified manner. In this case, voice transcoding occurs at the edge of every network and between any two networks.
The computation of the adaptive codebook pitch lag plays an important role in searching the adaptive codebook in voice transcoding. Because the frame size or sub-frame size may differ between most popular CELP-based standards, re-computing the codebook pitch lag for a standard with a different sub-frame size becomes challenging. For example, the sub-frame size in G.723.1 is 7.5 ms (FIG. 1B), but it is 5 ms in GSM-AMR (FIG. 1A) and it is either 6.625 ms or 6.75 ms in EVRC (FIG. 1C).
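The sub-frame mismatch described above can be made concrete with a short sketch that lays sub-frames out on a common sample axis and reports which source sub-frames each destination sub-frame overlaps. The 8 kHz sampling rate and the per-frame sub-frame layouts are assumptions added for this example (chosen to reproduce the durations quoted above), and the helper names are hypothetical.

```python
from fractions import Fraction

SAMPLE_RATE_HZ = 8000  # assumption: narrowband speech sampled at 8 kHz

# Sub-frame lengths in samples within one frame (assumed layouts).
SUBFRAMES = {
    "G.723.1": [60, 60, 60, 60],  # 4 x 7.5 ms
    "GSM-AMR": [40, 40, 40, 40],  # 4 x 5 ms
    "EVRC": [53, 53, 54],         # 6.625 ms, 6.625 ms, 6.75 ms
}


def duration_ms(samples):
    """Convert a length in samples to milliseconds."""
    return float(Fraction(samples * 1000, SAMPLE_RATE_HZ))


def boundaries(codec, total_samples):
    """Return (start, end) sample ranges of consecutive sub-frames."""
    pattern = SUBFRAMES[codec]
    spans, start, index = [], 0, 0
    while start < total_samples:
        end = start + pattern[index % len(pattern)]
        spans.append((start, end))
        start, index = end, index + 1
    return spans


def overlaps(src_codec, dst_codec, total_samples):
    """For each destination sub-frame, list (source index, shared samples)."""
    src = boundaries(src_codec, total_samples)
    result = []
    for d_start, d_end in boundaries(dst_codec, total_samples):
        hits = []
        for k, (s_start, s_end) in enumerate(src):
            shared = min(d_end, s_end) - max(d_start, s_start)
            if shared > 0:
                hits.append((k, shared))
        result.append(hits)
    return result


if __name__ == "__main__":
    for codec, pattern in SUBFRAMES.items():
        print(codec, [duration_ms(n) for n in pattern])
    print(overlaps("G.723.1", "GSM-AMR", 120))
```

Under these assumptions, the second GSM-AMR sub-frame straddles two G.723.1 sub-frames, so no single source pitch lag corresponds to it directly.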
Conventional methods of transcoding, including tandem transcoding (a brute-force approach) and some “smart” transcoding methods, still reconstruct the speech signal and perform extensive computations to extract the pitch lag through open-loop or closed-loop searching. That is, these methods still operate in the speech signal space, rather than the parameter space. Accordingly, conventional methods are computationally intensive.
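To indicate where the computational cost of a signal-space approach comes from, the following is a minimal sketch of an open-loop pitch search based on normalized autocorrelation. It is a generic illustration, not the search procedure of any particular standard; the function name and the lag range of 20 to 143 samples are assumptions made for this example.

```python
import math


def open_loop_pitch(signal, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the highest normalized
    autocorrelation. Every candidate lag costs a pass over the signal,
    which is what makes signal-space searching expensive."""
    best_lag, best_score = min_lag, float("-inf")
    n = len(signal)
    for lag in range(min_lag, max_lag + 1):
        corr = sum(signal[i] * signal[i - lag] for i in range(lag, n))
        energy = sum(signal[i - lag] ** 2 for i in range(lag, n))
        score = corr / math.sqrt(energy) if energy > 0 else 0.0
        if score > best_score:  # strict comparison keeps the earliest maximum
            best_lag, best_score = lag, score
    return best_lag


if __name__ == "__main__":
    # Synthetic impulse train with a period of 50 samples (illustrative only).
    test_signal = [1.0 if i % 50 == 0 else 0.0 for i in range(240)]
    print(open_loop_pitch(test_signal, 20, 143))
```

A transcoder that stays in the parameter space avoids this per-lag pass over reconstructed speech altogether.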
In an attempt to eliminate pitch-lag interpolation in the speech signal space, a “smart” transcoding method is described in U.S. No. 2002/0077812 A1. Although this method performs transcoding between the CELP parameters, it is only available for a special case that generally requires very restricted conditions between source and destination CELP codecs. For example, it generally requires that the Algebraic CELP (ACELP) algorithm be used and that both source and destination codecs have the same subframe size. These restrictions mean that the method cannot be applied broadly.
Thus, there exists a need for an improved voice transcoder capable of efficiently computing the adaptive codebook pitch lag.