1. Field of the Invention
This invention relates to methods and apparatus for encoding and decoding analog signals, such as sound and more particularly speech signals, to and from digital codes. More specifically, this invention relates to methods and apparatus that convolve excitation signals with impulse response functions to form the sound contributions that make up a synthesized output sound signal.
2. Description of the Related Art
The structure and function of a codebook excited linear predictive (CELP) coder are well known in the art. The International Telecommunication Union Telecommunication Standardization Sector (ITU-T) has published a recommended standard entitled "Dual Rate Speech Coder for Multimedia Communications Transmitting at 5.3 and 6.3 kbit/s," G.723.1, 1996, Geneva, Switzerland, which specifies a coded representation that can be used for compressing speech or other audio signals for transmission at very low bit rates.
A speech coder complying with G.723.1 takes as input 16-bit linear pulse code modulated (PCM) sampled digital data. The sampling rate is 8000 Hz. The samples are partitioned into frames of 240 samples, each frame having a duration of 30 ms.
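The framing arithmetic above can be checked with a minimal sketch. The function names `frame_duration_ms` and `split_into_frames` are hypothetical and are not taken from G.723.1; the sketch only assumes the 8000 Hz rate and the 240-sample frame stated in the text.

```python
SAMPLE_RATE_HZ = 8000   # sampling rate stated in the text
FRAME_LENGTH = 240      # samples per frame stated in the text


def frame_duration_ms(frame_length=FRAME_LENGTH, sample_rate_hz=SAMPLE_RATE_HZ):
    """Duration of one frame in milliseconds."""
    return 1000.0 * frame_length / sample_rate_hz


def split_into_frames(samples, frame_length=FRAME_LENGTH):
    """Partition a sequence of PCM samples into consecutive full frames.

    Trailing samples that do not fill a whole frame are dropped here; how a
    real coder buffers them is outside the scope of this sketch.
    """
    usable = len(samples) - len(samples) % frame_length
    return [samples[i:i + frame_length] for i in range(0, usable, frame_length)]
```

With these values, 240 samples at 8000 samples per second last 240/8000 s, which is the 30 ms frame duration given above.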
The faster transmission rate of 6.3 kbit/s uses a multi-pulse maximum likelihood algorithm to quantize each frame, and the slower transmission rate of 5.3 kbit/s uses an algebraic code-excited linear predictor algorithm to quantize each frame.
The digital channel data transferred from the encoding source to the decoder comprise the linear split predictor indices, the adaptive codebook gain and lag (the pitch information), and the fixed codebook index and gain (the residual information).
FIG. 1 shows a simplified block diagram of a decoder as shown in FIGS. 1 and 2 of G.723.1, which is included herein by reference.
The channel data 100 is divided and preprocessed into the filter coefficients h(n) 115, which are retained in the buffer 110, and the pitch/excitation signals 125, which are retained in the buffer 120. The filter coefficients h(n) 115 determine the filter characteristics of the synthesis filter 130. The excitation signals e_i(n) 125 are the input stimuli to the synthesis filter 130, which filters them to provide the synthesized speech signal y(n) 135 for a frame of 240 samples. The synthesized speech signal y(n) 135 is a digital signal that is the input to a digital-to-analog converter (DAC), which reproduces a facsimile of the original audio signal.
It is well known in the art that the filtering process is a convolution of the excitation signals e_i(n) 125 with the filter coefficients h(n) 115. This convolution is described by the following function:
$$y(n) = \sum_{j=0}^{n} e_i(j)\,h(n-j), \qquad n = 0 \text{ to } (N-1) \qquad \text{(Eq. 1)}$$
where:
n is an index having a value in the range 0 ≤ n ≤ N-1.
N is the number of samples within a frame of quantized speech.
j is an index counter for the performance of the summation.
e_i(n) is the element of the vector e_i of the excitation signal 125.
h(n) is the vector of the filter coefficients 115.
y(n) is the synthesized speech signal 135.
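As a small worked illustration, assume the summation of the convolution runs over j = 0 to n with terms e_i(j)h(n-j), which is the form that the loop of FIG. 2 iterates. The first three output samples then expand as follows; stopping at three samples is only a choice to keep the expansion short.

```latex
\begin{aligned}
y(0) &= e_i(0)\,h(0) \\
y(1) &= e_i(0)\,h(1) + e_i(1)\,h(0) \\
y(2) &= e_i(0)\,h(2) + e_i(1)\,h(1) + e_i(2)\,h(0)
\end{aligned}
```

Each successive sample needs one more product than the one before it, so the total work grows with the square of the frame length.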
FIG. 2 is a flow diagram of the operations necessary to complete the convolution of Eq. 1. A frame of the digital data describing the excitation signal e_i(n) and the impulse response with the filter coefficients h(n) is received and retained 200. A counter is initialized 205 to the number N of the pitch impulses or samples within the frame. The index counter n is initialized 210 to zero and then tested 215 to determine whether it is greater than N-1, one less than the number of samples in the frame. If the index counter n is not 218 greater than N-1, the value of the synthesized speech signal y(n) is initialized 220 to zero. The counter j for the summation is also initialized to zero. The contribution to the synthesized speech signal y(n) is then calculated 230 by the equation:
$$y(n) = y(n) + e_i(j)\,h(n-j) \qquad \text{(Eq. 2)}$$
The counter j for the summation is then incremented 235 and tested to determine whether it has exceeded the value of the index counter n. If the counter j has not 243 exceeded the value of the index counter n, an updated value of the synthesized speech signal is calculated 230 with a new excitation signal value e_i(j) and a new impulse response coefficient h(n-j), as described in Eq. 2. This repeats until the value of the counter j of the summation is greater than 242 the value n of the index counter. When the value of the counter j is greater than 242 the index counter n, the index counter n is incremented 245 and then compared 215 to one less than the number of samples N.
The above-described steps are repeated until the index counter n reaches the number of samples N. At this point all contributions to the synthesized speech signal y(n) have been determined, and a new frame of the digital data is received 200.
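The loop structure described for FIG. 2 can be sketched as follows. This is an illustrative rendering only, not the implementation of any particular coder: the function name `synthesize_frame` is hypothetical, the inputs are plain Python sequences, and the step numbers in the comments are the reference numerals quoted in the text.

```python
def synthesize_frame(e, h):
    """Direct-form convolution following the loop structure of FIG. 2.

    e: excitation samples e_i(0..N-1); h: impulse response h(0..N-1).
    Returns y(0..N-1) with y(n) = sum over j = 0..n of e(j) * h(n - j).
    """
    N = len(e)                 # number of samples in the frame (step 205)
    if len(h) < N:
        raise ValueError("h must hold at least N coefficients")
    y = [0] * N
    n = 0                      # index counter (step 210)
    while n <= N - 1:          # test against N - 1 (step 215)
        y[n] = 0               # initialize y(n) (step 220)
        j = 0                  # summation counter
        while j <= n:          # repeat until j exceeds n
            y[n] = y[n] + e[j] * h[n - j]   # accumulate one product (step 230)
            j += 1             # increment summation counter (step 235)
        n += 1                 # increment index counter (step 245)
    return y
```

For example, `synthesize_frame([1, 0, 2], [1, 0.5, 0.25])` yields `[1, 0.5, 2.25]`: the third sample is 1(0.25) + 0(0.5) + 2(1).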
Calculating one contribution to the synthesized speech signal y(n) for a frame requires (N+1)N/2 multiplications and (N-1)N/2 additions. The algorithm has a delay of 37.5 ms.
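The operation counts can be verified with a short tally. This sketch assumes that each output sample y(n) needs n + 1 products and that combining them takes n additions, because the first product needs no addition. The function names `count_operations` and `closed_form` are hypothetical.

```python
def count_operations(N):
    """Tally multiplications and additions for the direct convolution.

    For each n there are n + 1 products; summing them takes n additions.
    """
    multiplications = 0
    additions = 0
    for n in range(N):
        multiplications += n + 1
        additions += n
    return multiplications, additions


def closed_form(N):
    """Closed-form counts: (N+1)N/2 multiplications, (N-1)N/2 additions."""
    return (N + 1) * N // 2, (N - 1) * N // 2
```

For a frame of N = 240 samples both functions give 28,920 multiplications and 28,680 additions.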
U.S. Pat. No. 5,754,976 (Adoul et al. 976) describes a method and device for drastically reducing the complexity of a codebook search while encoding a sound signal. The method and device are capable of selecting a priori a subset of the codebook pulse combinations and restricting the combinations to be searched to that subset. Further, the size of the codebook is increased by allowing the individual code vectors to assume at least one of multiple possible amplitudes, while not increasing search complexity.
U.S. Pat. No. 5,701,392 (Adoul et al. 392) provides methods for an algebraic codebook search to encode speech signals. The codebook of Adoul et al. 392 consists of a set of code vectors of 40 positions, each comprising multiple non-zero amplitudes assignable to predetermined positions. To reduce the search complexity, a depth-first search is used, which involves a tree structure with ordered levels. A path building operation takes place. A path originated at the first level and extended by the path building operations of subsequent levels determines the respective positions of the non-zero amplitudes of a candidate code vector. A signal-based pulse-position likelihood estimate is used during the first few levels to enable initial pulse screening, so that the search starts under favorable conditions.
U.S. Pat. No. 4,944,013 (Gouvianakis et al.) teaches a method of coding speech such that it can be generated by a pulse excitation sequence in a linear predictive coding filter. The sequence contains, in each of successive frame periods, pulses whose positions and amplitudes may be varied. These variables are selected at the coding end to reduce the error between the input and regenerated speech signals. The selection process involves derivation of an initial estimate followed by an iterative adjustment process in which pulses having low energy contributions are tested in alternative positions and transferred to them if a reduced error results.