Analysis-by-synthesis speech coders operate by determining coding parameters at the encoder which minimize a distortion measure between the coded (synthetic) speech and the original speech. These parameters are then forwarded to the decoder where the coded speech is reconstructed. In a conventional system using the above encoding scheme, the encoder searches a "colored" codebook created from an appropriately filtered "white" codebook to find a codeword which will represent a given input frame of speech with minimum error. The index of this codeword is then passed to the receiver where it is used to synthesize the output speech. Known as stochastic coding, this technique is discussed by Atal and Schroeder in "Stochastic Coding of Speech at Very Low Bit Rates", Proc. IEEE Int. Conf. Comm., pp. 1610-1613 (April 1984), and is illustrated in block diagram format in FIG. 1.
As shown, the first sequence of random (e.g., Gaussian) samples represented by the vector y is drawn from a codebook, scaled by a gain factor G, and filtered by A(z) to produce the synthetic speech vector ŝ. The synthetic speech ŝ is then compared with the input speech vector s to calculate the distance E between them. This distance measure is typically the mean weighted squared error. This iterative procedure of coloring and distance calculation is repeated for every entry in the codebook until the Mth codeword has been processed. The index of the codeword that gives the smallest E for the current speech frame being encoded is forwarded to the receiver so that analysis can begin with the next frame. Additionally, the filter and gain parameters are updated periodically and transmitted to the receiver.
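The search loop described above can be sketched as follows. This is an illustrative sketch only, not the implementation of FIG. 1: the all-pole filter form, the zero initial filter state, the plain (unweighted) mean squared error, and the names `synthesize` and `search_codebook` are assumptions made for the example.

```python
def synthesize(codeword, gain, a):
    """Scale a codeword by the gain and color it with an all-pole filter.

    Assumed filter form: out[n] = gain*y[n] - sum_k a[k]*out[n-1-k],
    starting from a zero filter state.
    """
    out = []
    for n, y in enumerate(codeword):
        acc = gain * y
        for k, ak in enumerate(a):
            if n - 1 - k >= 0:
                acc -= ak * out[n - 1 - k]
        out.append(acc)
    return out


def search_codebook(codebook, target, gain, a):
    """Color every codeword and return (index, error) of the closest one."""
    best_index, best_err = -1, float("inf")
    for i, codeword in enumerate(codebook):
        synth = synthesize(codeword, gain, a)
        # Unweighted mean squared error stands in for the weighted measure.
        err = sum((t - s) ** 2 for t, s in zip(target, synth)) / len(target)
        if err < best_err:
            best_index, best_err = i, err
    return best_index, best_err


# Toy usage: two codewords of length 3 and a first-order filter.
codebook = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
index, err = search_codebook(codebook, [0.0, 2.0, 1.0], gain=2.0, a=[-0.5])
```

Only `index` would be sent to the receiver, which holds the same codebook and repeats the `synthesize` step.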
The codebook illustrated in FIG. 1 is known as a block code, in which each entry is represented in its entirety as a separate sequence of samples. This is the basic and most common form of codebook used in analysis-by-synthesis coders. Although it is considered the optimal codebook, a great deal of computation is required to search it. Using the multiply-and-add operation as a figure of complexity, a coder with a codebook of M codewords, frame length (dimension) of N, and a coloring filter of order P requires on the order of $M \cdot N \cdot P$ operations to color the codebook. In addition, about $M \cdot N \cdot 2$ operations are needed to calculate the M distances, resulting in a total search complexity figure of $M \cdot N \cdot (P+2)$. For example, a coder with M=1024, N=40, and P=10 requires about 491,520 operations to search the codebook for each frame.
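The block-code complexity figure can be checked with a few lines of arithmetic. The function name `block_search_ops` is hypothetical and is introduced only for this illustration.

```python
def block_search_ops(M, N, P):
    """Approximate multiply-add count to search a block codebook once."""
    coloring = M * N * P   # filter every sample of every codeword
    distance = M * N * 2   # about two operations per sample per distance
    return coloring + distance


print(block_search_ops(M=1024, N=40, P=10))  # 491520
```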
Originally, analysis-by-synthesis coders determined the gain once for each frame, usually to match the energy of the synthetic speech to that of the input. This type of procedure, discussed in Atal and Schroeder, supra, is referred to as open-loop because the gain is determined prior to and without regard to the codeword selection. A more effective procedure in which the gain is calculated in a closed loop is discussed by Trancoso and Atal in "Efficient procedures for finding the optimum innovation in stochastic coders," Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, pp. 2375-2378 (Apr. 1986). In this approach, the optimum gain for each codeword is calculated so as to minimize the error distance between the synthetic speech computed from that codeword and the input speech. The codeword/gain pair that yields the smallest error for that frame is then used. Because the optimum gain may be determined as part of the distance calculation, there is no real increase in complexity, while a significant increase in performance results.
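A minimal sketch of a closed-loop gain search follows. It assumes a plain squared-error distance and codewords that have already been colored; it is not taken from the cited paper. Under those assumptions the least-squares gain for a colored codeword x and input s is the ratio of the correlation <s, x> to the energy <x, x>, and the resulting error is <s, s> minus the gain times <s, x>. The name `closed_loop_search` is hypothetical.

```python
def closed_loop_search(colored_codebook, target):
    """Return (index, gain, error) of the best codeword/gain pair."""
    target_energy = sum(t * t for t in target)
    best = (-1, 0.0, float("inf"))
    for i, x in enumerate(colored_codebook):
        energy = sum(v * v for v in x)
        if energy == 0.0:
            continue  # an all-zero codeword cannot be scaled to fit
        corr = sum(t * v for t, v in zip(target, x))
        gain = corr / energy               # least-squares optimum gain
        err = target_energy - gain * corr  # squared error at that gain
        if err < best[2]:
            best = (i, gain, err)
    return best


index, gain, err = closed_loop_search([[1.0, 0.0], [1.0, 1.0]], [2.0, 2.0])
# Here the second codeword scaled by 2 matches the target exactly.
```

The correlation and energy sums are the same quantities a distance calculation needs, which is consistent with the observation above that the optimum gain adds no real complexity.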
Recognizing that much of the computational complexity is due to the search of the codebook, other recent efforts have focused on reducing this complexity by using codebooks with some dependencies among codewords. One such codebook is a tree-structured code discussed by Anderson in "Tree Coding of Speech," IEEE Trans. Inform. Theory, vol. IT-21, pp. 379-387 (July 1975). In general, a tree code grows from a root node and has q branches stemming from each node and n codeletters (samples) per branch. A tree of depth L will contain $M = q^L$ codewords, each of frame length $N = n \cdot L$, with a q-level path map sequence through the tree corresponding to a unique sequence of codeletters (a single codeword) that is used as the encoder's index in the transmission. Due to the interdependency between codewords, a tree-structured codebook provides a reduction in the complexity of the coloring and distance calculation at the cost of some increase in distortion. The computational complexity of a tree code with M codewords, dimension N, order P filter, and a branching factor q is approximated as $[q/(q-1)](M-1) \cdot (N/\log_q M) \cdot (P+2)$, where $\log_q M$ is the depth of the tree, $[q/(q-1)](M-1)$ is the total number of branches in the tree, and $N/\log_q M$ is the number of letters per branch. A binary tree (q=2) applied to the same example as before with M=1024, N=40, and P=10 requires about 98,208 operations to search the codebook for each frame, about one-fifth the complexity of the block code.
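The tree-code figure can be reproduced with the short calculation below. The function name `tree_search_ops` is hypothetical, and the sketch assumes that M is an exact power of q and that the depth divides N evenly, as in the example above.

```python
def tree_search_ops(M, N, P, q):
    """Approximate multiply-add count for an exhaustive tree-code search."""
    # The depth is log_q(M); repeated multiplication avoids float logarithms.
    depth, size = 0, 1
    while size < M:
        size *= q
        depth += 1
    if size != M or depth == 0:
        raise ValueError("M must be an exact power of q, with M > 1")
    branches = q * (M - 1) // (q - 1)  # total number of branches in the tree
    letters_per_branch = N // depth    # assumes the depth divides N evenly
    return branches * letters_per_branch * (P + 2)


print(tree_search_ops(M=1024, N=40, P=10, q=2))  # 98208
```

For the binary example this is 2046 branches of 4 letters each, at 12 operations per letter.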
A further reduction in the computational complexity may be realized by not searching the entire tree as in an exhaustive search, but rather performing a limited search. One such limited search procedure is the M-algorithm disclosed by Anderson, supra. The algorithm visits at each stage of the tree a fixed number $q \cdot M_s$ of branches extending out from the $M_s$ saved paths which lead up to the present stage. Only the best $M_s$ paths (those with the lowest distance) are saved from these visited paths for searching in the next stage. At the final stage of the tree, the codeword associated with the best path is selected.
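The limited search described above can be sketched as follows. This is a simplified illustration rather than Anderson's implementation: coloring and gain are omitted, the distance is a plain squared error, and the tree is supplied through a hypothetical callable `branch_letters(path, b)` that returns the n codeletters on branch b leaving the node reached by `path`.

```python
def m_algorithm(branch_letters, target, q, depth, n, survivors):
    """Return (distance, path) of the best path found by a limited search."""
    paths = [(0.0, ())]  # each entry: (accumulated distance, path map)
    for stage in range(depth):
        segment = target[stage * n:(stage + 1) * n]
        extended = []
        for dist, path in paths:
            for b in range(q):
                letters = branch_letters(path, b)
                d = sum((t - c) ** 2 for t, c in zip(segment, letters))
                extended.append((dist + d, path + (b,)))
        # Keep only the lowest-distance paths for the next stage.
        extended.sort(key=lambda entry: entry[0])
        paths = extended[:survivors]
    return paths[0]


# Toy binary tree whose branch b always carries the single letter float(b).
best = m_algorithm(lambda path, b: [float(b)], [1.0, 0.0, 1.0],
                   q=2, depth=3, n=1, survivors=1)
```

The returned path map plays the role of the codeword index that the encoder transmits.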
The search intensity is commonly measured by the number of survivors $M_s$ saved at each stage. Since a coder employing such a limited search visits a finite number ($q \cdot M_s$ at most) of branches at each stage of the tree, there is a significant reduction in computational complexity compared to the exhaustive tree search. The computational complexity figure for this limited search procedure is approximated by the product of the branching factor, the number of survivors, the number of letters per branch, the depth, and (P+2), and is expressed mathematically as $q M_s n L (P+2) = q M_s N (P+2)$. Using the binary tree (q=2) example above with the M-algorithm search procedure (and adjusting the complexity figures to allow for the growth of the tree), the approximate numbers of operations for different search intensities are: 960 for $M_s = 1$, 1824 for $M_s = 2$, 3360 for $M_s = 4$, and 6048 for $M_s = 8$. This clearly represents a reduction in the computational complexity, but at the cost of a sub-optimal solution, since a potential lowest-error path through the entire tree may be discarded at an early stage.
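The operation counts quoted above can be reproduced under one reading of the adjustment for the growth of the tree, namely that the early stages hold fewer paths than the survivor limit, so fewer branches are visited there. That reading is an assumption, as is the function name `m_algorithm_ops`; the sketch below uses n=4 letters per branch and a depth of 10, which follow from N=40 and M=1024 with q=2.

```python
def m_algorithm_ops(q, survivors, n, depth, P):
    """Approximate multiply-add count for an M-algorithm tree search."""
    total_branches = 0
    paths = 1  # only the root exists before the first stage
    for _ in range(depth):
        visited = q * paths              # branches leaving the saved paths
        total_branches += visited
        paths = min(visited, survivors)  # at most `survivors` paths are kept
    return total_branches * n * (P + 2)


for ms in (1, 2, 4, 8):
    print(ms, m_algorithm_ops(q=2, survivors=ms, n=4, depth=10, P=10))
# prints 960, 1824, 3360 and 6048 for the four survivor counts
```

Without the adjustment, the closed-form product for two survivors would give 2 x 2 x 40 x 12 = 1920 rather than 1824.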
Disadvantageously, conventional coders using a tree-structured code (whether exhaustively searched or searched with a limited procedure) have generally used open-loop gain calculation. Lin, in "Speech coding using efficient pseudo-stochastic block codes," Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, pp. 1354-1357 (Apr. 1987), reported a coder with a tree-structured code using the more effective closed-loop gain calculation, but that coder required an exhaustive search of the tree.