1. Field of the Invention
The present invention relates to a speech coder which subjects a speech signal to data compression prior to digital transmission or storage.
2. Description of the Related Art
There are speech coding systems in which a speech signal is separated into a parameter representative of a synthesis filter and a parameter representative of a sound source to thereby effect data compression. One example of such coding is code-excited linear prediction (hereinafter referred to as CELP).
One example of CELP is shown in M. R. Schroeder, B. S. Atal, "Code-Excited Linear Prediction (CELP): High-quality speech at very low bit rates", in Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 937-940 (1985). In this paper, a parameter representative of a synthesis filter is obtained analytically once every 10 msec. As a parameter representative of a sound source, the paper employs 40-point random noise time series, that is, 40-dimensional vectors (hereinafter referred to as "sound source vectors") produced from random numbers. Each sound source vector corresponds in time to one block of the speech, which is coded in blocks each consisting of 40 speech samples (5 msec in duration when the sampling frequency is 8 kHz).
I. M. Trancoso, B. S. Atal, "Efficient procedures for finding the optimum innovation in stochastic coders", Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 2375-2378 (1986) disclose a speech coder performing the speech coding of the Schroeder et al. paper (see FIG. 3).
Referring to FIG. 3, the reference numeral 3 denotes N-dimensional discrete Fourier transform (hereinafter referred to as DFT) sound source vectors obtained by subjecting sound source vectors, which are J-point sampling value sequences, to 2·N-point DFT. The reference numeral 1 denotes a code book comprising L DFT sound source vectors 3. The reference numeral 5 denotes a change-over switch used to select the DFT sound source vectors 3 stored in the code book 1.
Denominator term computing circuit 15 outputs a denominator term 17 for distortion computation on the basis of the DFT sound source vector 3 and the squared value 16 (hereinafter referred to as "squared evaluation weight") of the amplitude term of certain frequency characteristics (hereinafter referred to as "evaluation weighting filter"). Those frequency characteristics are obtained by subjecting the impulse response of the synthesis filter to 2·N-point DFT.
Vector product sum computing circuit 8 is supplied, as its inputs, with the DFT sound source vector 3 and the weighted DFT input speech 7. Weighted DFT input speech 7 is the product of the evaluation weighting filter output and the complex conjugate of what is obtained by subjecting input speech, which is a J-point sampling value sequence, to 2·N-point DFT. In response to those inputs, vector product sum computing circuit 8 outputs a numerator term vector product sum 10 for distortion computation.
Final distortion computing circuit 12 computes a distortion 18 of the synthetic speech from the input speech in the frequency domain on the basis of the numerator term vector product sum 10 and the denominator term 17. Optimum sound source vector selecting circuit 19 selects a sound source vector code 20 corresponding to the sound source vector having the smallest distortion 18. The reference symbol A denotes a distortion computing means.
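The preparation of the quantities named above can be sketched as follows. This is a minimal illustration, not the circuit of FIG. 3: the function names `dft_first_bins`, `build_codebook` and `squared_evaluation_weight` are hypothetical, Gaussian random numbers are assumed for the sound source vectors, and keeping bins 0 to N-1 of the 2·N-point DFT is an assumption about how the N-dimensional vectors are formed.

```python
import cmath
import random


def dft_first_bins(samples, n):
    """Zero-pad a real sequence to 2*n points, take the DFT, keep bins 0..n-1.

    Keeping only the first n bins is an assumption of this sketch; it relies
    on the conjugate symmetry of the DFT of a real sequence.
    """
    size = 2 * n
    if len(samples) > size:
        raise ValueError("sequence is longer than the 2*n-point DFT")
    padded = list(samples) + [0.0] * (size - len(samples))
    return [
        sum(v * cmath.exp(-2j * cmath.pi * i * t / size)
            for t, v in enumerate(padded))
        for i in range(n)
    ]


def build_codebook(num_vectors, j, n, seed=0):
    """Return num_vectors DFT sound source vectors from J-point random noise.

    Gaussian noise and the seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    return [
        dft_first_bins([rng.gauss(0.0, 1.0) for _ in range(j)], n)
        for _ in range(num_vectors)
    ]


def squared_evaluation_weight(impulse_response, n):
    """Squared amplitude |H(i)|^2 of the 2*n-point DFT of the impulse response."""
    return [abs(h) ** 2 for h in dft_first_bins(impulse_response, n)]
```

The naive DFT here costs on the order of N² operations per vector and is chosen only to keep the sketch free of external libraries; a practical implementation would use a fast Fourier transform.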
The operation will next be explained by use of the flowchart shown in FIG. 4.
When the k-th one of the L DFT sound source vectors 3 stored in the code book 1 is used, distortion 18 is known to be given by the following equation (1): ##EQU1## where X(i) is the i-th component of the DFT input speech, H(i) is the i-th component of the evaluation weighting filter, C(i, k) is the i-th component of the k-th DFT sound source vector, and g(k) is the gain coefficient that minimizes the distortion E(k).
First, the vector product sum computing circuit 8 is supplied, as its inputs, with the DFT sound source vector C(i, k) and the weighted DFT input speech Y(i). These inputs enable circuit 8 to output the following numerator term vector product sum P(k) (Step ST1): ##EQU2## where Y(i)* denotes the complex conjugate of Y(i), which satisfies the relation Y(i) = X(i)·H(i)*, and the symbols Re. and Im. denote the real and imaginary parts, respectively, of a complex number.
The denominator term computing circuit 15 is supplied, as its inputs, with the DFT sound source vector C(i, k) and the squared evaluation weight a(i)^2, to output the following denominator term 17 (Step ST2): ##EQU3##
Since a(i)^2 is the squared amplitude of the evaluation weighting filter H(i), the relation a(i)^2 = |H(i)|^2 is satisfied.
Next, the final distortion computing circuit 12 is supplied, as its inputs, with the numerator term vector product sum P(k) expressed by the equation (2) and the denominator term 17 expressed by the equation (3) to output the following distortion E(k) (Step ST3): ##EQU4##
It is known that equation (4) is obtained by selecting the gain coefficient g(k) that minimizes the distortion E(k) of equation (1), and that, for this choice of g(k), equation (4) is equivalent to equation (1).
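The equations themselves are not reproduced in this text. The following is one reconstruction that is consistent with the symbol definitions given above; the summation range i = 1 to N and the name D(k) for the denominator term 17 are assumptions, and the exact form printed in the original equations may differ.

```latex
\begin{align}
E(k) &= \sum_{i=1}^{N} \bigl| X(i) - g(k)\,H(i)\,C(i,k) \bigr|^{2} \tag{1}\\
     &= \sum_{i=1}^{N} |X(i)|^{2} - 2\,g(k)\,P(k) + g(k)^{2}\,D(k), \notag\\
P(k) &= \sum_{i=1}^{N} \Bigl\{ \mathrm{Re}[Y(i)^{*}]\,\mathrm{Re}[C(i,k)]
        - \mathrm{Im}[Y(i)^{*}]\,\mathrm{Im}[C(i,k)] \Bigr\},
        \qquad Y(i) = X(i)\,H(i)^{*}, \tag{2}\\
D(k) &= \sum_{i=1}^{N} a(i)^{2}\,|C(i,k)|^{2},
        \qquad a(i)^{2} = |H(i)|^{2}, \tag{3}\\
\frac{\partial E(k)}{\partial g(k)} = 0
     &\;\Longrightarrow\; g(k) = \frac{P(k)}{D(k)},
        \qquad
        E(k) = \sum_{i=1}^{N} |X(i)|^{2} - \frac{P(k)^{2}}{D(k)}. \tag{4}
\end{align}
```

In this form the term in |X(i)|^2 does not depend on k, so selecting the smallest E(k) amounts to selecting the largest ratio P(k)^2 / D(k).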
After the final distortion computing circuit 12 completes the computation of distortions 18 for all the L DFT sound source vectors 3 (Step ST4), the optimum sound source vector selecting circuit 19 selects as an optimum sound source vector code 20 the number of the DFT sound source vector 3 that gives the smallest value of the L distortions 18 (Step ST5).
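Steps ST1 to ST5 can be sketched as a plain loop over the code book. This is a minimal sketch under stated assumptions: the function name `search_codebook` is hypothetical, the numerator is computed as the real part of the sum of Y(i)*·C(i, k), and the distortion is taken as the input energy minus P(k)^2 divided by the denominator term, which is a reconstruction from the definitions in the text rather than a quotation of equations (2) to (4).

```python
def search_codebook(codebook, y, a2, x_energy):
    """Return (best_index, distortions) for a frequency-domain code book search.

    codebook : list of L DFT sound source vectors, each a list of complex C(i, k)
    y        : weighted DFT input speech Y(i) = X(i) * conj(H(i))
    a2       : squared evaluation weight a(i)^2 = |H(i)|^2
    x_energy : sum of |X(i)|^2, constant for every k
    """
    distortions = []
    for c in codebook:
        # ST1: numerator term vector product sum P(k) = sum Re[conj(Y(i)) * C(i, k)]
        p = sum((yi.conjugate() * ci).real for yi, ci in zip(y, c))
        # ST2: denominator term = sum a(i)^2 * |C(i, k)|^2
        d = sum(w * abs(ci) ** 2 for w, ci in zip(a2, c))
        # ST3: distortion with the gain g(k) = P(k) / D(k) already substituted
        e = x_energy - (p * p / d if d > 0.0 else 0.0)
        distortions.append(e)
    # ST4 is the end of the loop; ST5 selects the code with the smallest distortion
    best = min(range(len(distortions)), key=distortions.__getitem__)
    return best, distortions
```

A small usage example: if the input spectrum is exactly a scaled, filtered copy of one code book entry, that entry is selected with a distortion close to zero.

```python
h = [1 + 0j, 0.5j]
codebook = [[1 + 0j, 1 + 0j], [1j, 2 + 0j]]
x = [2 * hi * ci for hi, ci in zip(h, codebook[1])]
y = [xi * hi.conjugate() for xi, hi in zip(x, h)]
a2 = [abs(hi) ** 2 for hi in h]
best, distortions = search_codebook(codebook, y, a2, sum(abs(xi) ** 2 for xi in x))
```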
The conventional speech coder described above carries out L numerator term vector multiply-add operations in the vector product sum computing circuit 8 to compute the L distortions. To code speech with high quality (i.e., so that the synthetic speech includes no noise), this conventional speech coder needs a large value of L (e.g., L=1024). However, as L is increased, the computational complexity, that is, the number of multiply-add operations required for the distortion computation, becomes enormous. At the same time, the memory capacity needed for the code book increases enormously, resulting in an exceedingly large-scale speech coder.
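The growth in cost with L can be made concrete with a rough count. The sketch below is an estimate under stated assumptions, not a figure from the text: it counts only the numerator computation, assumes two real multiply-adds per complex component (the Re·Re and Im·Im products), and assumes the code book stores N complex values per vector. The function name `search_cost` is hypothetical.

```python
def search_cost(num_vectors, n, block_len, sample_rate_hz):
    """Rough cost of the numerator term computation for a full code book search."""
    macs_per_vector = 2 * n                     # Re*Re and Im*Im per component
    macs_per_block = num_vectors * macs_per_vector
    blocks_per_second = sample_rate_hz / block_len
    return {
        "macs_per_block": macs_per_block,
        "macs_per_second": macs_per_block * blocks_per_second,
        "codebook_reals": 2 * num_vectors * n,  # real and imaginary parts stored
    }
```

For L=1024, 40-sample blocks at 8 kHz, and an illustrative N=64 (a value not given in the text), `search_cost(1024, 64, 40, 8000)` yields 131,072 multiply-adds per block, about 26.2 million per second, and 131,072 stored real values. Each of these figures scales linearly with L.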