1. Field of the Invention
The present invention relates to an improvement in a method of compressing and expanding the time axis of a linear predictive residual waveform in a speech coding and decoding apparatus used for transmitting or storing an input speech signal in the form of a digital signal.
2. Description of the Prior Art
A method of extracting a linear predictive residual waveform (hereinafter referred to as "residual waveform") from an input speech waveform after linear predictive analysis and quantizing it together with the linear predictive coefficient, etc. is one of the high-efficiency compression coding methods. A speech coding and decoding apparatus such as that shown in FIGS. 4A and 4B, which combines this method with a method of compressing the time axis of a residual waveform by utilizing a pitch period, is conventionally known. The apparatus shown in FIGS. 4A and 4B is similar to the apparatus described in "Algorithm of 8-16 Kbps Residual Compressing Method (TOR) Algorithm Utilizing Pitch Information", the Transactions of Acoustical Society of Japan 3-2-1 (March, 1986).
FIG. 4A shows a coding portion and FIG. 4B a decoding portion. In these drawings, the reference numeral 1 represents an input speech waveform, 2 a linear predictive inverse filtering means, 3 a linear predictive analyzing means, 4 a residual waveform, 5 a linear predictive coefficient, 23 a pitch extracting means, 8 a pitch period, 24 a residual thinning means, 25 a voiced/unvoiced judging means, 26 voiced/unvoiced judging information, 27 a thinned residual waveform, 28 a residual quantizing means, 13 a quantized residual waveform, 14 a multiplexing means, 15 a transmission path, 16 a separating means, 29 a residual inverse quantizing means, 30 an inverse quantized residual waveform, 31 a residual reproducing means, 20 a reproduced residual waveform, 21 a linear predictive synthetic filtering means and 22 a synthesized speech waveform.
The operation of the conventional apparatus will be explained below.
The coding portion shown in FIG. 4A will first be explained.
The input speech waveform 1 (a time series of discrete-value data) is subjected to linear predictive analysis by the linear predictive analyzing means 3 for each analysis frame (hereinafter referred to as "frame") of fixed length to obtain a linear predictive coefficient. The linear predictive analyzing means 3 outputs the obtained linear predictive coefficient 5 to the linear predictive inverse filtering means 2 and the multiplexing means 14. The linear predictive inverse filtering means 2 applies linear predictive inverse filtering to the input speech waveform 1 for each frame by using the linear predictive coefficient 5, thereby obtaining the residual waveform 4. The pitch extracting means 23 calculates the pitch period 8 from the residual waveform 4 and the input speech waveform 1 of the corresponding frame, for example, by using an AMDF (average magnitude difference function) method and an auto-correlation method together. The voiced/unvoiced judging means 25 judges whether the input speech waveform 1 is voiced or unvoiced on the basis of the power value of the residual waveform 4 of the corresponding frame and the AMDF value obtained by the pitch extracting means 23, and outputs the result as the voiced/unvoiced judging information 26. When the frame is judged to be voiced, the residual thinning means 24 thins the residual waveform 4 by utilizing the pitch period 8 of the frame and outputs a representative residual waveform 27. An example of the thinning operation performed by the residual thinning means 24 on a voiced waveform is shown in FIG. 5.
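The pitch extraction step described above can be illustrated by a minimal sketch of an AMDF search. It covers only the AMDF part; the combination with an auto-correlation method and the voiced/unvoiced judgment are not shown, because the text does not give their details. The function names, the lag range and the synthetic test frame are assumptions made for illustration only.

```python
import math

def amdf(signal, lag):
    """Average magnitude difference between the signal and itself shifted by `lag`."""
    n = len(signal) - lag
    return sum(abs(signal[i] - signal[i + lag]) for i in range(n)) / n

def estimate_pitch_amdf(signal, min_lag, max_lag):
    """Return (lag, amdf_value) for the lag in [min_lag, max_lag] with the smallest AMDF."""
    best_lag = min_lag
    best_val = amdf(signal, min_lag)
    for lag in range(min_lag + 1, max_lag + 1):
        val = amdf(signal, lag)
        if val < best_val:
            best_lag, best_val = lag, val
    return best_lag, best_val

# Synthetic stand-in for a residual frame (assumption): one pulse every
# 40 samples plus a small deterministic ripple.
period = 40
frame = [(1.0 if i % period == 0 else 0.0) + 0.01 * math.sin(0.3 * i)
         for i in range(240)]
pitch, value = estimate_pitch_amdf(frame, min_lag=20, max_lag=70)
```

The AMDF value at the selected lag is small when the frame is strongly periodic, which is consistent with its use as one input to the voiced/unvoiced judgment.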
In FIG. 5, the waveform (a) represents a residual waveform. From the residual waveform in the pitch section (section width: P) that extends into the next frame, the residual thinning means 24 extracts the portion (the boxed portion straddling the current frame and the next frame in the waveform (a)) that contains the residual pulse having the maximum amplitude and in which the sum of the absolute values of the amplitudes of a predetermined number of consecutive residual pulses is the maximum, and outputs the residual waveform in that portion as the representative residual waveform. The waveforms (b) in FIG. 5 are the representative residual waveforms of the preceding frame and the current frame.
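The selection rule for a voiced frame can be sketched as follows. This is one possible reading of the rule: among all windows of a fixed number of consecutive samples that contain the largest-magnitude pulse, the window with the largest sum of absolute amplitudes is chosen. The function name, the window length and the sample values are hypothetical, and the pitch section is passed in as a plain list, so the handling of a section that straddles two frames is left out.

```python
def select_representative(section, window_len):
    """Pick `window_len` consecutive samples from one pitch section.

    The window must contain the largest-magnitude sample; among such
    windows, the one with the largest sum of absolute amplitudes wins.
    Returns (start_index, samples).
    """
    if not 0 < window_len <= len(section):
        raise ValueError("window_len must be between 1 and len(section)")
    peak = max(range(len(section)), key=lambda i: abs(section[i]))
    # Range of window start positions that still cover the peak sample.
    first = max(0, peak - window_len + 1)
    last = min(peak, len(section) - window_len)
    best_start = max(
        range(first, last + 1),
        key=lambda s: sum(abs(x) for x in section[s:s + window_len]),
    )
    return best_start, section[best_start:best_start + window_len]

# One pitch section (P = 10 samples, made-up values) with its main pulse at index 4.
section = [0.0, 0.1, -0.2, 0.5, 2.0, -1.2, 0.6, 0.1, 0.0, -0.1]
start, representative = select_representative(section, window_len=4)
```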
When the voiced/unvoiced judging means 25 judges the waveform to be an unvoiced waveform, the residual thinning means 24 sorts the residual pulses in order of amplitude, extracts a predetermined number of residual pulses, and outputs them as the representative residual waveform 27.
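For an unvoiced frame, the thinning can be sketched as keeping a fixed number of pulses ranked by amplitude. The sketch assumes that the largest-magnitude pulses are the ones kept and that each kept pulse carries its original position so that a decoder could place it back; the text does not state how positions are conveyed, so that part, like the function names and sample values, is an assumption.

```python
def thin_unvoiced(residual, num_pulses):
    """Keep the `num_pulses` largest-magnitude samples together with their positions."""
    order = sorted(range(len(residual)), key=lambda i: abs(residual[i]), reverse=True)
    kept = sorted(order[:num_pulses])  # return to time order
    return [(i, residual[i]) for i in kept]

def restore_unvoiced(pulses, frame_len):
    """Decoder-side counterpart: put each kept pulse back at its position, zeros elsewhere."""
    frame = [0.0] * frame_len
    for position, amplitude in pulses:
        frame[position] = amplitude
    return frame

residual = [0.1, -0.9, 0.3, 0.05, 0.7, -0.2, 0.0, -0.4]  # made-up values
pulses = thin_unvoiced(residual, num_pulses=3)
restored = restore_unvoiced(pulses, len(residual))
```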
In accordance with the voiced/unvoiced judging information 26, the residual quantizing means 28 quantizes the representative residual waveform 27 output from the residual thinning means 24 by a preset quantization bit allotment that differs depending on whether the waveform is voiced or unvoiced, and outputs the quantized residual 13. The multiplexing means 14 multiplexes the pitch period 8, the voiced/unvoiced judging information 26, the quantized residual 13 and the linear predictive coefficient 5, and outputs the result to the transmission path 15 as coded speech information.
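The idea of a bit allotment that depends on the voiced/unvoiced judgment can be illustrated with a simple uniform quantizer. The text does not specify the quantizer or the actual bit counts, so the uniform scheme, the values in `BITS_PER_SAMPLE`, the `full_scale` parameter and all names below are placeholders chosen only for illustration.

```python
import math

# Hypothetical allotment: bits per sample for voiced (True) and unvoiced (False) frames.
BITS_PER_SAMPLE = {True: 4, False: 3}

def quantize(samples, voiced, full_scale):
    """Uniformly quantize samples in [-full_scale, full_scale] to integer codes."""
    levels = 2 ** BITS_PER_SAMPLE[voiced]
    step = 2.0 * full_scale / levels
    codes = []
    for x in samples:
        code = math.floor((x + full_scale) / step)
        codes.append(min(max(code, 0), levels - 1))  # clamp to the valid code range
    return codes

def dequantize(codes, voiced, full_scale):
    """Map each code back to the midpoint of its quantization interval."""
    levels = 2 ** BITS_PER_SAMPLE[voiced]
    step = 2.0 * full_scale / levels
    return [-full_scale + (c + 0.5) * step for c in codes]

representative = [0.5, 2.0, -1.2, 0.6]  # made-up values
codes = quantize(representative, voiced=True, full_scale=2.0)
decoded = dequantize(codes, voiced=True, full_scale=2.0)
```

Both sides select the step size from the same voiced/unvoiced flag, which is why that flag has to be transmitted alongside the quantized residual.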
The decoding portion shown in FIG. 4B will now be explained.
The separating means 16 separates the coded speech information supplied from the transmission path 15 into the pitch period 8, the voiced/unvoiced judging information 26, the quantized residual 13 and the linear predictive coefficient 5. The residual inverse quantizing means 29 inversely quantizes the quantized residual 13, using the voiced/unvoiced judging information 26 to apply the same bit allotment as in the quantization by the residual quantizing means 28, and outputs the result as the representative residual waveform 30. When the voiced/unvoiced judging information 26 indicates that the waveform of the current frame is a voiced waveform, the residual reproducing means 31 repeats the representative residual waveform 30 of the current frame at every pitch period 8 while interpolating, in both waveform and amplitude, from the residual waveform reproduced in the preceding frame, thereby reproducing the residual over the entire frame. FIG. 5 shows an example of the operation of reproducing a residual of a voiced speech performed by the residual reproducing means 31. The residual reproducing means 31 repeats the representative residual waveform of the current frame, indicated by the symbol (b) in FIG. 5, at every pitch period 8 while interpolating, in both waveform and amplitude, from the residual waveform reproduced in the preceding frame, thereby obtaining the reproduced residual waveform (c). On the other hand, when the voiced/unvoiced judging information 26 indicates that the waveform of the current frame is an unvoiced waveform, the residual reproducing means 31 restores each pulse of the representative residual waveform 30 to its position before thinning, and thereby reproduces the residual waveform.
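The reproduction of a voiced frame can be sketched as repeating a representative waveform once per pitch period while blending from the preceding frame's representative waveform toward the current one. The text does not give the interpolation rule, so the linear weighting, the placement of each repetition at the start of a pitch period, the equal-length requirement and all names below are assumptions made for illustration.

```python
def reproduce_voiced(prev_rep, cur_rep, pitch, frame_len):
    """Rebuild one voiced frame by repeating a representative waveform every `pitch` samples.

    Each repetition is a linear blend of the preceding frame's representative
    waveform and the current one, moving toward the current one across the frame.
    """
    if len(prev_rep) != len(cur_rep) or len(cur_rep) > pitch:
        raise ValueError("representatives must have equal length, at most one pitch period")
    starts = list(range(0, frame_len, pitch))
    frame = [0.0] * frame_len
    for k, start in enumerate(starts):
        w = (k + 1) / len(starts)  # weight of the current representative
        for j, (p, c) in enumerate(zip(prev_rep, cur_rep)):
            if start + j < frame_len:
                frame[start + j] = (1.0 - w) * p + w * c
    return frame

prev_rep = [1.0, -0.5]  # made-up representative of the preceding frame
cur_rep = [2.0, -1.0]   # made-up representative of the current frame
frame = reproduce_voiced(prev_rep, cur_rep, pitch=5, frame_len=20)
```

Every pitch period except the last one in the frame is obtained by interpolation rather than from transmitted samples, so its accuracy depends on how closely the original residual follows the two representative waveforms.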
The residual reproducing means 31 outputs the residual waveform as the reproduced residual waveform 20. The linear predictive synthetic filtering means 21 synthesizes the speech waveform of the frame from the reproduced residual waveform 20 by linear predictive synthetic filtering using the linear predictive coefficient 5, and outputs the synthesized speech waveform 22.
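The relationship between the linear predictive inverse filtering in the coding portion and the linear predictive synthetic filtering in the decoding portion can be illustrated by the following sketch. The text does not say how the linear predictive coefficients are computed; the autocorrelation method with the Levinson-Durbin recursion is used here as a common choice, and the prediction order, the test signal and all function names are assumptions.

```python
def autocorrelation(x, order):
    """Autocorrelation values r[0..order] of a finite frame."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]

def levinson_durbin(r, order):
    """Predictor coefficients a[1..order] such that x[n] is approximated by sum(a[k] * x[n-k])."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]

def inverse_filter(x, coeffs):
    """Residual e[n] = x[n] - sum(a[k] * x[n-k]), with zero history before the frame."""
    return [x[n] - sum(a * x[n - k] for k, a in enumerate(coeffs, start=1) if n - k >= 0)
            for n in range(len(x))]

def synthesis_filter(e, coeffs):
    """Inverse operation: y[n] = e[n] + sum(a[k] * y[n-k]), with zero history."""
    y = []
    for n in range(len(e)):
        y.append(e[n] + sum(a * y[n - k] for k, a in enumerate(coeffs, start=1) if n - k >= 0))
    return y

# Synthetic "speech" (assumption): pulses every 40 samples through a fixed all-pole filter.
excitation = [1.0 if n % 40 == 0 else 0.0 for n in range(160)]
speech = synthesis_filter(excitation, [1.3, -0.6])

coeffs = levinson_durbin(autocorrelation(speech, 2), 2)
residual = inverse_filter(speech, coeffs)
rebuilt = synthesis_filter(residual, coeffs)
```

With an unmodified residual the synthesis filter returns the input waveform, so in this simplified model any distortion in the synthesized speech comes from the thinning, quantization and reproduction of the residual.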
The conventional speech coding and decoding apparatus described above, however, has the following problems. When the residual of a voiced sound is reproduced by the decoding portion, the representative residual waveform of the current frame is repeated at every pitch period while being interpolated, in both waveform and amplitude, from the representative residual waveform of the preceding frame, as described above. Therefore, in a pitch section which is reproduced by interpolation and in which the original residual waveform has only a small correlation with the representative residual waveform, a large distortion is produced between the original residual waveform and the reproduced residual waveform, thereby deteriorating the quality of the reproduced speech waveform.
In addition, since the residual waveform of a voiced speech which straddles the current frame and the next frame is thinned and then reproduced by the decoding portion, if the pitch period of the current frame is erroneously transmitted due to a bit error produced in the transmission path, the distortion of the reproduced residual waveform caused by the error also affects the subsequent frames. That is, the apparatus has low robustness against errors in the transmission path.