1. Field of the Invention
The present invention relates to a speech coder using a CELP (Code Excited Linear Prediction) speech coding system, a PSI-CELP (Pitch Synchronous Innovation Code Excited Linear Prediction) speech coding system, or the like.
2. Description of the Prior Art
In recent years, low bit-rate speech coding techniques have attracted attention as a means of making effective use of the radio band of automobile telephones and portable telephones, and of compressing the amount of speech information in multimedia communication.
As speech coding systems of this type, a CELP speech coding system, a PSI-CELP speech coding system, and the like have already been developed.
The CELP speech coding system reproduces speech by constructing, by a linear predictive analysis method, a linear filter corresponding to the spectral envelope of the input speech, and driving the linear filter with a time-series codevector stored in a codebook.
The PSI-CELP speech coding system is based on the CELP speech coding system and drives a linear predictive filter using, as an excitation source, a candidate vector prepared in advance in a codebook. The PSI-CELP speech coding system is characterized in that the excitation source is given periodicity synchronized with the period of an adaptive codebook, which corresponds to the pitch period of speech.
FIG. 6 illustrates one example of a CELP coder.
A continuous input speech signal is first divided into sections of a predetermined length of approximately 5 to 10 ms. Each such section is herein referred to as a sub-frame.
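The division into sub-frames can be sketched as follows. The 8 kHz sampling rate, the 5 ms sub-frame length (40 samples), the zero-padding of the last partial sub-frame, and the function name `split_into_subframes` are all illustrative assumptions, not details given in the text.

```python
def split_into_subframes(signal, sample_rate_hz=8000, subframe_ms=5):
    """Split a speech signal into fixed-length sub-frames (illustrative sketch).

    Assumes an 8 kHz sampling rate and 5 ms sub-frames (40 samples);
    a final partial sub-frame, if any, is zero-padded to full length.
    """
    length = sample_rate_hz * subframe_ms // 1000
    if length <= 0:
        raise ValueError("a sub-frame must contain at least one sample")
    subframes = []
    for start in range(0, len(signal), length):
        chunk = list(signal[start:start + length])
        chunk.extend([0.0] * (length - len(chunk)))  # zero-pad the tail
        subframes.append(chunk)
    return subframes

# 100 samples at 8 kHz with 5 ms sub-frames give 3 sub-frames of 40 samples.
frames = split_into_subframes([1.0] * 100)
```

With these assumptions, the third sub-frame holds the last 20 input samples followed by 20 zeros.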
The input speech is then subjected to linear predictive analysis for each sub-frame by a linear predictive analysis unit 101, to calculate p-th order linear predictive coefficients $\alpha_i$ ($i = 1, 2, \ldots, p$). A linear predictive synthesis filter 102 is constructed on the basis of the obtained linear predictive coefficients $\alpha_i$.
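One conventional way to obtain the coefficients is autocorrelation analysis followed by the Levinson-Durbin recursion, and the synthesis filter is then the all-pole filter $1/A(z)$. The text does not state which analysis method unit 101 uses, so that choice, the sign convention $A(z) = 1 - \sum_{i=1}^{p} \alpha_i z^{-i}$, the zero initial filter state, and the function names below are assumptions made for this sketch.

```python
def autocorrelation(x, p):
    """Return r[0..p], the (biased) autocorrelation of x."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(p + 1)]

def levinson_durbin(r, p):
    """Solve for alpha[1..p] so that x[n] is predicted by sum_i alpha_i * x[n - i].

    Returns (alpha, prediction_error_energy).
    """
    a = [0.0] * (p + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, p + 1):
        if err <= 0.0:
            break  # degenerate input (e.g. a silent sub-frame)
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err

def synthesis_filter(excitation, alpha):
    """All-pole filter 1/A(z) with A(z) = 1 - sum_i alpha_i z^-i, zero initial state."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for i, a_i in enumerate(alpha, start=1):
            if n - i >= 0:
                acc += a_i * out[n - i]
        out.append(acc)
    return out
```

For a decaying exponential $x[n] = 0.9^n$, a first-order analysis returns $\alpha_1 \approx 0.9$, and feeding the prediction residual back through `synthesis_filter` reproduces the input.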
An adaptive codebook 103 is then searched. The adaptive codebook 103 is used to represent the periodic component of speech, that is, the pitch.
An output codevector corresponding to an input code to the adaptive codebook 103 is produced as follows: the excitation signal that drove the linear predictive synthesis filter 102 in the sub-frames preceding the current sub-frame is cut out, counting back from its end, over a length corresponding to the input code (hereinafter referred to as a lag), and the segment thus cut out (an adaptive codevector) is arranged repeatedly until its length reaches the length of the sub-frame.
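The cut-and-repeat construction can be sketched as below. The function name `adaptive_codevector` is hypothetical, and the past excitation is assumed to be held as a plain list whose last element is the most recent sample.

```python
def adaptive_codevector(past_excitation, lag, subframe_len):
    """Build an adaptive-codebook output codevector (illustrative sketch).

    Cuts the last `lag` samples of the past excitation and repeats that
    segment until it fills one sub-frame.
    """
    if not 1 <= lag <= len(past_excitation):
        raise ValueError("lag must lie within the stored past excitation")
    segment = past_excitation[-lag:]
    return [segment[n % lag] for n in range(subframe_len)]

print(adaptive_codevector([1, 2, 3, 4, 5], lag=2, subframe_len=5))  # [4, 5, 4, 5, 4]
```

When the lag is shorter than the sub-frame, the repetition is what gives the codevector its pitch periodicity.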
The linear predictive synthesis filter 102 is driven by the produced output codevector to produce reproduced speech. The reproduced speech is multiplied by the gain that theoretically minimizes the distance between the input speech and the reproduced speech (that is, the distortion of the reproduced speech relative to the original speech), after which the distance between the input speech and the reproduced speech is calculated by a distance calculating unit 105.
This operation is repeated for each input code, and the code whose excitation vector yields the reproduced speech at the minimum distance from the input speech is selected.
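For a target $x$ and a synthesis filter output $y$, the gain minimizing $\lVert x - g\,y \rVert^2$ is $g = \langle x, y\rangle / \langle y, y\rangle$, and the remaining distance is $\lVert x\rVert^2 - \langle x, y\rangle^2 / \langle y, y\rangle$. The sketch below applies this to a search over lags. It assumes a squared-error distance, a zero initial filter state, and hypothetical function names; the text itself only says the gain "theoretically" minimizes the distance.

```python
def synthesis_filter(excitation, alpha):
    """All-pole filter 1/A(z) with A(z) = 1 - sum_i alpha_i z^-i, zero initial state."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for i, a_i in enumerate(alpha, start=1):
            if n - i >= 0:
                acc += a_i * out[n - i]
        out.append(acc)
    return out

def adaptive_codevector(past_excitation, lag, subframe_len):
    """Repeat the last `lag` samples of the past excitation over one sub-frame."""
    segment = past_excitation[-lag:]
    return [segment[n % lag] for n in range(subframe_len)]

def optimal_gain_and_distance(target, y):
    """Gain g minimizing ||target - g*y||^2 and the resulting squared distance."""
    energy = sum(v * v for v in y)
    if energy == 0.0:
        return 0.0, sum(t * t for t in target)
    corr = sum(t * v for t, v in zip(target, y))
    return corr / energy, sum(t * t for t in target) - corr * corr / energy

def search_adaptive_codebook(target, past_excitation, alpha, lags, subframe_len):
    """Try every candidate lag and return (lag, gain, distance) of the best one."""
    best = None
    for lag in lags:
        y = synthesis_filter(adaptive_codevector(past_excitation, lag, subframe_len), alpha)
        gain, distance = optimal_gain_and_distance(target, y)
        if best is None or distance < best[2]:
            best = (lag, gain, distance)
    return best
```

If the target is built from lag 3 scaled by 0.5, the search recovers lag 3 with gain 0.5 and a distance of essentially zero.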
Thereafter, a noise codebook 104 is searched. The noise codebook 104 is used to represent a varying portion of speech that cannot be represented by the adaptive codebook 103. The noise codebook 104 stores in advance various codevectors, each one sub-frame long and generally based on white Gaussian noise (hereinafter referred to as noise codevectors).
A noise codevector corresponding to the input code is read out from the noise codevectors stored in the noise codebook 104, and the linear predictive synthesis filter 102 is driven by it. The resulting output (hereinafter referred to as the synthesis filter output corresponding to the noise codevector) is orthogonalized to the synthesis filter output corresponding to the codevector selected in the adaptive codebook search, in order to eliminate the effect of that selected codevector, and reproduced speech is thereby produced. The reproduced speech is multiplied by the gain that theoretically minimizes the distance between the input speech and the reproduced speech, after which the distance between the input speech and the reproduced speech is calculated by the distance calculating unit 105.
This operation is repeated for each input code, and the code whose excitation vector yields the reproduced speech at the minimum distance from the input speech is selected.
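The noise codebook search with orthogonalization can be sketched as follows. Orthogonalization is taken here to be a single Gram-Schmidt step (subtracting the projection onto the selected adaptive output). The seeded Gaussian codebook, the zero-state filter, the squared-error distance, and all function names are assumptions for illustration only.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def synthesis_filter(excitation, alpha):
    """All-pole filter 1/A(z) with A(z) = 1 - sum_i alpha_i z^-i, zero initial state."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for i, a_i in enumerate(alpha, start=1):
            if n - i >= 0:
                acc += a_i * out[n - i]
        out.append(acc)
    return out

def make_noise_codebook(size, subframe_len, seed=0):
    """Hypothetical noise codebook: codevectors drawn from white Gaussian noise."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(subframe_len)] for _ in range(size)]

def orthogonalize(y, y_ref):
    """Remove from y its projection onto y_ref (one Gram-Schmidt step)."""
    ref_energy = dot(y_ref, y_ref)
    if ref_energy == 0.0:
        return list(y)
    c = dot(y, y_ref) / ref_energy
    return [a - c * b for a, b in zip(y, y_ref)]

def search_noise_codebook(target, y_adaptive, codebook, alpha):
    """Return (index, gain, distance) of the best orthogonalized noise codevector."""
    energy_a = dot(y_adaptive, y_adaptive)
    g_a = dot(target, y_adaptive) / energy_a if energy_a > 0.0 else 0.0
    # Part of the target not already covered by the scaled adaptive output.
    residual = [t - g_a * v for t, v in zip(target, y_adaptive)]
    best = None
    for index, codevector in enumerate(codebook):
        y = orthogonalize(synthesis_filter(codevector, alpha), y_adaptive)
        energy = dot(y, y)
        if energy == 0.0:
            continue
        corr = dot(residual, y)
        distance = dot(residual, residual) - corr * corr / energy
        if best is None or distance < best[2]:
            best = (index, corr / energy, distance)
    return best
```

Because each candidate is made orthogonal to the adaptive output, choosing the noise gain cannot disturb the adaptive gain already selected.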
The following are output as coded signals: the input code to the adaptive codebook 103 selected in the adaptive codebook search, together with a code representing the corresponding gain; the input code to the noise codebook 104 selected in the noise codebook search, together with a code representing the corresponding gain; and the linear predictive coefficients.
The adaptive codebook 103 efficiently represents the pitch structure of speech in voiced, stationary portions. However, the adaptive codebook 103 cannot produce a suitable codevector, and the quality of the reproduced speech is degraded, in cases such as the following: when the excitation signal in the preceding sub-frame has little power; when the current sub-frame contains non-stationary speech, such as the onset of speech, made up of components different from those in the preceding sub-frame; and when the current sub-frame contains noise-like speech, such as an unvoiced portion having no pitch period.
To cope with this problem, a method has been proposed in which a codebook that outputs a random component is provided to complement the adaptive codebook 103. Such a codebook is called a fixed codebook because, like the noise codebook, it outputs a codevector in fixed correspondence with the input code in every sub-frame.
The fixed codebook is searched simultaneously with the adaptive codebook, and the output vector of only one of the two codebooks is selected according to the minimum distortion criterion. That is, the adaptive codebook and the fixed codebook complement each other and operate as a single codebook.
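The exclusive selection can be sketched by pooling the candidates of both codebooks and keeping the single one with minimum distortion. The sketch assumes the candidates have already been passed through the synthesis filter and uses a squared-error distance; `joint_exclusive_search` is a hypothetical name.

```python
def joint_exclusive_search(target, adaptive_outputs, fixed_outputs):
    """Search the adaptive and fixed codebooks as one codebook (sketch).

    Returns (codebook_name, index, gain, distance) for the single winner.
    """
    candidates = [("adaptive", i, y) for i, y in enumerate(adaptive_outputs)]
    candidates += [("fixed", i, y) for i, y in enumerate(fixed_outputs)]
    target_energy = sum(t * t for t in target)
    best = None
    for name, index, y in candidates:
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # e.g. an adaptive vector built from a silent past excitation
        corr = sum(t * v for t, v in zip(target, y))
        distance = target_energy - corr * corr / energy
        if best is None or distance < best[3]:
            best = (name, index, corr / energy, distance)
    return best
```

An all-zero adaptive candidate, as would arise when the preceding excitation has no power, is simply skipped, so a fixed codevector is chosen instead.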
A method has also already been proposed in which the noise codevector is given periodicity matching the period of the adaptive codevector, so that the noise codebook can represent with small distortion a component that is periodic but cannot be covered by the components of the preceding sub-frame alone, that is, a non-stationary component in a voiced portion that the adaptive codebook cannot represent.
However, because the codevectors stored in the fixed codebook and the noise codebook are noise-based, a part of a periodic portion of the input speech that is not sufficiently represented by the adaptive codebook cannot, in some cases, be represented by either method.
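One simple way to give a noise codevector the period of the adaptive codevector is to keep its first `lag` samples and repeat them across the sub-frame, mirroring the adaptive codebook construction. The text does not specify how the periodicity is imposed, so this repetition rule, the handling of lags at least as long as the sub-frame, and the name `pitch_synchronize` are assumptions.

```python
def pitch_synchronize(noise_codevector, lag):
    """Make a noise codevector periodic with period `lag` (illustrative sketch).

    Repeats the first `lag` samples across the sub-frame; a lag at least as
    long as the sub-frame leaves the codevector unchanged.
    """
    if lag <= 0:
        raise ValueError("lag must be positive")
    length = len(noise_codevector)
    if lag >= length:
        return list(noise_codevector)
    return [noise_codevector[n % lag] for n in range(length)]

print(pitch_synchronize([1, 2, 3, 4, 5, 6], lag=4))  # [1, 2, 3, 4, 1, 2]
```

The periodized vector still carries noise-like content within each period, which is the limitation the following paragraph points out.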