This invention relates to a multi-pulse voice or speech encoder.
Various multi-pulse voice encoders are already known. For example, a plurality of multi-pulse voice encoders are described in an article read Apr. 10, 1986, by Kazunori Ozawa and Takashi Araseki, both of research laboratories of the present assignee, and recorded in the proceedings of "ICASSP 86" (IEEE-IECEJ-ASJ International Conference on Acoustics, Speech, and Signal Processing) as article No. 33.3 under the title of "High Quality Multi-Pulse Speech Coder with Pitch Prediction". Another multi-pulse voice encoder is disclosed in U.S. patent application Ser. No. 74,193 filed July 16, 1987, by Yayoi Satoh, the present inventor, et al and assigned to the instant assignee.
Such a multi-pulse voice encoder has an encoder input terminal supplied with an input voice or speech signal which is digitized at a sampling frequency of, for example, 8 kHz. Besides an encoder output terminal, the voice encoder has an intermediate terminal, as will be described later in more detail. An excitation pulse signal is delivered to the intermediate terminal to represent the input voice signal. For the time being, the input and the intermediate terminals will not be mentioned in the brief descriptions that follow of one of the voice encoders of the Ozawa et al article and of the voice encoder of the Satoh et al patent application.
As will be described later in more detail, one of the multi-pulse voice encoders of the Ozawa et al article comprises an analyzing arrangement for analyzing an input voice signal into feature parameters, such as partial correlation (parcor) coefficients, to produce a feature parameter signal representative of the feature parameters. An extracting arrangement extracts pitch periods from the input voice signal to produce a pitch period signal representative of the pitch periods. A pulse searching arrangement is supplied with the feature parameter signal and the pitch period signal. Supplied furthermore with the input voice signal, the pulse searching arrangement searches for excitation pulses or sound source pulses representative of the input voice signal in each pulse search duration or interval determined with reference to the pitch periods. The pulse searching arrangement thereby produces an excitation pulse signal representative of the excitation pulses searched for in the pitch periods, namely, throughout the input voice signal.
In practice, each frame of the input voice signal is divided into a plurality of subframes with reference to the pitch periods. More particularly, each subframe is either one pitch period long or one pitch period less several sampling periods long. In the excitation pulse signal, one of the excitation pulses may appear at a point between two consecutive frames or at a point between two consecutive subframes. Pulse search is therefore carried out by the pulse searching arrangement as regards the input voice signal of the pulse search duration which is equal to one subframe period plus an impulse response length of the analyzing arrangement.
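The relationship between the subframe period and the pulse search duration can be illustrated by a minimal sketch. It is given only by way of example and is not taken from the Ozawa et al article. The function name `search_windows`, the frame length of 160 samples, the pitch period of 50 samples, and the impulse response length of 32 samples are all hypothetical. For simplicity, each subframe is assumed to be exactly one pitch period long, with the last subframe of the frame taking whatever samples remain; the case of one pitch period less several sampling periods is not modeled.

```python
def search_windows(frame_len, pitch_period, impulse_len):
    """Split one frame into subframes and give each its pulse search duration.

    Returns a list of (start, subframe_len, search_len) tuples in samples.
    Assumption: each subframe is one pitch period long, and the last
    subframe takes the remainder of the frame.
    """
    if frame_len <= 0 or pitch_period <= 0 or impulse_len < 0:
        raise ValueError("lengths must be positive (impulse_len may be zero)")
    windows = []
    start = 0
    while start < frame_len:
        subframe_len = min(pitch_period, frame_len - start)
        # Pulse search duration = subframe period + impulse response length.
        windows.append((start, subframe_len, subframe_len + impulse_len))
        start += subframe_len
    return windows


# Hypothetical values: a 20 ms frame at 8 kHz, a 50-sample pitch period,
# and a 32-sample impulse response length.
windows = search_windows(frame_len=160, pitch_period=50, impulse_len=32)
for start, subframe_len, search_len in windows:
    print(start, subframe_len, search_len)
```

With these values, the last subframe is only 10 samples long while the impulse response length is 32 samples, so the overlap portion of its pulse search duration is longer than the subframe itself.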
As will also be described later in more detail, the multi-pulse voice encoder of the Satoh et al patent application comprises an analyzing arrangement which is similar to that used in the above-described voice encoder of the Ozawa et al article and produces a feature parameter signal representative of feature parameters of an input voice signal. A residual signal producing arrangement is controlled by the feature parameter signal. Supplied with each frame of the input voice signal, the residual signal producing arrangement produces a prediction residual signal related to the frame being dealt with. Supplied with the feature parameter signal and the prediction residual signal, a pulse searching arrangement searches for excitation pulses representative of the input voice signal in the frame under consideration and produces an excitation pulse signal representative of the excitation pulses searched for the input voice signal, namely, in a succession of the frames.
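The production of a prediction residual signal for one frame can be sketched as follows, on the assumption that the residual is a linear prediction residual of the form e[n] = x[n] - (a_1 x[n-1] + ... + a_p x[n-p]). The sketch is illustrative only and is not the arrangement of the Satoh et al patent application. The function name `lp_residual`, the use of direct-form predictor coefficients rather than parcor coefficients, and the `history` argument carrying the last samples of the preceding frame are assumptions.

```python
def lp_residual(frame, coeffs, history=None):
    """Linear prediction residual e[n] = x[n] - sum_k a_k * x[n-k] for one frame.

    frame   : samples of the frame being dealt with
    coeffs  : predictor coefficients [a_1, ..., a_p] (assumed direct form)
    history : last p samples of the preceding frame, oldest first
              (zeros are assumed when omitted)
    """
    order = len(coeffs)
    if history is None:
        history = [0.0] * order
    if len(history) != order:
        raise ValueError("history must hold exactly len(coeffs) samples")
    buf = list(history) + list(frame)
    residual = []
    for n in range(order, len(buf)):
        # Predict the current sample from the p preceding samples.
        prediction = sum(coeffs[k] * buf[n - 1 - k] for k in range(order))
        residual.append(buf[n] - prediction)
    return residual


# A decaying sequence x[n] = 0.5**n is predicted exactly by a_1 = 0.5,
# so only the first residual sample is nonzero.
residual = lp_residual([1.0, 0.5, 0.25, 0.125], [0.5])
print(residual)
```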
In the Satoh et al encoder, the frame is not divided into subframes. As in the above-described pulse searching arrangement of the Ozawa et al article, the pulse search is carried out in connection with the input voice signal of a pulse search duration which is equal to one frame period plus an overlap interval or duration. Although this is not mentioned in the Satoh et al patent application, each frame may be divided into subframes with reference to the pitch periods. In this event, the pulse search duration should be equal to one subframe period plus the impulse response length.
The pulse searching arrangement of the above-described encoder of the Ozawa et al article comprises a pitch prediction filter for predicting the pitch period in a wave form domain or on a wave form level. A pitch prediction residual signal is produced in the pulse searching arrangement by using the predicted pitch period and the input voice signal and is used in the pulse search. In contrast, the prediction residual signal is a linear prediction residual signal in the Satoh et al encoder.
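Pitch prediction in the wave form domain may be pictured by the following sketch, which subtracts from each sample a scaled copy of the sample one pitch period earlier. It is given under stated assumptions and is not the pitch prediction filter of the Ozawa et al article: a single-tap predictor is assumed, and the function name `pitch_residual` and the `gain` parameter are hypothetical.

```python
def pitch_residual(signal, pitch_period, gain):
    """Single-tap pitch prediction residual r[n] = x[n] - gain * x[n - T].

    Samples earlier than the start of `signal` are assumed to be zero.
    """
    if pitch_period <= 0:
        raise ValueError("pitch_period must be a positive number of samples")
    out = []
    for n, sample in enumerate(signal):
        delayed = signal[n - pitch_period] if n >= pitch_period else 0.0
        out.append(sample - gain * delayed)
    return out


# A pulse train repeating every 3 samples is removed after its first
# period when the predicted pitch period is 3 and the gain is 1.
periodic = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
pitch_res = pitch_residual(periodic, pitch_period=3, gain=1.0)
print(pitch_res)
```

Because the prediction operates directly on signal samples one pitch period back, the samples of the preceding pitch period must be kept available, which is consistent with the memory demands of processing in the wave form domain.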
When the pitch prediction filter predicts the pitch period in the wave form domain, the excitation pulse signal must be used in the pulse searching arrangement to produce a reproduced voice signal in each subframe. If the input voice signal is processed frame by frame, a boundary between two consecutive frames must be processed by using the overlap interval. For this purpose, memories of a large total memory capacity must be included at various places in the voice encoder. If the input voice signal is processed subframe by subframe, a boundary between two consecutive subframes must be processed by using the impulse response length as a similar overlap interval. Depending on the pitch periods, the subframe period may become shorter than the impulse response length. This makes it difficult to search for the excitation pulses. In other words, a long processing time becomes necessary to search for the excitation pulses. As a result, hardware of a large scale becomes indispensable.