Today, coding technology for transferring a speech signal at a low bit rate of 10 kb/s or lower is being extensively studied. One known practical method uses a system in which the excitation signal of a speech synthesis filter is represented by a train of pulses aligned at predetermined intervals, and this excitation signal is used for coding the speech signal. The details of this method are explained in the paper titled "Regular-Pulse Excitation--A Novel Approach to Effective and Efficient Multipulse Coding of Speech," written by Peter Kroon et al. in the IEEE Transactions on Acoustics, Speech, and Signal Processing, October 1986, Vol. ASSP-34, pp. 1054-1063 (Document 1).
The speech coding system disclosed in this paper will be explained with reference to FIGS. 1 and 2, which are block diagrams of a coding apparatus and a decoding apparatus of this system.
Referring to FIG. 1, the input signal to a prediction filter 1 is a speech signal series s(n) that has undergone A/D conversion. The prediction filter 1 calculates a prediction residual signal r(n), expressed by the following equation, from past samples of s(n) and prediction parameters $a_i$ ($1 \le i \le p$), and outputs the residual signal. ##EQU1## where p is the order of the filter 1, and p=12 in the aforementioned paper. The transfer function A(z) of the prediction filter 1 is expressed as follows: ##EQU2##
An excitation signal generator 2 generates a train of excitation pulses V(n) aligned at predetermined intervals as an excitation signal. FIG. 3 exemplifies the pattern of the excitation pulse train V(n). K in this diagram denotes the phase of the pulse series, and represents the position of the first pulse of each frame. The horizontal axis represents discrete time. Here, the length of one frame is set to 40 samples (5 ms at a sampling frequency of 8 kHz), and the pulse interval is set to 4 samples.
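The pulse grid described above can be sketched in a few lines of Python. This is only an illustration: the function name `pulse_positions` is hypothetical, and the phase is assumed to be numbered from 1 so that the first pulse falls at sample index K-1, in line with equation (4) given later.

```python
def pulse_positions(frame_len, interval, phase):
    """Return the 0-based sample indices of the pulses in one frame.

    Assumption: phase is 1-based (1..interval), so the first pulse
    sits at index phase - 1 and the rest follow every `interval` samples.
    """
    if not 1 <= phase <= interval:
        raise ValueError("phase must lie in 1..interval")
    return list(range(phase - 1, frame_len, interval))


FRAME_LEN = 40      # samples per frame (value from the text)
INTERVAL = 4        # pulse spacing in samples (value from the text)
SAMPLE_RATE = 8000  # sampling frequency in Hz (value from the text)

# Frame duration in milliseconds: 40 samples at 8 kHz.
frame_ms = 1000.0 * FRAME_LEN / SAMPLE_RATE

for k in range(1, INTERVAL + 1):
    print(k, pulse_positions(FRAME_LEN, INTERVAL, k))
```

With these values each frame carries 40/4 = 10 pulses, and the four possible phases differ only in the offset of the whole grid.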
A subtracter 3 calculates the difference e(n) between the prediction residual signal r(n) and the excitation signal V(n), and outputs the difference to a weighting filter 4. This filter 4 serves to shape the difference signal e(n) in the frequency domain in order to utilize the auditory masking effect, and its transfer function W(z) is given by the following equation: ##EQU3##
As the weighting filter and the masking effect are described in, for example, "Digital Coding of Waveforms" written by N. S. Jayant and P. Noll, issued in 1984 by Prentice-Hall (Document 2), their description will be omitted here.
The error e'(n) weighted by the weighting filter 4 is input to an error minimize circuit 5, which determines the amplitude and phase of the excitation pulse train so as to minimize the squared error of e'(n). The excitation signal generator 2 generates an excitation signal based on this amplitude and phase information, which is also output from an output terminal 6a. How the error minimize circuit 5 determines the amplitude and phase of the excitation pulse train will now be briefly described, following the description given in Document 1.
First, with the frame length set to L samples and the number of excitation pulses in one frame being Q, the $Q \times L$ matrix representing the positions of the excitation pulses is denoted by $M_K$, where K is the phase of the excitation pulse train. The elements $m_{ij}$ of $M_K$ are expressed as follows:
$$m_{ij} = \begin{cases} 1, & j = iN + K - 1 \\ 0, & j \neq iN + K - 1 \end{cases} \tag{4}$$
where $0 \le i \le Q-1$, $0 \le j \le L-1$, and $N = L/Q$.
Given that $b^{(K)}$ is a row vector whose elements are the non-zero amplitudes of the excitation signal (excitation pulse train) with the phase K, the row vector $u^{(K)}$ representing the excitation signal with the phase K is given by the following equation:
$$u^{(K)} = b^{(K)} M_K \tag{5}$$
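As a small illustration of equations (4) and (5), the following Python sketch builds the position matrix and expands an amplitude vector into a full-length excitation frame. The helper names `position_matrix` and `row_times_matrix` and the numeric values of L, Q, K, and b are chosen only for this example and do not come from Document 1.

```python
def position_matrix(L, Q, K):
    """Build the Q x L matrix M_K: m[i][j] = 1 where j == i*N + K - 1, else 0."""
    if L % Q != 0:
        raise ValueError("L must be a multiple of Q")
    N = L // Q  # pulse interval, N = L/Q
    if not 1 <= K <= N:
        raise ValueError("K must lie in 1..N")
    M = [[0] * L for _ in range(Q)]
    for i in range(Q):
        M[i][i * N + K - 1] = 1
    return M


def row_times_matrix(row, matrix):
    """Multiply a length-Q row vector by a Q x L matrix, giving a length-L row."""
    cols = len(matrix[0])
    return [sum(row[i] * matrix[i][j] for i in range(len(row)))
            for j in range(cols)]


# Illustrative values only: an 8-sample frame with 2 pulses and phase 3.
L, Q, K = 8, 2, 3
b = [0.5, -1.25]                 # non-zero pulse amplitudes b^(K)
M = position_matrix(L, Q, K)
u = row_times_matrix(b, M)       # u^(K) = b^(K) M_K, equation (5)
print(u)
```

Here N = 4, so the two amplitudes land at sample indices 2 and 6 and every other sample of u is zero. Multiplying by $M_K$ is therefore just a scatter of the Q amplitudes onto the pulse grid.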
The following $L \times L$ matrix, whose elements are the impulse response of the weighting filter 4, is denoted by H. ##EQU4##
At this time, the error vector $e^{(K)}$ having the weighted error e'(n) as its elements is expressed by the following equation:
$$e^{(K)} = e^{(0)} - b^{(K)} H_K \qquad (K = 1, 2, \ldots, N) \tag{7}$$
where
$$e^{(0)} = e_0 + rH \tag{8}$$
$$H_K = M_K H \tag{9}$$
The vector $e_0$ is the output of the weighting filter due to its internal state left over from the previous frame, and the vector r is the prediction residual signal vector. The vector $b^{(K)}$ representing the optimum amplitudes of the excitation pulses is acquired by taking the partial derivative of the squared error, expressed by the following equation,
$$E = e^{(K)} e^{(K)t} \tag{10}$$
with respect to $b^{(K)}$ and setting it to zero, which gives the following equation:
$$b^{(K)} = e^{(0)} H_K^t [H_K H_K^t]^{-1} \tag{11}$$
Here, the following equation, obtained by substituting equation (11) into the squared error of equation (10), is calculated for each K, and the phase K of the excitation pulse train is selected so as to minimize $E^{(K)}$:
$$E^{(K)} = e^{(0)} [I - H_K^t [H_K H_K^t]^{-1} H_K] e^{(0)t} \tag{12}$$
where I is the $L \times L$ identity matrix.
The amplitude and phase of the excitation pulse train are determined in the above manner.
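The search just described can be sketched as follows. This is an illustrative outline rather than the procedure of Document 1 as implemented there: the function names `solve` and `rpe_search` are hypothetical, the layout of H (row i holds the impulse response h delayed by i samples, which matches the row-vector product rH) is an assumption, and the squared error is evaluated directly from the error vector $e^{(0)} - b^{(K)} H_K$ instead of through a closed-form expression.

```python
def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [list(A[i]) + [y[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda row: abs(aug[row][c]))
        if abs(aug[p][c]) < 1e-12:
            raise ValueError("singular matrix")
        aug[c], aug[p] = aug[p], aug[c]
        for row in range(c + 1, n):
            f = aug[row][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[row][k] -= f * aug[c][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - s) / aug[row][row]
    return x


def rpe_search(r, h, Q, e_prev=None):
    """Return (K, b, E): the phase, amplitudes and error of the best pulse grid.

    r      : prediction residual for one frame (length L)
    h      : impulse response of the weighting filter, at least L samples
    Q      : number of pulses per frame (L must be a multiple of Q)
    e_prev : weighting-filter output due to the previous frame (e_0), default zero
    """
    L = len(r)
    N = L // Q
    e_prev = e_prev or [0.0] * L
    # Assumed layout of H: H[i][j] = h[j - i] for j >= i, else 0.
    H = [[h[j - i] if j >= i else 0.0 for j in range(L)] for i in range(L)]
    # e^(0) = e_0 + r H, equation (8)
    e0 = [e_prev[j] + sum(r[i] * H[i][j] for i in range(L)) for j in range(L)]
    best = None
    for K in range(1, N + 1):
        # H_K = M_K H picks the rows of H at the pulse positions, equation (9)
        HK = [H[i * N + K - 1] for i in range(Q)]
        gram = [[sum(x * y for x, y in zip(HK[i], HK[j])) for j in range(Q)]
                for i in range(Q)]
        rhs = [sum(x * y for x, y in zip(e0, HK[i])) for i in range(Q)]
        b = solve(gram, rhs)  # b = e^(0) H_K^t [H_K H_K^t]^-1, equation (11)
        err = [e0[j] - sum(b[i] * HK[i][j] for i in range(Q)) for j in range(L)]
        E = sum(x * x for x in err)  # squared error, equation (10)
        if best is None or E < best[2]:
            best = (K, b, E)
    return best


# Illustrative check: a residual that already is a pulse train with phase 3.
h = [1.0, 0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0]
r = [0.0, 0.0, 0.5, 0.0, 0.0, 0.0, -1.25, 0.0]
K, b, E = rpe_search(r, h, Q=2)
print(K, b, E)
```

In this toy case the residual can be represented exactly by the grid with phase 3, so the search returns that phase with the original amplitudes and an error of essentially zero. Solving the small Q x Q system instead of forming the inverse explicitly is a design choice of this sketch.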
The decoding apparatus shown in FIG. 2 will now be described. Referring to FIG. 2, an excitation signal generator 7, which is the same as the excitation signal generator 2 in FIG. 1, generates an excitation signal based on the amplitude and phase of the excitation pulse train which has been transferred from the coding apparatus and input to an input terminal 6b. A synthesis filter 8 receives this excitation signal, generates a synthesized speech signal s(n), and sends it to an output terminal 9. The synthesis filter 8 has the inverse filter relation to the prediction filter 1 shown in FIG. 1, and its transfer function is 1/A(z).
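The inverse relation between the prediction filter 1 and the synthesis filter 8 can be illustrated with a short Python sketch. The sign convention used here, A(z) = 1 - sum of $a_i z^{-i}$, is an assumption made for the example, as are the function names and the coefficient values; the round trip holds for either sign convention as long as both filters use the same one.

```python
def prediction_residual(s, a):
    """Prediction filter A(z): r(n) = s(n) - sum_i a[i-1] * s(n - i).

    Assumed sign convention; samples before the start are taken as zero.
    """
    p = len(a)
    return [s[n] - sum(a[i - 1] * s[n - i]
                       for i in range(1, p + 1) if n - i >= 0)
            for n in range(len(s))]


def synthesize(v, a):
    """Synthesis filter 1/A(z): out(n) = v(n) + sum_i a[i-1] * out(n - i)."""
    p = len(a)
    out = []
    for n in range(len(v)):
        acc = v[n] + sum(a[i - 1] * out[n - i]
                         for i in range(1, p + 1) if n - i >= 0)
        out.append(acc)
    return out


# Hypothetical second-order coefficients and a short test signal.
a = [0.9, -0.2]
s = [1.0, 0.5, -0.25, 0.75, 0.0, -1.0]

r = prediction_residual(s, a)   # what the coder's prediction filter outputs
s_hat = synthesize(r, a)        # exciting 1/A(z) with r recovers s
print(max(abs(x - y) for x, y in zip(s, s_hat)))
```

When the synthesis filter is driven by the exact residual, the input is recovered up to rounding. In the coding system the filter is driven by the pulse-train excitation instead, so the synthesized speech only approximates the input.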
In the above-described conventional coding system, the information to be transferred is the prediction parameters $a_i$ ($1 \le i \le p$) and the amplitude and phase of the excitation pulse train, and the transfer rate can be freely set by changing the interval of the excitation pulse train, N=L/Q. However, experiments with this conventional system show that when the transfer rate becomes low, particularly 10 kb/s or below, noise in the synthesized sound becomes prominent and the quality deteriorates. The quality degradation is particularly noticeable in experiments with female voices having a short pitch period.
This is because the excitation pulse train is always expressed by a train of pulses having constant intervals. More specifically, since a speech signal for a voiced sound is a periodic signal with a pitch period, the prediction residual signal is also a periodic signal whose power increases every pitch period. In a prediction residual signal with periodically increasing power, the portion having large power contains important information. Also, in a portion where the correlation of the speech signal changes in accordance with degradation of reverberation, or a portion where the power of the speech signal increases, such as the voicing start portion, the power of the prediction residual signal increases within a frame. In this case too, a large-power portion of the prediction residual signal is where the property of the speech signal has changed, and is therefore important.
According to the conventional system, however, even though the power of the prediction residual signal changes within a frame, the synthesis filter is excited by an excitation pulse train always having constant intervals in a frame to acquire a synthesized sound, thus significantly degrading the quality of the synthesized sound.
As described above, since the conventional speech coding system excites the synthesis filter by an excitation pulse train always having constant intervals in a frame, the quality of the synthesized sound deteriorates when the transfer rate becomes low, for example 10 kb/s or lower.