1. Field of the Invention
The present invention relates to voice suppressors for controlling the level of a synthesized voice signal and, more particularly, to a voice suppressor which is preferable for use in, for example, a receiver for receiving transmitted data and for decoding and synthesizing voice in a cellular telephone.
2. Detailed Description of the Related Art
In the field of mobile communication, efforts have recently been put forth for improving transmission efficiency by transmitting voice after encoding it at transmitters and by decoding the encoded data at receivers.
FIG. 4 shows a configuration of an example of a transmitter (encoder) of a cellular telephone used in such mobile communication. In such a transmitter, voice is encoded in accordance with a linear predictive coding method such as the CELP (code excited linear predictive coding) method.
The CELP method is an encoding method wherein a signal obtained by performing linear prediction (short-range prediction) and pitch prediction (long-range prediction) on an input voice signal, i.e., a voice source signal, is subjected to vector quantization using a code book in which a variety of waveform patterns (code book vectors) are registered in advance.
With the first CELP method, proposed by AT&T in 1984, real time processing was difficult because an enormous amount of calculation was required. However, many improvements for reducing the amount of calculation have been proposed recently, and some of those proposals have made real time processing practical on DSPs (digital signal processors).
In the transmitter shown in FIG. 4, code book indexes and a code book gain as initial values are supplied to a code book 31 and multiplier 32, respectively, and a pitch period and a pitch gain as initial values are supplied to a long-range predictor 35.
In the code book 31, waveform patterns of a variety of voice source signals are registered in advance in association with indexes, and a voice source signal associated with a code book index supplied by an error minimizer 41 is read to be supplied to the multiplier 32.
According to a code book gain supplied by the error minimizer 41, the multiplier 32 amplifies (or attenuates) the voice source signal from the code book 31 and supplies it to the long-range predictor 35. The long-range predictor 35 is comprised of an adder 33 and a long-range predictor memory 34 and generates a residual signal based on the voice source signal from the multiplier 32.
Specifically, in the long-range predictor 35, the voice source signal from the multiplier 32 is supplied through the adder 33 to the long-range predictor memory 34 which in turn delays the signal by a period of time corresponding to a pitch period supplied by the error minimizer 41. The long-range predictor memory 34 also amplifies (or attenuates) this delayed signal by a quantity corresponding to a pitch gain also supplied by the error minimizer 41 and outputs it to the adder 33.
The adder 33 adds the output of the long-range predictor memory 34 to the voice source signal from the multiplier 32 to generate the residual signal. This residual signal is input to a linear predictor 38 which in turn generates synthesized voice as described below. This synthesized voice is supplied to a subtracter 39. On the other hand, the input voice signal is subjected to analog-to-digital conversion at an analog-to-digital converter (not shown) and is supplied to the subtracter 39 and a linear predictive coefficient calculator 45. In the calculator 45, the voice signal is subjected to linear predictive analysis which is performed for each frame having a predetermined time length of, for example, 20 ms to calculate linear predictive coefficients of a predetermined number of degrees P, e.g., up to eighth degree.
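The multiplier and long-range predictor described above behave like a gain stage followed by a one-tap recursive filter, r(n) = g c(n) + β r(n - M). The following is a minimal sketch under that reading; the function name, argument names and the all-zero initial memory are assumptions made for illustration and do not come from the source.

```python
def long_range_predict(source, codebook_gain, pitch_period, pitch_gain, history=None):
    """Sketch of multiplier 32 + long-range predictor 35 (names are hypothetical).

    Computes r[n] = codebook_gain * source[n] + pitch_gain * r[n - pitch_period].
    `history` plays the role of the long-range predictor memory; it is assumed
    to start as zeros and must hold at least `pitch_period` past samples.
    """
    history = list(history) if history is not None else [0.0] * pitch_period
    if len(history) < pitch_period:
        raise ValueError("history must hold at least pitch_period samples")
    residual = []
    for c in source:
        scaled = codebook_gain * c            # multiplier: apply the code book gain
        delayed = history[-pitch_period]      # memory: sample from one pitch period ago
        r = scaled + pitch_gain * delayed     # adder: sum of the two contributions
        residual.append(r)
        history.append(r)                     # the memory latches the adder output
    return residual, history


residual, _ = long_range_predict([1, 0, 0, 0, 0, 0], 1.0, 2, 0.5)
print(residual)
```

With a single impulse as the voice source signal, the output repeats every pitch period with its amplitude scaled by the pitch gain each time.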
The linear predictive coefficients are the coefficients $\alpha_1$ through $\alpha_P$ which give the minimum result $\epsilon$ of the following equation, where the voice signal at a point in time $n$ is represented by $x_n$.

$$x_n + \alpha_1 x_{n-1} + \alpha_2 x_{n-2} + \dots + \alpha_P x_{n-P} = \epsilon \qquad \text{(Equation 1)}$$
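The source does not say how the calculator 45 finds these coefficients. One common choice, assumed here purely for illustration, is the autocorrelation method solved with the Levinson-Durbin recursion, which minimizes the sum of squared residuals over a frame. The function name is hypothetical, and the sign convention follows Equation 1.

```python
def lpc_coefficients(frame, order):
    """Return [alpha_1, ..., alpha_P] for eps_n = x_n + sum_i alpha_i * x_{n-i}.

    Assumption: autocorrelation method + Levinson-Durbin recursion; the
    source only states that the coefficients minimize the residual.
    """
    n = len(frame)
    # Autocorrelation of the frame for lags 0..order.
    r = [sum(frame[t] * frame[t - lag] for t in range(lag, n))
         for lag in range(order + 1)]
    a = [1.0] + [0.0] * order     # a[0] is the implicit coefficient of x_n
    err = r[0]                    # residual energy of the current predictor
    for i in range(1, order + 1):
        if err <= 0.0:
            break                 # silent or perfectly predictable frame
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err            # reflection coefficient for this degree
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= (1.0 - k * k)
    return a[1:]


# A decaying exponential x_n = 0.9**n satisfies x_n - 0.9 * x_{n-1} = 0,
# so the first-degree coefficient should be close to -0.9.
print(lpc_coefficients([0.9 ** n for n in range(200)], 1))
```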
The linear predictive coefficients calculated by the calculator 45 are supplied to a short-range predictor 38 as a linear predictor and a parameter encoder 42.
The short-range predictor 38 is comprised of an adder 36 and a short-range predictor memory 37 and is supplied with the residual signal $\epsilon$ generated by the code book 31, multiplier 32 and long-range predictor 35 as well as the $P$-th degree linear predictive coefficients $\alpha_1$ through $\alpha_P$ for each frame from the calculator 45.
The short-range predictor memory 37 incorporates registers which store the output $x_n$ of the adder 36 (which is the synthesized voice to be described later) in a quantity corresponding to the degree of the linear predictive coefficients, i.e., it stores $P$ pieces of the output and sequentially latches the output $x_n$ of the adder 36.
Therefore, at the time $n$, the signals $x_{n-1}$ through $x_{n-P}$ obtained by delaying the output $x_n$ of the adder 36 by 1 through $P$ samples, respectively, are stored in the short-range predictor memory 37.
The short-range predictor memory 37 multiplies the outputs $x_{n-1}$ through $x_{n-P}$ stored in its $P$ registers by the linear predictive coefficients $\alpha_1$ through $\alpha_P$ from the calculator 45, respectively, multiplies each of the results by -1, adds them and thereafter outputs the sum to the adder 36.
Thus, the adder 36 is supplied with a signal $-(\alpha_1 x_{n-1} + \alpha_2 x_{n-2} + \dots + \alpha_P x_{n-P})$.
The adder 36 adds the residual signal $\epsilon$ from the long-range predictor 35 and the signal $-(\alpha_1 x_{n-1} + \alpha_2 x_{n-2} + \dots + \alpha_P x_{n-P})$ from the short-range predictor memory 37 and outputs the sum. Therefore, the adder 36 outputs $\epsilon - (\alpha_1 x_{n-1} + \alpha_2 x_{n-2} + \dots + \alpha_P x_{n-P})$, which is the voice signal $x_n$ at the time $n$, as is apparent from Equation 1.
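The operation of the short-range predictor described above amounts to an all-pole synthesis filter. A minimal sketch follows; the function and variable names are hypothetical, and the registers are assumed to start at zero.

```python
def short_range_synthesize(residual, alphas, registers=None):
    """Sketch of adder 36 + short-range predictor memory 37 (names hypothetical).

    Computes x_n = eps_n - (alpha_1 * x_{n-1} + ... + alpha_P * x_{n-P}).
    `registers[0]` holds x_{n-1}, `registers[P-1]` holds x_{n-P}.
    """
    p = len(alphas)
    registers = list(registers) if registers is not None else [0.0] * p
    voice = []
    for eps in residual:
        # Memory: weight the stored outputs, negate, and sum.
        feedback = -sum(a * x for a, x in zip(alphas, registers))
        x_n = eps + feedback                      # adder: residual plus feedback
        voice.append(x_n)
        registers = ([x_n] + registers)[:p]       # latch x_n, drop the oldest value
    return voice, registers


voice, _ = short_range_synthesize([1.0, 0.0, 0.0], [-0.5])
print(voice)
```

Substituting the output back into Equation 1 recovers the residual: for the second sample, 0.5 + (-0.5)(1.0) = 0.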
The voice signal $x_n$ output by the adder 36 is supplied not only to the short-range predictor memory 37 but also to the subtracter 39. The subtracter 39 obtains the difference between the voice signal input at the time $n$ and the voice signal from the adder 36 and supplies it to an auditory weighting device 40. The auditory weighting device 40 reduces quantization noise included in the difference supplied from the subtracter 39 by utilizing a masking effect and outputs the result to the error minimizer 41.
The voice signal supplied from the adder 36 to the subtracter 39 has been calculated from the residual signal generated based on the code book index, code book gain, pitch period and pitch gain as initial values as described above. Therefore, in most cases, the voice signal is different from the input voice signal.
The error minimizer 41 performs a code book search for determining the code book index and code book gain and a pitch search for determining the pitch period and pitch gain so that the difference between the input voice signal and the voice signal from the adder 36, which is supplied from the subtracter 39 through the auditory weighting device 40 (hereinafter referred to as the error signal), is minimized.
The error minimizer 41 performs the code book search and pitch search on each of subframes which are parts of a frame divided at predetermined time intervals, e.g., 5 ms.
Practically, it is difficult to simultaneously obtain an optimum code book index, code book gain, pitch period and pitch gain by performing code book search and pitch search simultaneously because an enormous amount of calculation is required. Thus, the error minimizer 41 first performs the pitch search and then the code book search as described later.
Specifically, during the pitch search, the pitch period $M$ and the pitch gain $\beta$ are determined for each subframe so that they give the minimum result of the following equation.

$$E_M = \sum_{n=0}^{N-1} \Bigl( \bigl( x(n) - \beta \, v(n-M) * h(n) \bigr) * w(n) \Bigr)^2 \qquad \text{(Equation 2)}$$

where the summation is taken over $n = 0$ through $N-1$ ($N$ is the length of the subframe) and $*$ represents convolution; $v(n)$, $h(n)$ and $w(n)$ respectively represent a voice source signal, an impulse response of the short-range predictor 38 and an impulse response of the auditory weighting device 40; and $x(n)$ represents an input voice signal.
The pitch period $M$ which brings the minimum result of Equation 2 can be given by obtaining $M$ which brings the minimum result of the following equation.

$$E_M = \sum x_w(n)^2 - \frac{\bigl( \sum x_w(n) \, s_w(n) \bigr)^2}{\sum s_w(n)^2} \qquad \text{(Equation 3)}$$

where
$x_w(n) = x(n) * w(n)$; and
$s_w(n) = v(n-M) * h(n) * w(n)$.
Since the first term on the right side of Equation 3 is constant within a subframe, the minimum value of Equation 3 can be given by selecting the value of $M$ which maximizes the second term on the right side thereof.
After the pitch period $M$ is determined as described above, the pitch gain $\beta$ is calculated according to the following equation.

$$\beta = \frac{\sum x_w(n) \, s_w(n)}{\sum s_w(n)^2} \qquad \text{(Equation 4)}$$

Referring to the code book search, the code book index is represented by $j$ ($j = 1, 2, \dots, J$, where $J$ is the number of patterns of the voice source signals registered in the code book 31); the voice source signal of the index $j$ is represented by $c_j(n)$; and the optimum code book gain for the voice source signal $c_j(n)$ is represented by $\gamma_j$. Then, the voice source signal $c_j(n)$ which minimizes an error power $E_j'$ from the input voice signal as given by the following equation is selected as the optimum voice source signal.

$$E_j' = \sum \Bigl( \bigl( p(n) - \gamma_j \, c_j(n) * h(n) \bigr) * w(n) \Bigr)^2 \qquad \text{(Equation 5)}$$

where $p(n)$ represents the difference between the input voice signal $x(n)$ and the synthesized voice signal $x_n$ generated by the short-range predictor 38 in accordance with the voice source signal $c_j(n)$.
The voice source signal $c_j(n)$ which minimizes Equation 5 can be obtained by obtaining $c_j(n)$ which minimizes the following Equation 6.

$$E_j' = \sum p_w(n)^2 - \frac{\bigl( \sum p_w(n) \, q_{wj}(n) \bigr)^2}{\sum q_{wj}(n)^2} \qquad \text{(Equation 6)}$$

where
$p_w(n) = p(n) * w(n)$; and
$q_{wj}(n) = c_j(n) * h(n) * w(n)$.
Since the first term on the right side of Equation 6 is constant within a subframe as in Equation 3, the minimum value of Equation 6 will be given by selecting the value of $c_j(n)$ which maximizes the second term on the right side thereof.
After the index $j$ for the voice source signal $c_j(n)$ is determined as described above, the code book gain $\gamma_j$ is calculated according to the following equation.

$$\gamma_j = \frac{\sum p_w(n) \, q_{wj}(n)}{\sum q_{wj}(n)^2} \qquad \text{(Equation 7)}$$
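The pitch search and the code book search share the same structure: pick the candidate that maximizes the squared correlation with the weighted target divided by the candidate energy, then compute the gain as correlation over energy. The sketch below illustrates that selection rule only. It assumes the target and the candidates have already been filtered by h(n) and w(n); the function name and the list-of-lists layout are hypothetical.

```python
def best_candidate(target_w, candidates_w):
    """Return (index, gain) of the candidate minimizing the weighted error.

    Assumption: `target_w` and every entry of `candidates_w` are already
    weighted (convolved with h and w), so only the selection rule of
    Equations 3/4 and 6/7 is shown here.
    """
    best_index, best_gain, best_score = None, 0.0, -1.0
    for index, cand in enumerate(candidates_w):
        energy = sum(s * s for s in cand)
        if energy == 0.0:
            continue                          # an all-zero candidate cannot match
        cross = sum(t * s for t, s in zip(target_w, cand))
        score = cross * cross / energy        # second term of Equation 3 / 6
        if score > best_score:
            best_index, best_gain, best_score = index, cross / energy, score
    return best_index, best_gain              # gain follows Equation 4 / 7


target = [1.0, 2.0, 3.0, 4.0]
candidates = [[1.0, 0.0, 0.0, 0.0], [2.0, 4.0, 6.0, 8.0], [0.0, 0.0, 1.0, 0.0]]
print(best_candidate(target, candidates))
```

For the pitch search the candidates would be the weighted, delayed voice source signals for each trial value of M; for the code book search they would be the weighted code book vectors. Running the two searches one after the other, rather than jointly, keeps the number of candidates evaluated to the sum of the two search spaces instead of their product.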
Once the code book index $j$, code book gain $\gamma_j$, pitch period $M$ and pitch gain $\beta$ which minimize (the energy of) the error signal supplied to the error minimizer 41 are determined in accordance with the AbS (analysis by synthesis) method as described above, these parameters are supplied to a parameter encoder 42 along with the linear predictive coefficients calculated by the calculator 45.
In order to reduce the number of codes to be generated, the parameter encoder 42 obtains the differences between the parameters (the code book index $j$, code book gain $\gamma_j$, pitch period $M$, pitch gain $\beta$ and linear predictive coefficients) of the current frame (or subframe) and the parameters of the preceding frame (or subframe) and interleaves the resulting list of parameter difference data so that a burst error or the like will not cause the loss of consecutive data.
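The two steps performed by the parameter encoder, differencing against the preceding frame and interleaving, can be sketched as follows together with their inverses. The source does not specify the interleaver, so a simple row/column block interleaver is assumed; the function names and the all-zero reference for the first frame are likewise hypothetical.

```python
def delta_encode(frames):
    """Replace each parameter list by its difference from the preceding frame."""
    previous = [0] * len(frames[0])           # assumed reference for the first frame
    deltas = []
    for frame in frames:
        deltas.append([cur - prev for cur, prev in zip(frame, previous)])
        previous = frame
    return deltas


def delta_decode(deltas):
    """Inverse of delta_encode: accumulate the differences frame by frame."""
    previous = [0] * len(deltas[0])
    frames = []
    for delta in deltas:
        previous = [prev + d for prev, d in zip(previous, delta)]
        frames.append(previous)
    return frames


def interleave(symbols, rows, cols):
    """Block interleaver (assumed): write row by row, read column by column."""
    if len(symbols) != rows * cols:
        raise ValueError("symbols must fill the rows x cols block exactly")
    return [symbols[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(symbols, rows, cols):
    """Inverse of interleave: the same operation with the dimensions swapped."""
    return interleave(symbols, cols, rows)


print(delta_encode([[5, 10], [7, 9], [7, 12]]))
print(interleave(list(range(6)), 2, 3))
```

After interleaving, symbols that were adjacent in the original list are separated, so a short burst of channel errors damages isolated symbols rather than a run of consecutive ones.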
These parameters are supplied from the parameter encoder 42 to a channel encoder 43 which adds error detecting and correcting codes thereto. The parameters are then, for example, convolution-encoded frame by frame and are supplied to a modulator 44. The modulator 44 modulates the encoded data from the encoder 43 and transmits them as a spread spectrum signal having a frequency band spread by the use of, for example, PN (pseudo-random) codes.
FIG. 5 is a block diagram showing a configuration of an example of a receiver of a cellular telephone for receiving and decoding a voice signal which has been encoded and transmitted by the transmitter as described above. The signal (spread spectrum signal) received over a communication channel is supplied to a demodulator 1 to be demodulated using the same PN codes as the PN codes used at the modulator 44 of the transmitter in FIG. 4. This demodulated signal is supplied to a channel decoder 2 wherein it is subjected to convolution-decoding and to error detection and correction utilizing the error detecting and correcting codes added thereto. The signal is then supplied to a parameter decoder 3.
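The reason the demodulator must use the same PN codes as the modulator can be illustrated with a toy direct-sequence scheme. The source does not state which spreading technique is used, so direct-sequence spreading, the chip sequence and the function names below are all assumptions chosen for illustration.

```python
def spread(bits, pn_code):
    """Spread each data bit over len(pn_code) chips by XOR with the PN code."""
    return [bit ^ chip for bit in bits for chip in pn_code]


def despread(chips, pn_code):
    """Recover data bits by comparing each chip block with the same PN code."""
    n = len(pn_code)
    bits = []
    for start in range(0, len(chips), n):
        block = chips[start:start + n]
        mismatches = sum(c ^ p for c, p in zip(block, pn_code))
        # A 0 bit reproduces the code, a 1 bit inverts it; decide by majority.
        bits.append(1 if 2 * mismatches > n else 0)
    return bits


pn = [1, 0, 1, 1, 0, 0, 1]          # hypothetical 7-chip code
sent = spread([1, 0, 1], pn)
sent[3] ^= 1                        # a single chip error on the channel
print(despread(sent, pn))
```

Because each bit is decided from a whole block of chips, an isolated chip error does not change the recovered bit in this sketch.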
The parameter decoder 3 decodes the parameters by deinterleaving the output of the channel decoder 2 to return the difference data list of the parameters (the code book index $j$, code book gain $\gamma_j$, pitch period $M$, pitch gain $\beta$ and linear predictive coefficients) to the original state and by adding the differences to the parameters of the frame (or subframe) which has been decoded immediately before them.
Of the decoded parameters, the code book index $j$, code book gain $\gamma_j$, and pitch period $M$ and pitch gain $\beta$ are respectively supplied to a code book 4, a multiplier 5 and a long-range predictor 8, and the linear predictive coefficients are supplied to a linear predictor 11.
In the code book 4, waveform patterns of voice source signals which are completely identical to those in the code book 31 of the transmitter in FIG. 4 are registered in association with indexes, and the code book 4 outputs the voice source signal associated with the code book index supplied from the parameter decoder 3 to the multiplier 5.
The multiplier 5 amplifies (or attenuates) the voice source signal from the code book 4 in a quantity corresponding to the code book gain supplied by the parameter decoder 3 and outputs the result to the long-range predictor 8.
The long-range predictor 8 is comprised of an adder 6 and a long-range predictor memory 7 which are identical to the adder 33 and long-range predictor memory 34 in FIG. 4. Specifically, the long-range predictor 8 has the same configuration as that of the long-range predictor 35 of the transmitter shown in FIG. 4. It generates a residual signal from the voice source signal supplied by the multiplier 5 based on the pitch period and pitch gain supplied by the parameter decoder 3 and outputs the residual signal to the linear predictor 11.
The linear predictor 11 is comprised of an adder 9 and a short-range predictor memory 10 which are identical to the adder 36 and short-range predictor memory 37 shown in FIG. 4. Specifically, the linear predictor 11 has the same configuration as that of the short-range predictor 38 of the transmitter shown in FIG. 4. It provides a voice signal $x_n$ by synthesizing the residual signal $\epsilon$ supplied by the long-range predictor 8, the linear predictive coefficients $\alpha_1, \alpha_2, \dots, \alpha_P$ supplied by the parameter decoder 3 and the synthesized voice signals $x_{n-1}, x_{n-2}, \dots, x_{n-P}$ which it has already synthesized, according to the following equation.

$$x_n = \epsilon - (\alpha_1 x_{n-1} + \alpha_2 x_{n-2} + \dots + \alpha_P x_{n-P}) \qquad \text{(Equation 8)}$$
As described above, the same voice signal as the voice signal $x_n$ output by the short-range predictor 38 (FIG. 4), which minimizes the difference from the voice signal $x(n)$ input to the transmitter, is synthesized at the receiver.
The voice signal synthesized at the receiver agrees with the voice signal $x_n$ synthesized at the short-range predictor 38 of the transmitter (FIG. 4) according to the AbS method as described above when the signal transmitted from the transmitter (encoded parameters) is received as it is over the channel, i.e., when the values stored in the long-range predictor memory 34 and short-range predictor memory 37 of the transmitter respectively agree with the values stored in the long-range predictor memory 7 and short-range predictor memory 10 of the receiver.
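The dependence of the synthesized voice on the stored memory values can be seen in a small sketch of one decoded subframe. The function name, the dictionary layout of the state and the numeric values are hypothetical; the signal path follows the code book, multiplier, long-range predictor and linear predictor described above.

```python
def decode_subframe(codebook, index, codebook_gain, pitch_period, pitch_gain,
                    alphas, state):
    """Synthesize one subframe; `state` holds both predictor memories.

    state["long_term"]  : past residual samples, newest last (at least
                          pitch_period entries are assumed to be present)
    state["short_term"] : past output samples, newest first (len(alphas) entries)
    """
    long_term, short_term = state["long_term"], state["short_term"]
    voice = []
    for c in codebook[index]:
        # Multiplier and long-range predictor: r = gain * c + beta * r[n - M].
        r = codebook_gain * c + pitch_gain * long_term[-pitch_period]
        long_term.append(r)
        # Linear predictor: Equation 8.
        x_n = r - sum(a * past for a, past in zip(alphas, short_term))
        short_term.insert(0, x_n)
        del short_term[len(alphas):]
        voice.append(x_n)
    return voice


codebook = [[1, 0, 0, 0]]                      # hypothetical one-entry code book
params = dict(index=0, codebook_gain=2.0, pitch_period=2, pitch_gain=0.5,
              alphas=[-0.5])
matched = {"long_term": [0.0, 0.0], "short_term": [0.0]}
mismatched = {"long_term": [0.0, 0.0], "short_term": [4.0]}
print(decode_subframe(codebook, state=matched, **params))
print(decode_subframe(codebook, state=mismatched, **params))
```

Both calls use identical parameters; only the initial short-range memory differs, and the outputs differ as a result.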
However, errors frequently occur in a signal from the transmitter on a communication channel due to various reasons such as poor quality of the channel, which can hinder the signal transmitted from the transmitter (encoded parameters) from being received by the receiver as it is.
For this reason, the error detecting and correcting codes are added by the channel encoder 43 (FIG. 4) at the transmitter, and errors are detected and corrected at the receiver by the channel decoder 2 using the error detecting and correcting codes.
However, in the case of an error which is too severe to correct though it can be detected, the values stored in the long-range predictor memory 7 and short-range predictor memory 10 of the receiver will not agree with the values stored in the long-range predictor memory 34 and short-range predictor memory 37 of the transmitter. In this case, the receiver may output a voice signal which is higher or lower in level (energy or amplitude) than the voice signal synthesized by the short-range predictor 38 of the transmitter (FIG. 4) according to the AbS method, and the voice having the higher level (energy or amplitude) can be harmful to the ear drum of the user.
Conventional receivers have an arrangement wherein when an uncorrectable error is detected, the values stored in the long-range predictor memory 7 and short-range predictor memory 10 are changed so that the level of the voice to be synthesized will be reduced based on the parameters which have been used for synthesizing the voice signal before (e.g., immediately before) the detection of the error.
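The source does not say how the stored values are changed. One simple possibility, assumed here only for illustration, is to scale both memories by a factor below one and to reuse the last parameters that were decoded without error. The function name and the default factor of 0.5 are hypothetical.

```python
def conceal_detected_error(long_term_memory, short_term_memory,
                           last_good_params, attenuation=0.5):
    """Sketch of a conventional response to a detected, uncorrectable error.

    Assumption: the memories are scaled by `attenuation` (< 1) so that the
    next synthesized frame is quieter, and the parameters of the last good
    frame are reused in place of the corrupted ones.
    """
    quieter_long = [attenuation * value for value in long_term_memory]
    quieter_short = [attenuation * value for value in short_term_memory]
    return quieter_long, quieter_short, dict(last_good_params)


print(conceal_detected_error([2.0, -4.0], [1.0], {"pitch_period": 40}))
```

A response of this kind can only be triggered when the channel decoder flags the error, which is the limitation discussed next.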
As described above, in conventional receivers, if an error can be detected, it is possible to prevent synthesized voice having a level which can damage the ear drum of the user from being output even if the error can not be corrected.
However, undetectable errors may be generated due to causes such as a communication channel of very poor quality. In particular, since the linear predictive coefficients are highly sensitive to errors, there has been a problem in that an undetectable error can result in an output voice signal having a very high level which can be harmful to the ear drum of the user.
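The sensitivity of the linear predictive coefficients can be illustrated with a first-degree synthesis filter following Equation 8. The coefficient values below are hypothetical: -0.9 stands for a correctly received coefficient and -1.1 for the same coefficient after an undetected error.

```python
def impulse_response_peak(alphas, length=50):
    """Peak absolute output of x_n = eps_n - sum_i alpha_i * x_{n-i} for a unit impulse."""
    past = [0.0] * len(alphas)               # past[0] holds x_{n-1}
    peak = 0.0
    for n in range(length):
        eps = 1.0 if n == 0 else 0.0         # unit impulse as the residual signal
        x_n = eps - sum(a * x for a, x in zip(alphas, past))
        peak = max(peak, abs(x_n))
        past = ([x_n] + past)[:len(alphas)]
    return peak


print(impulse_response_peak([-0.9]))         # decaying response
print(impulse_response_peak([-1.1]))         # growing response
```

With the first value the output decays from its initial sample; with the second, a small change in one coefficient makes the filter unstable and the output grows with every sample even though the residual signal is unchanged.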
Accordingly it is an object of the present invention to prevent voice of a high level from being output due to an undetectable error to thereby improve the safety of a device.