Conventionally, devices have been proposed that reduce the average transmission bit rate of a speech signal by encoding the signal in a voice-less period (a period with no voice) at a lower bit rate than that used in a period with voice. For example, such a technique is disclosed in document 1 (IEEE Communication Magazine, pages 64-73, September 1997).
The conventional encoding device determines whether the input signal includes a voice or not, for each frame with a predetermined size, e.g. 10 milliseconds, and if the signal in the frame includes a voice, the signal is encoded and decoded in a general speech coding method.
On the other hand, if the input signal includes no voice, the conventional coding device discontinuously encodes feature parameters of the input speech signal and transmits the encoded parameters to a decoding device. The decoding device smooths the discontinuously received feature parameters and decodes a speech signal by using the smoothed parameters.
A method of determining, for each frame, whether the speech signal is voice-less is also disclosed in document 1. The method uses a root mean square value (hereinafter referred to as “RMS”) computed from the input speech signal for each frame, an RMS corresponding to a low frequency region, the number of zero crossings, and filter coefficients representing spectral envelope characteristics.
The determination is done by comparing these values in each frame with the predetermined thresholds.
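The threshold comparison described above can be illustrated with a minimal Python sketch. It computes only two of the four features (the full-band RMS and the number of zero crossings); the low-frequency RMS and the filter coefficients are omitted. The function names, the threshold value, and the rule that a frame is voice-less when every listed feature is below its threshold are assumptions made for illustration and are not taken from document 1.

```python
import math

def frame_rms(frame):
    """Root mean square of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def zero_crossings(frame):
    """Number of sign changes between consecutive samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))

def frame_features(frame):
    """Features compared with thresholds (only two of the four are computed here)."""
    return {"rms": frame_rms(frame), "zero_crossings": zero_crossings(frame)}

def is_voiceless(frame, thresholds):
    """Assumed rule: voice-less when every listed feature is below its threshold."""
    features = frame_features(frame)
    return all(features[name] < limit for name, limit in thresholds.items())

# Hypothetical 10 ms frames of 80 samples (an 8 kHz sampling rate is assumed).
silent_frame = [0.0] * 80
tone_frame = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(80)]
hypothetical_thresholds = {"rms": 0.01}
print(is_voiceless(silent_frame, hypothetical_thresholds))
print(is_voiceless(tone_frame, hypothetical_thresholds))
```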
A method of encoding a speech signal in a period with voice is disclosed, for example, as the CELP (Code Excited Linear Prediction) coding method in document 2 (ITU-T recommendation G.729, July, 1995).
The CELP method is also disclosed in document 3 (Code-Excited Linear Prediction: High Quality Speech at Very Low Bit Rates (IEEE Proc. ICASSP-85, pp. 937-940, 1985)).
In an encoding process of a conventional coding device, first, a speech signal is inputted frame by frame and subjected to linear predictive analysis to obtain linear predictive (LP) coefficients representing the spectral envelope characteristics of the speech. An excitation signal for driving an LP synthesis filter corresponding to the spectral envelope characteristics is then derived and encoded.
Further, in the encoding process of the excitation signal, each frame is divided into subframes and the excitation signal is encoded for each subframe. Herein, the excitation signal is composed of a pitch element representing a pitch period of the input signal, a residual element, and gains of these elements. The pitch element is represented by an adaptive codevector stored in a codebook that holds the past excitation signal and is referred to as an “adaptive codebook”. The residual element is represented by a multipulse signal composed of a plurality of pulses.
Also, in a decoding process, to decode a speech signal, an excitation signal derived by decoding the pitch element and the residual element is fed into a synthesis filter composed of decoded filter coefficients.
In a method of encoding a speech signal in a voice-less period, as described in document 1, first, an RMS and filter coefficients calculated from the speech are encoded at a coding device. Then, at a decoding device, a multipulse signal and a random signal are generated so that the root mean square of their sum is equal to the decoded RMS, and the sum is fed to a synthesis filter composed of the decoded filter coefficients to decode a speech signal in the voice-less period.
In a voice-less period, the feature parameters are transmitted only in frames in which the characteristics of the signal change; otherwise nothing is transmitted. However, information showing whether the feature parameters are transmitted or not is sent separately.
When the feature parameters are not transmitted, the output speech signal is decoded by repeatedly using the previously transmitted feature parameters. A smoothed RMS is used for decoding so as not to cause a discontinuity in the waveform of the decoded speech signal.
FIG. 8 shows a block diagram representing a structure of a conventional encoding device. Referring to FIG. 8, the encoding device includes a voice part coding circuit 12, a voice-less part coding circuit 14, a signal determining circuit 16, a switching circuit 18, and a bit sequence generating circuit 20.
A speech signal is inputted frame by frame, for example, in units of 10 milliseconds, from an input terminal 10. The signal determining circuit 16 determines, for each frame, whether the speech signal from the input terminal 10 belongs to a period with voice or a voice-less period, and passes the determination result (VAD determination sign) to the switching circuit 18 and the bit sequence generating circuit 20.
The voice part coding circuit 12 encodes the speech signal from the input terminal 10 for each frame, and passes the encoded signal to the switching circuit 18.
The voice-less part coding circuit 14 encodes the speech signal from the input terminal 10 for each frame, and passes the encoded signal to the switching circuit 18. Further, the voice-less part coding circuit 14 sends determination information (DTX determination sign) indicating whether the encoded signal is transmitted in the voice-less period, to the bit sequence generating circuit 20.
The switching circuit 18 operates based on the VAD determination sign received from the signal determining circuit 16. When the circuit 18 receives the sign indicating a voice period, the encoded signal passed from the voice part coding circuit 12 is sent to the bit sequence generating circuit 20. On the other hand, when the circuit 18 receives the sign indicating a voice-less period, the encoded signal passed from the voice-less part coding circuit 14 is sent to the bit sequence generating circuit 20.
The bit sequence generating circuit 20 multiplexes the VAD determination sign from the signal determining circuit 16, the DTX determination sign from the voice-less part coding circuit 14, and the encoded signal from the switching circuit 18 to generate a bit sequence, and outputs the bit sequence from an output terminal 22.
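The per-frame data flow of the encoding device of FIG. 8 can be sketched in Python as follows. This is only an illustrative model under stated assumptions: the `FrameBits` container, the callable arguments standing in for circuits 12, 14 and 16, and the value given to the DTX determination sign in a voice period are hypothetical and are not specified in the description above.

```python
from typing import Callable, NamedTuple, Optional, Sequence, Tuple

class FrameBits(NamedTuple):
    """Hypothetical container for what circuit 20 multiplexes for one frame."""
    vad: int                  # 1 = period with voice, 0 = voice-less period
    dtx: int                  # 1 = encoded signal transmitted, 0 = not transmitted
    payload: Optional[bytes]  # encoded signal, or None when nothing is sent

def encode_frame(
    frame: Sequence[float],
    is_voice: Callable[[Sequence[float]], bool],
    voice_coder: Callable[[Sequence[float]], bytes],
    voiceless_coder: Callable[[Sequence[float]], Tuple[bytes, bool]],
) -> FrameBits:
    # Both coding circuits process every frame (circuits 12 and 14).
    voice_payload = voice_coder(frame)
    voiceless_payload, transmit = voiceless_coder(frame)
    # The signal determining circuit 16 drives the switching circuit 18.
    if is_voice(frame):
        # Assumption: the DTX sign is set to 1 in a period with voice.
        return FrameBits(vad=1, dtx=1, payload=voice_payload)
    if transmit:
        return FrameBits(vad=0, dtx=1, payload=voiceless_payload)
    return FrameBits(vad=0, dtx=0, payload=None)

# Usage with stub coders (placeholders, not real speech coders).
bits = encode_frame(
    [0.0] * 80,
    is_voice=lambda f: False,
    voice_coder=lambda f: b"V",
    voiceless_coder=lambda f: (b"N", False),
)
print(bits)
```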
FIG. 9 shows a block diagram for explaining a conventional decoding device.
Referring to FIG. 9, the decoding device includes a bit sequence decomposing circuit 26, a switching circuit 28, a voice part decoding circuit 30, and a voice-less part decoding circuit 34.
The bit sequence decomposing circuit 26 decomposes a bit sequence inputted from an input terminal 24 into the VAD determination sign, the DTX determination sign, and the encoded signal. The circuit 26 then sends the VAD determination sign and the encoded signal to the switching circuit 28, and sends the DTX determination sign to the voice-less part decoding circuit 34.
The switching circuit 28 operates based on the VAD determination sign received from the bit sequence decomposing circuit 26. When the circuit 28 receives the sign indicating a voice period, the encoded signal passed from the bit sequence decomposing circuit 26 is sent to the voice part decoding circuit 30. On the other hand, when the circuit 28 receives the sign indicating voice-less period, the encoded signal passed from the bit sequence decomposing circuit 26 is sent to the voice-less part decoding circuit 34.
The voice part decoding circuit 30 decodes the encoded signal passed from the switching circuit 28 and outputs the decoded signal from an output terminal 32.
The voice-less part decoding circuit 34 decodes the encoded signal passed from the switching circuit 28 by using the DTX determination sign from the bit sequence decomposing circuit 26, and outputs the decoded signal from an output terminal 32.
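The routing performed by the decoding device of FIG. 9, together with the reuse of previously transmitted parameters when nothing is sent, can be sketched as follows. The class name, the callable arguments standing in for circuits 30 and 34, and the representation of the signs as integers are assumptions made for illustration.

```python
class ConventionalDecoder:
    """Illustrative model of circuits 28, 30 and 34 (names are hypothetical)."""

    def __init__(self, voice_decoder, voiceless_decoder):
        self.voice_decoder = voice_decoder          # stands in for circuit 30
        self.voiceless_decoder = voiceless_decoder  # stands in for circuit 34
        self.last_voiceless_payload = None          # most recent received parameters

    def decode(self, vad, dtx, payload):
        # Switching circuit 28: route by the VAD determination sign.
        if vad == 1:
            return self.voice_decoder(payload)
        # Voice-less period: update the stored parameters only when the DTX
        # sign indicates that an encoded signal was transmitted.
        if dtx == 1:
            self.last_voiceless_payload = payload
        # Otherwise the previously transmitted parameters are used again.
        return self.voiceless_decoder(self.last_voiceless_payload)

# Usage with stub decoders that simply label their input.
decoder = ConventionalDecoder(lambda b: ("voice", b), lambda b: ("noise", b))
print(decoder.decode(1, 1, b"V"))
print(decoder.decode(0, 1, b"N1"))
print(decoder.decode(0, 0, None))
```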
FIG. 10 shows a block diagram representing a voice-less part decoding circuit 34 of a conventional decoding device. Referring to FIG. 10, the voice-less part decoding circuit 34 includes a parameter decoding circuit 54, a random circuit 56, a pulse circuit 53, a pitch circuit 58, a mixing circuit 61, a smoothing circuit 66, and a synthesis circuit 68.
The parameter decoding circuit 54 decodes filter coefficients and an RMS from the encoded signal inputted from an input terminal 52, and sends the filter coefficients and the RMS to the synthesis circuit 68 and the smoothing circuit 66, respectively.
The smoothing circuit 66 receives the RMS from the parameter decoding circuit 54, smooths the RMS, and passes the smoothed RMS to the mixing circuit 61. However, if the DTX determination sign from an input terminal 50 indicates that the encoded signal is not transmitted, the circuit 66 calculates the smoothed RMS by smoothing the RMS values of the past frames.
Herein, a smoothed RMS P(n) which is used in the n-th frame in a voice-less period is calculated by using the following equation (1) with the RMS p(n) received in the n-th frame. However, when no encoded signal is transmitted, the RMS of the previous frame is used in the equation (1) instead of p(n).
P(n)=(1−α)·P(n−1)+α·p(n)  (1)
Herein, α is a smoothing factor that determines the degree of smoothing. In the above-mentioned document 1, α is set to a fixed value of 0.125. Further, P(−1) is equal to zero.
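The recursion of equation (1) can be checked with a short Python sketch. The function name and the use of `None` to mark a frame in which no encoded signal is transmitted are assumptions made for illustration; the initial value used when the very first frame is not transmitted is likewise an assumption.

```python
def smooth_rms(received, alpha=0.125):
    """Apply P(n) = (1 - alpha) * P(n-1) + alpha * p(n) with P(-1) = 0.

    `received` holds the RMS p(n) of each frame, or None for a frame in
    which no encoded signal is transmitted (hypothetical convention).
    """
    smoothed = []
    previous_smoothed = 0.0  # P(-1) = 0
    previous_rms = 0.0       # assumption: used if the first frame is missing
    for p in received:
        if p is None:
            p = previous_rms  # reuse the RMS of the previous frame
        previous_smoothed = (1.0 - alpha) * previous_smoothed + alpha * p
        smoothed.append(previous_smoothed)
        previous_rms = p
    return smoothed

# Three frames: an RMS of 8.0, a non-transmitted frame, then an RMS of 0.0.
print(smooth_rms([8.0, None, 0.0]))  # [1.0, 1.875, 1.640625]
```

With α = 0.125 the smoothed value follows a change in p(n) only slowly, which is why a large RMS from an earlier frame continues to affect later frames.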
The random circuit 56 generates a random signal and passes the random signal to the mixing circuit 61. The pulse circuit 53 generates a multipulse signal composed of a plurality of pulses, each of which has a location and an amplitude determined based on a random number, and passes the multipulse signal to the mixing circuit 61.
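The two generators can be modelled with the following Python sketch. The Gaussian distribution of the random signal, the number of pulses, and the restriction of pulse amplitudes to ±1 are assumptions chosen for illustration; the description above states only that locations and amplitudes are determined from random numbers.

```python
import random

def random_signal(length, rng):
    """Random signal r(i); a Gaussian distribution is assumed here."""
    return [rng.gauss(0.0, 1.0) for _ in range(length)]

def multipulse_signal(length, num_pulses, rng):
    """Multipulse signal p(i) with random pulse locations and signs."""
    signal = [0.0] * length
    # Distinct locations are drawn at random; each pulse gets a random sign.
    for location in rng.sample(range(length), num_pulses):
        signal[location] = rng.choice((-1.0, 1.0))
    return signal

rng = random.Random(0)  # fixed seed so that the sketch is repeatable
r = random_signal(40, rng)
p = multipulse_signal(40, 4, rng)
print(sum(1 for value in p if value != 0.0))  # 4
```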
The pitch circuit 58 generates a pitch signal q(i) composed of the above-mentioned adaptive codevector, and passes it to the mixing circuit 61. Since a pitch period used to define the adaptive codevector is not transmitted, a random number is used instead.
The mixing circuit 61 computes an excitation signal x(i) to be fed into a synthesis filter as a linear sum of the random signal r(i) from the random circuit 56, the multipulse signal p(i) from the pulse circuit 53, and the pitch signal q(i) from the pitch circuit 58, and sends the result of the computation to the synthesis circuit 68.
The coupling coefficients of the linear sum can be computed by the method described in document 1.
In the method, first, a coupling coefficient of the pitch signal Gq is selected from a limited range of values according to a random number.
Next, using the Gq, a coupling coefficient of the multipulse signal Gp is calculated so that the RMS derived from the linear sum of the pitch signal and the multipulse signal is equal to the smoothed RMS.
Using the Gq and Gp thus calculated, the linear sum e(i) of the pitch signal and the multipulse signal is calculated according to the following equation (2).
e(i)=Gq·q(i)+Gp·p(i)  (2)
Furthermore, coupling coefficients Gr and γ of the linear sum of e(i) and the random signal r(i) are determined so that the RMS derived from the linear sum of e(i) and r(i) is equal to the smoothed RMS. Herein, a fixed value, γ=0.6, is used as the coupling coefficient of the random signal.
Therefore, the excitation signal to be fed into the synthesis filter, x(i), is computed according to the following equation (3).
x(i)=Gr·[Gq·q(i)+Gp·p(i)]+γ·r(i)  (3)
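The two RMS constraints lead to a quadratic equation in each unknown coefficient, which the following Python sketch solves. It is a minimal illustration of equations (2) and (3), not the exact procedure of document 1: taking the larger real root and raising an error when no real root exists are assumptions, and Gq is passed in as an argument rather than selected by a random number.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def rms(x):
    return math.sqrt(dot(x, x) / len(x))

def solve_gain(a, b, c):
    """Larger real root of a*g^2 + 2*b*g + c = 0 (assumed choice of root)."""
    disc = b * b - a * c
    if a <= 0.0 or disc < 0.0:
        raise ValueError("no real coupling coefficient satisfies the constraint")
    return (-b + math.sqrt(disc)) / a

def mix_excitation(q, p, r, target_rms, gq, gamma=0.6):
    """Return x(i) = Gr*[Gq*q(i) + Gp*p(i)] + gamma*r(i), and Gp, Gr."""
    energy = len(q) * target_rms ** 2  # required sum of squares
    # Gp: the RMS of e(i) = Gq*q(i) + Gp*p(i) equals the smoothed RMS.
    gp = solve_gain(dot(p, p), gq * dot(p, q), gq * gq * dot(q, q) - energy)
    e = [gq * qi + gp * pi for qi, pi in zip(q, p)]
    # Gr: the RMS of x(i) = Gr*e(i) + gamma*r(i) equals the smoothed RMS.
    gr = solve_gain(dot(e, e), gamma * dot(e, r), gamma * gamma * dot(r, r) - energy)
    x = [gr * ei + gamma * ri for ei, ri in zip(e, r)]
    return x, gp, gr

# Small hand-made signals (hypothetical values, four samples each).
q = [1.0, -1.0, 1.0, -1.0]
p = [2.0, 0.0, 0.0, 0.0]
r = [0.5, 0.5, -0.5, -0.5]
x, gp, gr = mix_excitation(q, p, r, target_rms=1.0, gq=0.5)
print(rms(x))  # approximately 1.0
```

Because γ is fixed, only Gr adapts to the target, so the proportion of the random element relative to the pulse and pitch elements is not chosen according to the input signal.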
The synthesis circuit 68 decodes the encoded signal by feeding the excitation signal passed from the mixing circuit 61 to a synthesis filter composed of the filter coefficients passed from the parameter decoding circuit 54. Then, the circuit 68 outputs the decoded speech signal from an output terminal 70.
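The operation of a synthesis filter driven by an excitation signal can be sketched as a direct-form all-pole recursion. The sign convention (a filter 1/A(z) with A(z) = 1 + a1·z^-1 + ... + aM·z^-M), the function name, and the way the filter memory is passed between calls are assumptions made for illustration.

```python
def synthesize(excitation, lp_coeffs, memory=None):
    """All-pole synthesis: s(i) = x(i) - sum_k a_k * s(i - k).

    `lp_coeffs` is [a1, ..., aM]; `memory` holds past outputs, most recent
    first, so that the filter state can be carried across frames.
    """
    order = len(lp_coeffs)
    state = list(memory) if memory is not None else [0.0] * order
    output = []
    for x in excitation:
        s = x - sum(a * past for a, past in zip(lp_coeffs, state))
        output.append(s)
        state = ([s] + state)[:order]  # keep only the last `order` outputs
    return output, state

# First-order example: a single pulse produces a decaying response.
decoded, final_state = synthesize([1.0, 0.0, 0.0, 0.0], [-0.5])
print(decoded)  # [1.0, 0.5, 0.25, 0.125]
```

If the coefficients passed to such a filter change abruptly from one frame to the next while the memory is carried over, the output waveform can change abruptly as well.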
However, the above-mentioned conventional device has the following problems.
The first problem is that the filter coefficients used to decode a speech signal in a voice-less period may change discontinuously at the decoding device, which degrades the quality of the decoded signal.
The reason is that the discontinuously transmitted filter coefficients are used as they are.
The second problem is that the decoding process in the beginning (for example, the first several hundred milliseconds) of a voice-less period may be influenced by the voice period immediately preceding it. Consequently, the amplitude of the decoded signal exceeds the actual amplitude, or the speech quality of the decoded signal is degraded, for example, by an echo-like sound.
The reason is that the RMS is always smoothed in a voice-less period to prevent the decoded (reproduced) signal in the voice-less period from being discontinuous.
The third problem is that the decoded signal in a voice-less period sounds remarkably different from the background noise of the input speech signal. As a result, the listener perceives a discontinuity between the background noise in the voice-less period and the background noise in a voice period.
The reason is that a fixed value is used as the ratio of the pulse element and the pitch element to the random element in generating the excitation signal to be fed into the synthesis filter in a voice-less period.
The invention has therefore been made in consideration of these problems. It is a main object of the invention to encode a speech signal in a voice-less period with high performance, and to provide a device which achieves high coding quality even when the average transmission bit rate is reduced in encoding a speech signal in a voice-less period.
It is another object of the invention to provide a decoding device which can reduce a degradation of the speech quality due to discontinuity of the filter coefficients in decoding a speech signal in a voice-less period.