1. Field of the Invention
The present invention relates to a speech decoding unit and a speech decoding method for reproducing far-end talker background noise when detecting speech pauses that do not contain speech of a far-end talker.
2. Description of Related Art
FIG. 1 is a block diagram showing a configuration of a conventional speech decoding unit disclosed in Japanese patent application laid-open No. 7-129195/1995, for example. In this figure, the reference numeral 1 designates an input terminal for inputting a speech code sequence; 2 designates an excitation signal generator for generating an excitation signal from the speech code sequence; 3 designates a speech spectrum coefficient generator for generating speech spectrum coefficients from the speech code sequence; 4 designates a synthesis filter for reproducing a speech signal from the excitation signal generated by the excitation signal generator 2 and the speech spectrum coefficients generated by the speech spectrum coefficient generator 3; 5 designates a speech spectrum coefficient buffer for holding the speech spectrum coefficients generated by the speech spectrum coefficient generator 3; 6 designates a speech spectrum coefficient interpolator for carrying out linear interpolation of the speech spectrum coefficients during speech pauses; 7 designates a speech output circuit for supplying the speech signal reproduced by the synthesis filter 4 to an output terminal 8; and 8 designates the output terminal.
Next, the operation of the conventional speech decoding unit will be described.
First, when a speech coder (not shown) detects speech of a far-end talker, it encodes the speech, and transmits the speech code sequence to the speech decoding unit.
When the speech of the far-end talker is interrupted, the speech coder detects the speech pause with an internal VOX (voice operated transmitter) and halts the transmission of the speech code sequence to the speech decoding unit. Instead, the speech coder transmits a unique word (post-amble POST) indicating the start of the speech pause, followed by coding parameters indicating far-end talker background noise information.
During a speech burst in which the speech of the far-end talker is detected, the speech coder transmits the speech code sequence, so that in the speech decoding unit, the excitation signal generator 2 generates the excitation signal from the speech code sequence, and the speech spectrum coefficient generator 3 generates the speech spectrum coefficients from the speech code sequence.
At the transition from a speech pause to a speech burst, the speech coder transmits a unique word called a preamble PRE, so that the speech decoding unit can detect the start of the speech burst by detecting this unique word.
When the excitation signal generator 2 generates the excitation signal and the speech spectrum coefficient generator 3 generates the speech spectrum coefficients, the synthesis filter 4 reproduces the speech signal from the excitation signal and speech spectrum coefficients.
Then, the speech output circuit 7 supplies the speech signal reproduced by the synthesis filter 4 to the output terminal 8.
On the other hand, during the speech pause, in which the speech of the far-end talker is not detected, the speech coder halts the transmission of the speech code sequence. Instead, it transmits a unique word (post-amble POST) indicating the start of the speech pause, followed by the coding parameters indicating the far-end talker background noise information. In the speech decoding unit, the speech spectrum coefficient generator 3 then generates the speech spectrum coefficients from these coding parameters, while the excitation signal generator 2 continues to generate the excitation signal from the speech code sequence received in the final receiving period of the speech burst.
At the transition from the speech burst to the speech pause, the speech coder transmits the unique word called the post-amble POST as described above, so that the speech decoding unit can detect the start of the speech pause by detecting this unique word (see FIG. 2).
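For illustration only, the burst/pause detection described above can be sketched as a small state tracker driven by the two unique words. The frame labels `"SPEECH"` and `"NOISE"`, the `State` enum, and the function name `track_state` are hypothetical names introduced for this sketch; the source specifies only that PRE marks the start of a speech burst and POST marks the start of a speech pause.

```python
from enum import Enum


class State(Enum):
    BURST = "burst"
    PAUSE = "pause"


def track_state(frame_kinds, initial=State.PAUSE):
    """Return the decoder state after each received frame.

    frame_kinds: iterable of labels. "PRE" and "POST" are the unique words
    from the text; any other label (hypothetical, e.g. "SPEECH" or "NOISE")
    leaves the state unchanged.
    """
    state = initial
    states = []
    for kind in frame_kinds:
        if kind == "PRE":      # preamble: a speech burst starts
            state = State.BURST
        elif kind == "POST":   # post-amble: a speech pause starts
            state = State.PAUSE
        states.append(state)
    return states


if __name__ == "__main__":
    print(track_state(["PRE", "SPEECH", "POST", "NOISE"]))
```

In this sketch the decoder would take its coding parameters from the speech code sequence while in `State.BURST` and from the background noise information while in `State.PAUSE`.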
When the speech pause is detected, the synthesis filter 4 reproduces the speech signal from the excitation signal generated by the excitation signal generator 2 and from the far-end talker background noise information (speech spectrum coefficients) generated by the speech spectrum coefficient generator 3. However, if the far-end talker background noise information differs greatly from the speech code sequence received in the final receiving period of the preceding speech burst, the reproduced speech signal varies sharply, so that uncomfortable background noise is reproduced for the near-end listener.
In view of this, when the speech pause is detected, the speech spectrum coefficient interpolator 6 carries out linear interpolation of the speech spectrum coefficients, that is, of the far-end talker background noise information received after the post-amble POST (see the ☆ mark in FIG. 2).
More specifically, if the synthesis filter 4 reproduces the speech signal using the far-end talker background noise information from the very beginning of the speech pause, the speech signal can change abruptly at the transition from the speech burst to the speech pause. Thus, to vary the speech signal gradually from the beginning of the speech pause until the far-end talker background noise information is updated (that is, until the next far-end talker background noise information is transmitted), a constant is added stepwise, at fixed interpolation intervals, to the speech spectrum coefficients held in the speech spectrum coefficient buffer 5, which correspond to the speech code sequence received in the final receiving period of the speech burst. The coefficients thus increase or decrease linearly.
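The conventional stepwise update described above can be illustrated with a short sketch. It assumes that the constant added at each interval is the difference between the target background noise coefficients and the buffered coefficients divided by the number of intervals; the function name `linear_interpolation_steps` and the parameter `num_steps` are hypothetical and are not taken from the cited reference.

```python
def linear_interpolation_steps(buffered_coeffs, noise_coeffs, num_steps):
    """Move linearly from the buffered coefficients to the noise coefficients.

    Returns a list of num_steps coefficient vectors, one per fixed
    interpolation interval; the last vector equals noise_coeffs.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    if len(buffered_coeffs) != len(noise_coeffs):
        raise ValueError("coefficient vectors must have the same length")

    # Constant increment added at every interval (assumed form).
    increments = [(n - b) / num_steps
                  for b, n in zip(buffered_coeffs, noise_coeffs)]

    steps = []
    for k in range(1, num_steps + 1):
        steps.append([b + k * inc
                      for b, inc in zip(buffered_coeffs, increments)])
    return steps


if __name__ == "__main__":
    for vec in linear_interpolation_steps([0.0, 1.0], [1.0, 0.0], 4):
        print(vec)
```

Because the increment and the interval are both fixed, every transition follows the same straight-line trajectory regardless of how the noise actually varies.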
Using the far-end talker background noise information (speech spectrum coefficients) passing through the linear interpolation, the synthesis filter 4 reproduces the speech signal so that the speech output circuit 7 supplies the speech signal to the output terminal 8.
With the foregoing arrangement, the conventional speech decoding unit linearly interpolates the background noise information when the speech pause is detected, so as to vary the speech signal gradually. However, since the interpolation interval of the far-end talker background noise information is fixed at every frame interval, a problem arises in that the near-end listener perceives the variations in the reproduced background noise as monotonous and uncomfortable.
The present invention has been made to solve the foregoing problem. Therefore, an object of the present invention is to provide a speech decoding unit and a speech decoding method capable of reproducing background noise with little uncomfortable feeling to the near-end listener.
The speech decoding unit in accordance with the present invention estimates coding parameters of a speech pause by carrying out a smoothing algorithm using coding parameters constituting far-end talker background noise information extracted by an extracting means and coding parameters that are used for synthesizing previous background noise.
This offers an advantage of being able to reproduce background noise with little uncomfortable feeling.
The speech decoding unit in accordance with the present invention can comprise an estimating means for estimating the coding parameters of the speech pause by substituting, into a prescribed equation, the coding parameters that are the far-end talker background noise information and the coding parameters that are used for synthesizing the previous background noise.
This offers an advantage of being able to carry out the smoothing algorithm of the coding parameters quickly without using a complicated configuration.
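The prescribed equation is not given in this section. Purely as an assumption for illustration, a first-order recursive form, `estimate = (1 - alpha) * previous + alpha * received`, shows how the received background noise parameters and the parameters used for the previous synthesis could be combined in a single substitution. The function name `smooth_parameters` and the coefficient name `alpha` are hypothetical.

```python
def smooth_parameters(previous, received, alpha):
    """Estimate pause parameters from previous and newly received values.

    previous: parameters used to synthesize the previous background noise
    received: parameters extracted as far-end background noise information
    alpha:    smoothing coefficient in [0, 1] (assumed form, not from the source)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(previous) != len(received):
        raise ValueError("parameter vectors must have the same length")
    return [(1.0 - alpha) * p + alpha * r for p, r in zip(previous, received)]


if __name__ == "__main__":
    print(smooth_parameters([0.0, 2.0], [1.0, 0.0], 0.5))
```

With `alpha` equal to 0 the previous parameters are kept unchanged, and with `alpha` equal to 1 the received parameters are used as they are; intermediate values blend the two.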
The speech decoding unit in accordance with the present invention can comprise a synthesizing means for synthesizing, in the initial receiving period of the speech pause, speech from coding parameters extracted from the final receiving period of the speech burst.
This offers an advantage of being able to eliminate a problem in that the background noise sharply changes in the initial receiving period of the speech pause.
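As a sketch of this behavior, the following assumes that the first receiving period of the pause reuses the final-burst parameters unchanged and that each later period applies a first-order smoothing step toward the received background noise information. The smoothing form, the function name `pause_parameter_track`, and its parameters are assumptions made for illustration, not details stated in the source.

```python
def pause_parameter_track(final_burst, noise_info, alpha, num_periods):
    """Return the parameters used in each receiving period of a pause.

    Period 0 reuses the final-burst parameters as they are; later periods
    move toward noise_info by an assumed first-order smoothing step.
    """
    if num_periods < 1:
        raise ValueError("num_periods must be at least 1")
    if len(final_burst) != len(noise_info):
        raise ValueError("parameter vectors must have the same length")

    current = list(final_burst)
    track = [list(current)]          # initial receiving period of the pause
    for _ in range(1, num_periods):
        current = [(1.0 - alpha) * c + alpha * n
                   for c, n in zip(current, noise_info)]
        track.append(current)
    return track


if __name__ == "__main__":
    print(pause_parameter_track([4.0], [0.0], 0.5, 3))
```

Starting from the final-burst parameters means that the first synthesized frame of the pause matches the last frame of the burst, which is what avoids the sharp change.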
The speech decoding unit in accordance with the present invention can apply the smoothing algorithm to spectrum envelope information constituting a part of the coding parameters.
This offers an advantage of being able to reduce the amount of computation when there are coding parameters unnecessary for the smoothing algorithm.
The speech decoding unit in accordance with the present invention can apply the smoothing algorithm to frame energy information constituting a part of the coding parameters.
This offers an advantage of being able to eliminate a problem in that the synthesized speech power of the background noise changes intermittently in response to the frame energy of the far-end talker background noise.
The speech decoding unit in accordance with the present invention can apply the smoothing algorithm to spectrum envelope information and frame energy information, each constituting a part of the coding parameters.
This offers an advantage of being able to reproduce background noise with less uncomfortable feeling to the near-end listener.
The speech decoding unit in accordance with the present invention can comprise an estimating means for determining a smoothing coefficient of the coding parameters in response to variations between coding parameters extracted by the extracting means in the final receiving period of the speech burst and the coding parameters constituting the far-end talker background noise information extracted by the extracting means in a receiving period of the speech pause.
This offers an advantage of being able to reproduce background noise with less uncomfortable feeling because a more appropriate smoothing coefficient of the coding parameters is obtained.
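This section does not state how the variation is mapped to a smoothing coefficient. The sketch below assumes, for illustration only, that the variation is measured as the mean absolute difference between the final-burst parameters and the background noise parameters, and that a larger variation yields a smaller coefficient (a slower transition). The function name and the constants `alpha_max`, `alpha_min`, and `gain` are hypothetical.

```python
def smoothing_coefficient(burst_params, noise_params,
                          alpha_max=0.5, alpha_min=0.05, gain=1.0):
    """Choose a smoothing coefficient from the burst-to-noise variation.

    The mapping (larger variation -> smaller coefficient) and all default
    constants are assumptions for illustration, not taken from the source.
    """
    if not burst_params or len(burst_params) != len(noise_params):
        raise ValueError("parameter vectors must be non-empty and equal length")

    # Mean absolute difference as the variation measure (assumed).
    variation = sum(abs(b - n) for b, n in zip(burst_params, noise_params))
    variation /= len(burst_params)

    return max(alpha_min, alpha_max / (1.0 + gain * variation))


if __name__ == "__main__":
    print(smoothing_coefficient([1.0, 1.0], [1.0, 1.0]))
    print(smoothing_coefficient([0.0, 0.0], [4.0, 4.0]))
```

The opposite mapping, or a threshold-based choice between a few fixed coefficients, would fit the description equally well; the point is only that the coefficient is no longer fixed.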
The speech decoding unit in accordance with the present invention can determine a smoothing coefficient of the coding parameters in response to variations between spectrum envelope information extracted in the final receiving period of the speech burst and the spectrum envelope information constituting the far-end talker background noise information, or in response to variations between the frame energy information extracted in the final receiving period of the speech burst and the frame energy information constituting the far-end talker background noise information.
This offers an advantage of being able to reproduce the background noise with little uncomfortable feeling without imposing a large load on the processing that determines the smoothing coefficient.
The speech decoding unit in accordance with the present invention can determine a smoothing coefficient of the spectrum envelope information in response to variations between the spectrum envelope information extracted in the final receiving period of the speech burst and the spectrum envelope information constituting the far-end talker background noise information, and determine a smoothing coefficient of the frame energy information in response to variations between frame energy information extracted in a final receiving period of the speech burst and the frame energy information constituting the far-end talker background noise information.
This offers an advantage of being able to reproduce background noise with less uncomfortable feeling to the near-end listener because the smoothing coefficients are determined with higher accuracy.
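A minimal sketch of determining the two coefficients independently follows. It assumes a simple threshold rule for each parameter type; the thresholds, the two candidate coefficient values, and the function names are hypothetical, and the frame energy is treated as a single scalar per frame.

```python
def coefficient_from_variation(variation, threshold,
                               alpha_small=0.1, alpha_large=0.5):
    """Pick the smaller coefficient when the variation exceeds the threshold."""
    return alpha_small if variation > threshold else alpha_large


def separate_coefficients(burst_envelope, noise_envelope,
                          burst_energy, noise_energy,
                          envelope_threshold=0.2, energy_threshold=6.0):
    """Return (envelope coefficient, energy coefficient).

    Each coefficient depends only on the variation of its own parameter
    type. Thresholds and the decision rule are assumptions for illustration.
    """
    if not burst_envelope or len(burst_envelope) != len(noise_envelope):
        raise ValueError("envelope vectors must be non-empty and equal length")

    envelope_variation = sum(
        abs(b - n) for b, n in zip(burst_envelope, noise_envelope)
    ) / len(burst_envelope)
    energy_variation = abs(burst_energy - noise_energy)

    return (coefficient_from_variation(envelope_variation, envelope_threshold),
            coefficient_from_variation(energy_variation, energy_threshold))


if __name__ == "__main__":
    # Same spectral shape, very different energy.
    print(separate_coefficients([0.5, 0.5], [0.5, 0.5], 20.0, 0.0))
```

In the usage example the spectrum envelope is unchanged while the frame energy drops sharply, so only the energy coefficient is reduced.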
The speech decoding method in accordance with the present invention detects a speech pause by monitoring a speech code sequence and, when the speech pause is detected, estimates coding parameters of the speech pause by carrying out a smoothing algorithm using coding parameters constituting the far-end talker background noise information extracted from the speech code sequence and coding parameters used for synthesizing previous background noise.
This offers an advantage of being able to reproduce background noise with little uncomfortable feeling to the near-end listener.
The speech decoding method in accordance with the present invention can estimate the coding parameters of the speech pause by substituting, into a prescribed equation, the coding parameters constituting the far-end talker background noise information and the coding parameters used for synthesizing the previous background noise.
This offers an advantage of being able to carry out the smoothing algorithm of the coding parameters quickly without using a complicated configuration.
The speech decoding method in accordance with the present invention can synthesize, in the initial receiving period of the speech pause, speech from coding parameters extracted from the final receiving period of the speech burst.
This offers an advantage of being able to eliminate a problem in that the reproduced or synthesized background noise sharply changes in the initial receiving period of the speech pause.
The speech decoding method in accordance with the present invention can determine a smoothing coefficient of the coding parameters in response to variations between coding parameters extracted in the final receiving period of the speech burst and the coding parameters constituting far-end talker background noise information extracted in a receiving period of the speech pause.
This offers an advantage of being able to reproduce background noise with less uncomfortable feeling to the near-end listener because a more appropriate smoothing coefficient of the coding parameters is obtained.