1. Field of the Invention
The present invention relates to voice packet communication or voice storage and processing in which speech spurts are extracted from a voice signal and the voice signal is reproduced from the extracted speech spurts.
2. Description of the Related Art
A technique that extracts speech spurts from a voice signal has been widely employed in many apparatuses and systems because, by limiting the information to be transmitted or stored, it makes efficient use of communication network facilities or voice storage facilities.
It is important for this technique to reproduce a voice signal that resembles natural speech as closely as possible. When speech spurts are detected in an environment with background noise, such as an air-conditioned room, the receiving side reproduces the background noise along with the significant speech during the speech spurts. The background noise, however, is not reproduced during pauses in which no significant speech is present. This gives an unnatural impression, as if the speech were clipped, although it remains intelligible. In particular, a long pause can mislead the listener into thinking that the call has been hung up.
To alleviate this unnaturalness, the following methods are applied.
(1) The transmission side observes the signal level of the background noise, and the receiving side inserts the noise matching the observed signal level during the pauses.
(2) The voice signal during intervals decided as pauses is reproduced in hangover periods. Here, the hangover period refers to a short period following the transition from a speech spurt to a pause.
(3) The transmission side transfers the noise level to the receiving side, and the receiving side reproduces the noise of that level during the pauses.

extracting speech spurts consisting of significant speech in a voice signal;
extracting speech during hangover periods defined as a particular period immediately following transitions of the speech spurts to pauses;
measuring incoming external noise levels during the pauses; and
producing an extracted voice signal consisting of the extracted speech spurts and extracted speech during the hangover periods, producing measured results of the external noise levels, and producing information for identifying the speech spurts, hangover periods and pauses, and
at a speech reproduction side:
extracting speech spurts consisting of significant speech in a voice signal;
extracting speech during hangover periods defined as a particular period immediately following transitions of the speech spurts to pauses;
measuring incoming external noise levels during the pauses; and
producing an extracted voice signal consisting of the extracted speech spurts and extracted speech during the hangover periods, producing measured results of the external noise levels, and producing information for identifying the speech spurts, hangover periods and pauses.
generating a third signal from the external noise levels transmitted;
adjusting levels of the extracted voice signal during the hangover periods;
adjusting the third signal during the hangover periods; and
producing during the speech spurts the extracted voice signal, producing during the hangover periods a mixture of the extracted voice signal and the third signal, which undergo adjustment, and producing in the pauses the third signal.
voice level measuring means for detecting speech spurts consisting of significant speech in a voice signal, and for measuring incoming external noise levels during pauses;
voice extracting means for extracting the speech spurts and speech during hangover periods defined as a particular period immediately following transitions of the speech spurts to the pauses; and
output means for producing an extracted voice signal consisting of the extracted speech spurts and extracted speech during the hangover periods, for producing measured results of the external noise levels, and for producing information for identifying the speech spurts, hangover periods and pauses.
a signal generator for generating a third signal in response to the external noise levels transmitted;
voice level adjuster for adjusting levels of the extracted voice signal during the hangover periods;
a third signal level adjuster for adjusting the third signal during the hangover periods;
a mixer for mixing the voice signal and the third signal, which undergo the level adjustments; and
a combiner for producing during the speech spurts the extracted voice signal, for producing during the hangover periods a mixture of the extracted voice signal and the third signal, which undergo the level adjustments, and for producing in the pauses the third signal.
(1) the transmitting side generates, when transmitting the voice signal, information that enables the receiving side to identify the speech spurts and hangover periods; and
(2) the receiving side controls, when reproducing the voice signal during the speech spurts, hangover periods and pauses, the mixing ratio between the received voice signal and the third signal the receiving side generates.
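The transmit-side steps listed above (detecting speech spurts, holding a hangover period after each spurt, and measuring the external noise level during pauses) might be sketched as follows. This is only a minimal illustration under stated assumptions: the frame-based RMS level measurement, the fixed threshold, and the names `frame_level`, `classify_frames`, `threshold` and `hangover_frames` are all hypothetical and do not come from the text.

```python
import math
from typing import List, Sequence, Tuple

SPURT, HANGOVER, PAUSE = "spurt", "hangover", "pause"


def frame_level(frame: Sequence[float]) -> float:
    """Return the RMS level of one frame (assumed level measure)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def classify_frames(
    frames: Sequence[Sequence[float]],
    threshold: float,
    hangover_frames: int,
) -> Tuple[List[str], List[float]]:
    """Label each frame as spurt, hangover or pause.

    Returns the labels (the identifying information) and the noise
    levels measured during the frames labelled as pauses.
    """
    labels: List[str] = []
    noise_levels: List[float] = []
    remaining = 0  # hangover frames still to be emitted after the last spurt
    for frame in frames:
        level = frame_level(frame)
        if level >= threshold:
            # Significant speech: a spurt frame; re-arm the hangover counter.
            labels.append(SPURT)
            remaining = hangover_frames
        elif remaining > 0:
            # Short period immediately following a spurt-to-pause transition.
            labels.append(HANGOVER)
            remaining -= 1
        else:
            # Pause: no voice is extracted, only the noise level is measured.
            labels.append(PAUSE)
            noise_levels.append(level)
    return labels, noise_levels
```

In this sketch only frames labelled `spurt` or `hangover` would be passed on as the extracted voice signal, together with the labels and the measured noise levels.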
Of the three background-noise techniques (1) to (3) above, technique (2), which reproduces the voice signal in hangover periods, is known to be particularly effective.
Although the noise-insertion techniques (1) and (3) can reduce the unnaturalness to some extent, the noise inserted into the pauses generally differs from the actual background noise, because the background noise varies with the environment of the transmitting side. As a result, in some cases they cannot fully relieve the unnaturalness, because the sound quality of the reproduced voice signal changes perceptibly at the transitions between the speech spurts and pauses.
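One way to soften the perceptible change at these transitions is for the receiving side to control the mixing ratio between the received voice signal and a locally generated third signal. The sketch below is a hedged illustration only. It assumes a linear cross-fade in per-frame steps across each hangover period and uniform white noise as the third signal, neither of which is specified in the text. The names `third_signal`, `mixing_gains` and `reproduce` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def third_signal(level: float, n: int, rng: random.Random) -> List[float]:
    """Generate n samples of uniform noise with RMS close to `level` (assumed noise model)."""
    amplitude = level * math.sqrt(3.0)  # uniform noise on [-a, a] has RMS a / sqrt(3)
    return [rng.uniform(-amplitude, amplitude) for _ in range(n)]


def mixing_gains(labels: Sequence[str]) -> List[Tuple[float, float]]:
    """Return a (voice_gain, noise_gain) pair for every frame label."""
    gains: List[Tuple[float, float]] = []
    i = 0
    while i < len(labels):
        if labels[i] == "spurt":
            gains.append((1.0, 0.0))  # voice only
            i += 1
        elif labels[i] == "pause":
            gains.append((0.0, 1.0))  # third signal only
            i += 1
        elif labels[i] == "hangover":
            # Find the length of this hangover run, then fade linearly across it.
            j = i
            while j < len(labels) and labels[j] == "hangover":
                j += 1
            run = j - i
            for k in range(run):
                noise_gain = (k + 1) / (run + 1)
                gains.append((1.0 - noise_gain, noise_gain))
            i = j
        else:
            raise ValueError(f"unknown label: {labels[i]!r}")
    return gains


def reproduce(
    labels: Sequence[str],
    voice: Sequence[Sequence[float]],
    noise_level: float,
    seed: int = 0,
) -> List[List[float]]:
    """Mix received voice frames with the third signal, frame by frame.

    `voice` must hold one frame per label; pause frames carry zeros
    because no voice is received during pauses.
    """
    rng = random.Random(seed)
    out: List[List[float]] = []
    for frame, (voice_gain, noise_gain) in zip(voice, mixing_gains(labels)):
        noise = third_signal(noise_level, len(frame), rng)
        out.append([voice_gain * v + noise_gain * n for v, n in zip(frame, noise)])
    return out
```

Here the gain is held constant within each frame for simplicity; a sample-by-sample ramp would give a smoother transition at the cost of slightly more bookkeeping.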