Voice activity segmentation technology is used in mobile communication and the like to improve speech transmission efficiency by removing or compressing voice-inactive segments, in which the speaker is not speaking.
The voice activity segmentation technology is also used by noise cancellers, echo cancellers and the like in order to estimate noise in the voice-inactive segments.
Furthermore, the voice activity segmentation technology is widely used by speech recognition systems in order to improve performance and to reduce the amount of processing.
A general voice activity segmentation system calculates a feature value for each unit time of a time series of input sound, and determines voice-active segments and voice-inactive segments in the time series by comparing the feature value with a threshold value.
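The general scheme described above can be illustrated with a minimal sketch. The choice of frame log energy as the feature value, the non-overlapping framing, and the names `frame_log_energy` and `segment` are assumptions made only for illustration; they are not taken from any of the cited documents.

```python
import math

def frame_log_energy(samples, frame_len):
    """Return one feature value (log energy in dB) per non-overlapping frame.

    Log energy is an assumed feature chosen for illustration only.
    """
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        # A small constant avoids log10(0) on all-zero frames.
        feats.append(10.0 * math.log10(energy + 1e-12))
    return feats

def segment(features, threshold):
    """Group frames into runs of (start_frame, end_frame_exclusive, is_voice).

    A frame is treated as voice-active when its feature exceeds the threshold.
    """
    segments = []
    for i, f in enumerate(features):
        active = f > threshold
        if segments and segments[-1][2] == active:
            # Extend the current run.
            segments[-1] = (segments[-1][0], i + 1, active)
        else:
            # Start a new run.
            segments.append((i, i + 1, active))
    return segments
```

For example, `segment(frame_log_energy(samples, 160), -30.0)` would return alternating voice-inactive and voice-active runs of frame indices; the frame length of 160 samples and the threshold of -30 dB are arbitrary illustrative values.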
Feature values used in voice activity segmentation are exemplified in the following. For example, patent document 1 discloses smoothing the fluctuation of a power spectrum and using the smoothed power spectrum as the feature value.
Non-patent document 1 discloses using, as the feature value, an average value of the SNR shown in section 4.3.3 and the SNR shown in section 4.3.5.
In addition to the feature values mentioned above, many other feature values are used. Examples include the number of zero crossings shown in section B.3.1.4 of non-patent document 2, a likelihood ratio using a speech GMM (Gaussian Mixture Model) and a silence GMM shown in non-patent document 3, and a combination of a plurality of feature values shown in patent document 2.
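As one concrete illustration of such a feature value, the number of zero crossings in a frame can be sketched as follows. This is a generic sign-change count; the exact definition in the cited non-patent document may differ (for example, in normalization or in how zero-valued samples are handled), and the name `zero_crossings` is hypothetical.

```python
def zero_crossings(frame):
    """Count sign changes between consecutive samples in a frame.

    Samples equal to zero are treated as non-negative (an assumption).
    """
    count = 0
    for prev, cur in zip(frame, frame[1:]):
        if (prev >= 0) != (cur >= 0):
            count += 1
    return count
```

Noise-like and unvoiced sounds tend to produce a higher count than voiced speech, which is why this count is commonly used alongside energy-based feature values.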
Patent document 2 discloses a method of prompting a user to utter a reference speech, performing forced alignment on the utterance to determine a voice-active segment and a voice-inactive segment, and updating weights assigned to a plurality of feature values so that the determination error for the voice-active segment and the voice-inactive segment is minimized.
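The idea of weighting a plurality of feature values and adjusting the weights against labeled segments can be sketched as follows. The sketch uses a simple perceptron-style update, which is an assumed stand-in and not the update rule of patent document 2; the per-frame labels are assumed to come from the alignment of the reference utterance, and the names `combined_score` and `update_weights`, the learning rate, and the epoch count are all hypothetical.

```python
def combined_score(weights, features):
    """Weighted sum of the feature values of one frame."""
    return sum(w * f for w, f in zip(weights, features))

def update_weights(weights, frames, labels, threshold=0.0, lr=0.1, epochs=20):
    """Adjust weights so that thresholding the combined score matches the labels.

    frames: list of per-frame feature vectors.
    labels: list of booleans, True for voice-active frames (assumed to be
            obtained from the aligned reference utterance).
    Perceptron-style rule: weights change only on misclassified frames.
    """
    w = list(weights)
    for _ in range(epochs):
        for feats, is_voice in zip(frames, labels):
            predicted = combined_score(w, feats) > threshold
            if predicted != is_voice:
                sign = 1.0 if is_voice else -1.0
                w = [wi + lr * sign * fi for wi, fi in zip(w, feats)]
    return w
```

A frame is then judged voice-active when `combined_score(w, feats)` exceeds the threshold. This rule reduces the determination error only when the labeled frames are linearly separable by the weighted sum; other error-minimizing update rules could be substituted.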