1. Field of the Invention
The present invention relates to an apparatus, a method, and a computer program product for judging whether an acoustic signal represents speech or non-speech.
2. Description of the Related Art
In a speech/non-speech judging process performed on an acoustic signal, a characteristic amount is extracted from each frame of the input acoustic signal (i.e., the input signal), and threshold processing is applied to the obtained characteristic amounts to judge whether each frame represents speech or non-speech. J. L. Shen, J. W. Hung, and L. S. Lee, “Robust Entropy-based Endpoint Detection for Speech Recognition in Noisy Environments” in the proceedings of the International Conference on Spoken Language Processing (ICSLP)-98, 1998 proposes using a spectral entropy value as the acoustic characteristic amount in the speech/non-speech judging process. This characteristic amount is the entropy obtained by treating the spectrum calculated from the input signal as a probability distribution. The spectral entropy is small for a speech spectrum, which has an uneven spectral distribution, and large for a noise spectrum, which has an even spectral distribution. In the method that employs the spectral entropy value, whether each frame represents speech or non-speech is judged based on this difference.
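The spectral entropy calculation and the threshold judgment described above can be sketched as follows. This is a minimal illustration and not the implementation of the cited reference: the function names, the use of the natural logarithm, the eight-bin example spectra, and the threshold value of 1.5 are all assumptions introduced here.

```python
import math


def spectral_entropy(power_spectrum):
    """Entropy of a power spectrum treated as a probability distribution."""
    total = sum(power_spectrum)
    if total <= 0.0:
        raise ValueError("power spectrum must have positive total energy")
    entropy = 0.0
    for power in power_spectrum:
        p = power / total  # normalize so that the bins sum to 1
        if p > 0.0:        # a zero-probability bin contributes nothing
            entropy -= p * math.log(p)
    return entropy


def is_speech(power_spectrum, threshold):
    """Judge a frame as speech when its spectral entropy is below the threshold."""
    return spectral_entropy(power_spectrum) < threshold


# Hypothetical 8-bin spectra: an even (noise-like) one and an uneven (speech-like) one.
flat_spectrum = [1.0] * 8
peaky_spectrum = [0.01, 0.01, 8.0, 0.01, 0.01, 4.0, 0.01, 0.01]

# The threshold is an assumed value chosen only for this illustration.
THRESHOLD = 1.5
flat_is_speech = is_speech(flat_spectrum, THRESHOLD)
peaky_is_speech = is_speech(peaky_spectrum, THRESHOLD)
```

For an even spectrum with N bins the entropy reaches its maximum of log N, so the even example frame is judged to be non-speech, while the uneven frame, whose energy sits in two bins, falls below the threshold.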
P. Renevey and A. Drygajlo, “Entropy Based Voice Activity Detection in Very Noisy Conditions” in the proceedings of EUROSPEECH 2001, pp. 1887-1890, September 2001 proposes a normalization method for improving the efficacy of spectral entropy. According to P. Renevey et al., the input spectrum is normalized by using an estimated noise spectrum. More specifically, in the normalizing process according to P. Renevey et al., the spectrum of the input signal is divided by the spectrum of the background noise so that the value of the spectral entropy in a noise period becomes larger. With this arrangement, the spectrum in the noise period is whitened, and the spectral entropy value becomes larger even for uneven background noise such as noise from passing vehicles, whose energy is concentrated in the low-frequency range. The normalized spectral entropy has been confirmed to be highly effective against stationary noise such as noise from passing vehicles.
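The effect of this normalization can be sketched as follows. This is an illustration under stated assumptions, not the procedure of P. Renevey et al.: the function names, the small floor used to avoid division by zero, and the example spectrum are hypothetical, and the noise estimate is assumed to match the actual noise exactly.

```python
import math


def spectral_entropy(power_spectrum):
    """Entropy of a power spectrum treated as a probability distribution."""
    total = sum(power_spectrum)
    if total <= 0.0:
        raise ValueError("power spectrum must have positive total energy")
    return -sum((p / total) * math.log(p / total)
                for p in power_spectrum if p > 0.0)


def normalized_spectral_entropy(power_spectrum, noise_estimate, floor=1e-12):
    """Divide the input spectrum by the estimated noise spectrum, bin by bin,
    and return the entropy of the result. The floor is an assumed safeguard
    against division by zero."""
    if len(power_spectrum) != len(noise_estimate):
        raise ValueError("spectra must have the same number of bins")
    whitened = [p / max(n, floor)
                for p, n in zip(power_spectrum, noise_estimate)]
    return spectral_entropy(whitened)


# Hypothetical stationary noise with its energy concentrated in the lower bins.
low_frequency_noise = [8.0, 4.0, 2.0, 1.0, 0.5, 0.25, 0.125, 0.0625]

raw_entropy = spectral_entropy(low_frequency_noise)
# Assumes the noise estimate equals the actual noise spectrum.
whitened_entropy = normalized_spectral_entropy(low_frequency_noise,
                                               low_frequency_noise)
```

When the estimate matches the noise, every bin of the divided spectrum equals 1, so the entropy rises from the lower value of the uneven noise spectrum to the maximum of log N for N bins.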
However, the normalization of the spectral entropy described above is not sufficient for noise whose spectrum changes in a non-stationary manner, such as babble noise. As a result, the normalized spectral entropy in a noise period can take a small value like that of a speech signal. Because of this problem, the normalized spectral entropy alone does not achieve sufficiently high efficacy for non-stationary noise.
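The problem can be illustrated with a small numerical sketch. All spectra below are hypothetical values chosen for illustration, not measured data, and the noise estimate is assumed to have been obtained from earlier frames and to no longer match the current babble-like frame.

```python
import math


def normalized_spectral_entropy(power_spectrum, noise_estimate):
    """Entropy of the input spectrum after division by the noise estimate."""
    whitened = [p / n for p, n in zip(power_spectrum, noise_estimate)]
    total = sum(whitened)
    return -sum((w / total) * math.log(w / total)
                for w in whitened if w > 0.0)


# Hypothetical noise estimate, assumed to have been averaged over earlier frames.
noise_estimate = [4.0, 4.0, 2.0, 2.0, 1.0, 1.0, 0.5, 0.5]

# A stationary noise frame that still matches the estimate.
stationary_frame = [4.0, 4.0, 2.0, 2.0, 1.0, 1.0, 0.5, 0.5]

# A babble-like noise frame whose spectrum has shifted away from the estimate.
babble_frame = [0.5, 12.0, 0.4, 0.3, 6.0, 0.2, 0.1, 0.1]

stationary_entropy = normalized_spectral_entropy(stationary_frame, noise_estimate)
babble_entropy = normalized_spectral_entropy(babble_frame, noise_estimate)
```

In this sketch the stationary frame is whitened to the maximum entropy of log 8, whereas the babble-like frame remains uneven after the division and yields a much smaller entropy, so a threshold placed between the two values would misjudge the noise frame as speech.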