Patent Publication 1, for example, discloses a conventional voice judging system of this sort, having the configuration shown herein in FIG. 8. Referring to FIG. 8, this conventional voice judging system includes sound signal input means 810, feature value extracting means 821, sound score calculating means 824, likelihood ratio calculating means 822, voice judging means 823, a voice model memory 831, and a non-voice model memory 832. The operation of the conventional voice judging system shown in FIG. 8 will now be described briefly.
The feature value extracting means 821 analyzes features of the sound signal input by the sound signal input means 810. The features used are the cepstrum based on LPC (Linear Predictive Coding) analysis and its first-order derivative with respect to time. The cepstrum is a feature representing the frequency-domain property of the sound signal, that is, the shape of its log spectral envelope. The features are analyzed for each frame, the signal being divided into frames of, for instance, 32 msec.
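The feature extraction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of Patent Publication 1: it assumes 8 kHz sampling, non-overlapping 32 msec frames (256 samples), a Hamming window, LPC order 10 computed by the Levinson-Durbin recursion, 12 cepstral coefficients obtained with the standard LPC-to-cepstrum recursion, and a regression-based first-order time derivative over +/-2 frames. All function names are hypothetical.

```python
import math
from typing import List

# Assumptions for illustration (not from Patent Publication 1): 8 kHz sampling,
# non-overlapping 32 msec frames, Hamming window, LPC order 10, 12 cepstra.

def split_frames(signal: List[float], frame_len: int) -> List[List[float]]:
    """Block the signal into consecutive, non-overlapping frames."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, frame_len)]

def autocorrelation(frame: List[float], max_lag: int) -> List[float]:
    """Autocorrelation r[0..max_lag] of a Hamming-windowed frame."""
    n = len(frame)
    w = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
         for i, x in enumerate(frame)]
    return [sum(w[i] * w[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]

def levinson_durbin(r: List[float], order: int) -> List[float]:
    """Predictor coefficients a_1..a_p such that x[n] ~ sum_k a_k x[n-k]."""
    if r[0] <= 0.0:                      # silent frame: no spectral shape
        return [0.0] * order
    a = [0.0] * (order + 1)              # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k               # residual prediction error
    return a[1:]

def lpc_to_cepstrum(a: List[float], n_ceps: int) -> List[float]:
    """Cepstrum c_1..c_n of the all-pole model 1 / (1 - sum_k a_k z^-k)."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)             # c[0] (gain term) is omitted
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]

def delta(seq: List[List[float]], K: int = 2) -> List[List[float]]:
    """First-order time derivative by linear regression over +/-K frames."""
    T = len(seq)
    denom = 2 * sum(k * k for k in range(1, K + 1))
    out = []
    for t in range(T):
        d = [0.0] * len(seq[t])
        for k in range(1, K + 1):
            nxt, prv = seq[min(t + k, T - 1)], seq[max(t - k, 0)]
            for i in range(len(d)):
                d[i] += k * (nxt[i] - prv[i])
        out.append([v / denom for v in d])
    return out

def extract_features(signal: List[float], fs: int = 8000, frame_ms: int = 32,
                     order: int = 10, n_ceps: int = 12) -> List[List[float]]:
    """Per-frame feature vector: LPC cepstrum followed by its delta."""
    frame_len = fs * frame_ms // 1000
    ceps = [lpc_to_cepstrum(levinson_durbin(autocorrelation(f, order), order),
                            n_ceps)
            for f in split_frames(signal, frame_len)]
    return [c + d for c, d in zip(ceps, delta(ceps))]
```

For one second of 8 kHz audio this yields 31 frames, each described by 12 cepstral coefficients followed by their 12 deltas.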
For the features derived by the feature value extracting means 821, the sound score calculating means 824 calculates the likelihood for a voice model stored in the voice model memory 831 and the likelihood for a non-voice model stored in the non-voice model memory 832.
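The per-frame scoring step can be sketched as below. Patent Publication 1 names HMMs as one possible model; with a single emitting state, an HMM's per-frame output likelihood reduces to that state's output density, so this sketch simply scores each frame with a diagonal-covariance Gaussian mixture. That simplification, the `DiagGMM` container, and the function names are assumptions for illustration. Log-likelihoods are used throughout to avoid numerical underflow.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class DiagGMM:
    """Hypothetical stand-in for a stored model: diagonal-covariance GMM."""
    weights: List[float]
    means: List[List[float]]
    variances: List[List[float]]

def log_gauss_diag(x: List[float], mean: List[float], var: List[float]) -> float:
    """log N(x; mean, diag(var))."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def log_likelihood(model: DiagGMM, x: List[float]) -> float:
    """log sum_j w_j N(x; mu_j, Sigma_j), combined with log-sum-exp."""
    terms = [math.log(w) + log_gauss_diag(x, m, v)
             for w, m, v in zip(model.weights, model.means, model.variances)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def score_frames(features: List[List[float]], voice: DiagGMM,
                 nonvoice: DiagGMM) -> List[Tuple[float, float]]:
    """(voice log-likelihood, non-voice log-likelihood) for every frame."""
    return [(log_likelihood(voice, x), log_likelihood(nonvoice, x))
            for x in features]
```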
The voice model and the non-voice model are trained in advance using a voice signal and a non-voice signal, respectively. HMMs (Hidden Markov Models), for example, may be used as these models.
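Training in advance can be illustrated with the simplest possible model. As an assumption for illustration, this sketch replaces HMM training (e.g. Baum-Welch) with a maximum-likelihood fit of a single diagonal-covariance Gaussian to each training set; the function names and the variance floor of 1e-3 are hypothetical.

```python
from typing import List, Tuple

def fit_diag_gaussian(frames: List[List[float]],
                      var_floor: float = 1e-3) -> Tuple[List[float], List[float]]:
    """Maximum-likelihood mean and diagonal variance of the training frames."""
    if not frames:
        raise ValueError("at least one training frame is required")
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[i] for f in frames) / n for i in range(dim)]
    # Flooring keeps a near-constant dimension from producing a zero variance.
    var = [max(sum((f[i] - mean[i]) ** 2 for f in frames) / n, var_floor)
           for i in range(dim)]
    return mean, var

def train_models(voice_frames: List[List[float]],
                 nonvoice_frames: List[List[float]]):
    """Train the voice and non-voice models separately, before judging."""
    return fit_diag_gaussian(voice_frames), fit_diag_gaussian(nonvoice_frames)
```

Fitting each model only on its own class of signal is what lets the later likelihood comparison discriminate between voice and non-voice.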
The likelihood ratio calculating means 822 calculates the ratio of the likelihood for the voice model to that for the non-voice model, both obtained by the sound score calculating means 824.
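The ratio can be written as follows. The notation is introduced here for illustration and is not taken from Patent Publication 1: $\mathbf{x}_t$ is the feature vector of frame $t$, $\lambda_{\mathrm{V}}$ and $\lambda_{\mathrm{N}}$ are the voice and non-voice model parameters, and $\theta$ is the preset threshold. Evaluating the ratio in the log domain turns it into a difference, which avoids underflow when both likelihoods are very small.

```latex
\Lambda(t) = \frac{p(\mathbf{x}_t \mid \lambda_{\mathrm{V}})}{p(\mathbf{x}_t \mid \lambda_{\mathrm{N}})},
\qquad
\log \Lambda(t) = \log p(\mathbf{x}_t \mid \lambda_{\mathrm{V}}) - \log p(\mathbf{x}_t \mid \lambda_{\mathrm{N}}),
\qquad
\Lambda(t) > \theta \iff \log \Lambda(t) > \log \theta .
```

Because the logarithm is monotonic, comparing either form against its threshold gives the same decision.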
When the likelihood ratio calculated by the likelihood ratio calculating means 822 stays above a preset threshold value continuously for a preset time, the voice judging means 823 determines that interval to be a voice interval.
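The duration-constrained decision can be sketched as follows. It assumes the ratio is supplied per frame in the log domain and that the preset time is converted to a frame count using an assumed 32 msec frame period. Intervals are returned as half-open frame ranges `[start, end)`. The function names are hypothetical.

```python
import math
from typing import List, Tuple

def frames_for_duration(duration_ms: float, frame_ms: float = 32.0) -> int:
    """Smallest whole number of frames (at least one) covering the preset time."""
    return max(1, math.ceil(duration_ms / frame_ms))

def detect_voice_intervals(log_ratios: List[float], log_threshold: float,
                           min_frames: int) -> List[Tuple[int, int]]:
    """Half-open frame intervals [start, end) where the log likelihood ratio
    stays above the threshold for at least min_frames consecutive frames."""
    intervals = []
    start = None
    # A -inf sentinel guarantees that a run reaching the last frame is closed.
    for t, r in enumerate(log_ratios + [-math.inf]):
        if r > log_threshold:
            if start is None:
                start = t                # a candidate run begins
        elif start is not None:
            if t - start >= min_frames:  # long enough: accept as voice
                intervals.append((start, t))
            start = None
    return intervals
```

For example, with per-frame log ratios `[0, 2, 2, 2, 0, 3, 0, 2, 2]`, a log threshold of 1, and a minimum of 2 frames, the isolated frame 5 is rejected and the intervals `(1, 4)` and `(7, 9)` are reported. Requiring a minimum run length suppresses short noise bursts that momentarily favor the voice model.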
This conventional voice judging system provides voice and non-voice models that use as a feature the cepstrum, which represents the frequency-domain property of a sound signal, and compares the likelihood of the voice model with that of the non-voice model frame by frame. The system thus achieves voice judgment that is robust against noise to some extent.
Patent Publication 2 shows the configuration of a voice decoding device that distinguishes a stationary signal exhibiting periodicity from a white-noise-like stationary noise signal, so that an interval of the stationary noise signal can be detected accurately. This device analyzes the period of the voice signal in each sub-frame and decides that a signal exhibiting strong periodicity does not belong to a stationary noise interval, because such a signal is highly likely to be, for instance, a stationary vowel rather than noise. A pitch log analyzer shown in Patent Publication 2 analyzes, for each sub-frame, the variations in the pitch period input from an adaptive codebook and detects the vowel-likeness of the signal in order to determine whether or not the signal is a voice signal. That is, in Patent Publication 2, the period of the voice signal in a sub-frame corresponds to the period (3 to 10 msec) of the voice waveform of a vowel.
In the configuration disclosed in Patent Publication 3, sound parameters such as an 18th-order LPC cepstrum, the number of zero-crossings, or power are extracted from voice data, and vowels are detected based on these sound parameters and on standard vowel patterns stored in a standard vowel pattern storage unit. Based on the recognition results, hypotheses are generated by searching a dictionary grammar storage unit in the order of vowels and consonants. The scores (likelihoods) of consonants are derived for each hypothesis, and characters are selected based on these scores to generate character string data.
Patent Document 1: JP Patent Kokai JP-A-10-254476
Patent Document 2: JP Patent Kokai JP-A-2002-236495
Patent Document 3: JP Patent Kokai JP-A-06-266387
Non-Patent Document 1: S. Furui, ‘Digital Speech Processing’, Tokai University Publishing Section, 1985, p. 40
Non-Patent Document 2: N. Takaya, ‘Digital Signal Processing’, Shokoh-do, 1997, pp. 96-99
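The periodicity criterion attributed above to Patent Publication 2 can be illustrated with a short sketch. Patent Publication 2 takes the pitch period from an adaptive codebook. So that the sketch runs on raw samples, it substitutes a different technique: a normalized autocorrelation searched over lags corresponding to the 3 to 10 msec vowel period. The 8 kHz sampling rate, the 0.7 decision threshold, and the function names are assumptions for illustration.

```python
import math
from typing import List, Tuple

def periodicity(subframe: List[float], fs: int = 8000,
                min_period_ms: float = 3.0,
                max_period_ms: float = 10.0) -> Tuple[int, float]:
    """Best lag (in samples) and its normalized autocorrelation, searched over
    lags matching a 3 to 10 msec period. Returns (0, 0.0) if nothing is found."""
    min_lag = int(fs * min_period_ms / 1000)
    max_lag = int(fs * max_period_ms / 1000)
    n = len(subframe)
    best_lag, best_val = 0, 0.0
    for lag in range(min_lag, min(max_lag, n - 1) + 1):
        a, b = subframe[:n - lag], subframe[lag:]
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        val = num / den if den > 0 else 0.0   # zero-energy guard
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag, best_val

def looks_periodic(subframe: List[float], threshold: float = 0.7,
                   fs: int = 8000) -> bool:
    """Strong periodicity suggests a vowel rather than stationary noise."""
    return periodicity(subframe, fs)[1] >= threshold
```

A 125 Hz tone sampled at 8 kHz repeats every 64 samples (8 msec), so its best lag is 64 with a correlation close to 1. White-noise-like signals yield much lower peaks.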