The present invention relates to a speech determination apparatus and a speech determination method for detecting speech segments in an input signal.
A signal generated by capturing voices contains speech segments, in which voices are present, and non-speech segments, which are pauses or breaths with no voices. A speech (or voice) recognition system distinguishes speech segments from non-speech segments to achieve a higher speech recognition rate and higher efficiency of the speech recognition process. Mobile communication using mobile phones, transceivers, etc. switches the encoding process for input signals between speech and non-speech segments to achieve a higher coding rate and higher transfer efficiency. Mobile communication requires real-time performance and therefore demands a speech-segment determination process with little delay.
One known low-delay speech-segment determination process detects speech segments by comparing the flatness of the frequency distribution of a frame of an input signal with a threshold level. Another known low-delay speech-segment determination process detects speech segments using cepstrum analysis: it derives, from a frame of an input signal, harmonic data on the fundamental wave that has the maximum number of harmonic overtone components, and then analyzes the harmonic data together with power data on the energy in the frame (the power data indicating the energy level relative to a threshold level) to determine whether the harmonic data and the power data exhibit the features of voices.
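The first known process can be illustrated with a minimal sketch. The flatness measure is not specified here, so the sketch assumes the common definition of spectral flatness as the ratio of the geometric mean to the arithmetic mean of the power spectrum. The function names, the 0.3 threshold, and the numerical floor are hypothetical values chosen only for illustration.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive O(N^2) DFT power spectrum for bins 1 .. N/2 - 1 (DC and Nyquist excluded)."""
    n = len(frame)
    spectrum = []
    for k in range(1, n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_flatness(frame, floor=1e-12):
    """Geometric mean divided by arithmetic mean of the power spectrum, in (0, 1].

    Assumed definition; `floor` is a hypothetical guard against log(0).
    """
    power = [p + floor for p in power_spectrum(frame)]
    log_mean = sum(math.log(p) for p in power) / len(power)
    return math.exp(log_mean) / (sum(power) / len(power))


def is_speech_frame(frame, flatness_threshold=0.3):
    """Hypothetical decision rule: a peaky (non-flat) spectrum is treated as speech."""
    return spectral_flatness(frame) < flatness_threshold


# A harmonic frame (two sinusoids) has a peaky spectrum; a single impulse has a flat one.
tone = [math.sin(2 * math.pi * 4 * t / 64) + 0.5 * math.sin(2 * math.pi * 8 * t / 64)
        for t in range(64)]
impulse = [1.0] + [0.0] * 63
print(is_speech_frame(tone), is_speech_frame(impulse))
```

Because a single comparison per frame is all that is needed after the spectrum is computed, a process of this kind adds little delay. A practical implementation would likely use a fast Fourier transform rather than the naive transform shown here.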
The known speech-segment determination processes are effective in an environment where noise is relatively low. However, the known processes tend to detect speech segments erroneously as noise becomes larger, because the features of voices are buried in the noise. The features of voices are, for example, the flatness of the frequency distribution (indicating how often peaks appear) of a frame of an input signal and the pitch (high tones).
Moreover, cepstrum analysis requires performing a Fourier transform twice, which imposes a heavy processing load in the frequency domain and thus consumes much power. Consequently, if cepstrum analysis is employed in a battery-powered system such as mobile communication equipment, a higher-capacity battery is required to cover the power consumption, resulting in higher cost, a bulkier system, etc.
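The two transforms involved in cepstrum analysis can be seen in the following minimal sketch of a real cepstrum: one transform takes the frame to its spectrum, and a second takes the log-magnitude spectrum to the cepstrum. The function names, the numerical floor, and the peak-picking rule used to estimate the pitch period are assumptions made for illustration and are not taken from the known process described above.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(N^2) discrete Fourier transform, or its inverse when inverse=True."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(v * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t, v in enumerate(values))
        out.append(acc / n if inverse else acc)
    return out


def real_cepstrum(frame, floor=1e-12):
    """Transform 1: frame -> spectrum. Transform 2: log-magnitude spectrum -> cepstrum.

    `floor` is a hypothetical guard against log(0).
    """
    log_magnitude = [math.log(abs(c) + floor) for c in dft(frame)]
    return [c.real for c in dft(log_magnitude, inverse=True)]


def pitch_period(frame, min_lag, max_lag):
    """Lag in samples of the largest cepstral peak within [min_lag, max_lag] (assumed rule)."""
    cepstrum = real_cepstrum(frame)
    return max(range(min_lag, max_lag + 1), key=lambda q: cepstrum[q])


# Toy frame: an impulse train with a period of 16 samples, which is rich in harmonics.
frame = [1.0 if t % 16 == 0 else 0.0 for t in range(128)]
print(pitch_period(frame, 8, 24))
```

Each frame passes through `dft` twice, so the transform cost is roughly double that of a method needing only one spectrum per frame. This is the load that becomes significant on battery-powered equipment.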