The present invention relates to a consonant-segment detection apparatus and a consonant-segment detection method that detect a consonant segment carried by an input signal.
Human speech sounds are classified into vowels and consonants, or into voiced and unvoiced sounds, among other categories. There are techniques to detect or recognize a human voice using the respective features of the voiced and unvoiced sounds.
There are techniques that distinguish between voiced and unvoiced sounds by zero-crossing detection, in which the number of times the signal changes between positive and negative is counted for each frame of an input signal, and the counts are then compared between frames.
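The frame-wise zero-crossing count described above can be sketched as follows. This is a minimal illustration only, not the method of any particular prior-art technique: the function names, the non-overlapping framing, and the use of a simple frame-to-frame difference as the "comparison" are all assumptions made for the example.

```python
def zero_crossings(frame):
    """Count sign changes between consecutive samples in one frame.

    A sample equal to zero is treated as non-negative (an assumption).
    """
    count = 0
    for prev, cur in zip(frame, frame[1:]):
        if (prev >= 0) != (cur >= 0):
            count += 1
    return count


def split_frames(signal, frame_len):
    """Split a signal into non-overlapping frames, dropping any short tail."""
    last_start = len(signal) - frame_len
    return [signal[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


def zero_crossing_counts(signal, frame_len):
    """Zero-crossing count for each frame of the signal."""
    return [zero_crossings(f) for f in split_frames(signal, frame_len)]


def frame_to_frame_change(counts):
    """Difference in zero-crossing count between each frame and the previous one."""
    return [b - a for a, b in zip(counts, counts[1:])]


if __name__ == "__main__":
    # First frame alternates in sign (many crossings); second frame never does.
    signal = [1, -1, 1, -1, 1, 2, 3, 4]
    counts = zero_crossing_counts(signal, frame_len=4)
    print(counts, frame_to_frame_change(counts))
```

A frame with rapid sign changes yields a high count, while a slowly varying frame yields a low one; how the counts are compared and mapped to a voiced or unvoiced decision is left open here.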
When detecting a voice included in an input signal, it is relatively easy to detect a vowel segment even in an environment at a relatively high noise level, because a vowel has higher energy than a consonant. In contrast, it is difficult to detect a consonant segment in such an environment, because a consonant has lower energy and its features are masked by the noise.
In such an environment at a relatively high noise level, the known zero-crossing detection may not always be a good scheme for detecting a consonant segment, because, if there is much noise in low frequency bands, there may be almost no zero crossing due to the change in sound level at sampling points.
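The loss of zero crossings under low-frequency noise can be illustrated with a small synthetic example. The sketch below is illustrative only: the sampling rate, frequencies, amplitudes, phases, and frame length are hypothetical values chosen for the demonstration, and the sine waves merely stand in for a consonant-like component and low-band noise.

```python
import math

SAMPLE_RATE = 8000  # Hz, assumed for illustration
FRAME_LEN = 160     # samples (20 ms at 8 kHz), assumed


def zero_crossings(frame):
    """Count sign changes between consecutive samples in one frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def sine(freq_hz, amplitude, phase, n_samples):
    """Sampled sine wave with the given frequency, amplitude, and phase."""
    step = 2.0 * math.pi * freq_hz / SAMPLE_RATE
    return [amplitude * math.sin(step * n + phase) for n in range(n_samples)]


# Weak high-frequency component standing in for a consonant.
consonant_like = sine(3000, 0.1, 0.3, FRAME_LEN)

# Strong low-frequency noise; the phase keeps it well above the
# consonant amplitude for the whole frame.
low_freq_noise = sine(20, 1.0, 0.1 * math.pi, FRAME_LEN)

noisy = [c + n for c, n in zip(consonant_like, low_freq_noise)]

clean_count = zero_crossings(consonant_like)
noisy_count = zero_crossings(noisy)

if __name__ == "__main__":
    print("zero crossings without noise:", clean_count)
    print("zero crossings with low-frequency noise:", noisy_count)
```

In this frame the low-frequency component stays larger in magnitude than the consonant-like component, so the sum never changes sign and the zero-crossing count drops to zero even though the high-frequency content is still present.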