1. Field of the Invention
The present invention relates to the detection of a speaking period in voice recognition processing performed in a noisy environment or in situations where many people speak at the same time.
2. Description of the Related Art
Conventional voice detection devices adopt a voice recognition technique that handles speech as an acoustic signal and performs frequency analysis on that signal to recognize and process the voice information. To obtain a good voice detection result with this technique, it is important not only to accurately recognize the content of speech from a detected voice signal but also to accurately detect whether the speaker concerned is actually speaking (detection of a speaking period). Especially when voice recognition is performed in a noisy environment or where many people are speaking at the same time, detection of a speaking period is an important problem.
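The frequency analysis mentioned above is not specified further in this document. As a hedged illustration only, the sketch below computes the magnitude spectrum of a single Hann-windowed frame with a direct discrete Fourier transform. The sampling rate (8 kHz), frame length (256 samples), the Hann window and the function names `hann_window` and `magnitude_spectrum` are assumptions made for this example, not details of the invention.

```python
import cmath
import math


def hann_window(n: int) -> list[float]:
    """Periodic Hann window of length n (assumed choice), reducing leakage at frame edges."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def magnitude_spectrum(frame: list[float]) -> list[float]:
    """Magnitudes of DFT bins 0..N/2 of one windowed frame.

    A direct O(N^2) DFT is used for readability; a practical system would use an FFT.
    """
    n = len(frame)
    x = [s * w for s, w in zip(frame, hann_window(n))]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc))
    return spectrum


if __name__ == "__main__":
    sample_rate = 8000  # Hz (assumed)
    n = 256             # samples per frame, 32 ms at 8 kHz (assumed)
    tone_hz = 1000.0    # hypothetical test tone
    frame = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(n)]
    spec = magnitude_spectrum(frame)
    peak_bin = max(range(len(spec)), key=spec.__getitem__)
    print(peak_bin, peak_bin * sample_rate / n)  # prints: 32 1000.0
```

The strongest bin falls at the frequency of the test tone, which is the kind of spectral information a recognizer then works from.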
This is because, although a speaking period can easily be detected by observing the power of the detected voice signal in an environment with little noise, in a noisy environment noise is superimposed on the detected voice signal, so the speaking period cannot be detected simply from its power. If a speaking period cannot be detected, speech cannot be recognized even if a noise-robust voice recognition device is provided for subsequent processing.
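To make the power argument concrete, the sketch below implements the simple detector the paragraph describes: a frame is flagged as speaking when its mean-square power exceeds a fixed threshold. The sampling rate, the 160-sample frame length, the -30 dB threshold, the function names and the synthetic input (a 300 Hz tone standing in for speech) are all hypothetical values chosen for illustration; they are not taken from the invention.

```python
import math
import random

FS = 8000        # sampling rate in Hz (assumed for illustration)
FRAME_LEN = 160  # 20 ms frames at 8 kHz (assumed)


def frame_power_db(signal, frame_len=FRAME_LEN):
    """Mean-square power of each non-overlapping frame, in dB."""
    powers = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        mean_square = sum(s * s for s in frame) / frame_len
        powers.append(10.0 * math.log10(mean_square + 1e-12))  # floor avoids log10(0)
    return powers


def detect_speaking_frames(signal, threshold_db, frame_len=FRAME_LEN):
    """Flag a frame as speaking when its power exceeds a fixed threshold."""
    return [p > threshold_db for p in frame_power_db(signal, frame_len)]


def synthetic_signal(noise_std, seed=0):
    """Hypothetical test input: 10 frames, a 300 Hz tone standing in for speech
    in frames 3 to 6, with white Gaussian noise of the given std added everywhere."""
    rng = random.Random(seed)
    clean = [0.0] * (10 * FRAME_LEN)
    for t in range(3 * FRAME_LEN, 7 * FRAME_LEN):
        clean[t] = 0.5 * math.sin(2 * math.pi * 300 * t / FS)
    return [s + rng.gauss(0.0, noise_std) for s in clean]


if __name__ == "__main__":
    for noise_std in (0.001, 0.5):
        flags = detect_speaking_frames(synthetic_signal(noise_std), threshold_db=-30.0)
        print(noise_std, "".join("S" if f else "." for f in flags))
```

With the quiet input only frames 3 to 6, which carry the tone, exceed the threshold. With the noisy input every frame exceeds it, because the added noise alone is above the threshold, so the power no longer separates speaking from non-speaking periods.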
Several studies have been conducted on the detection of a speaking period. Examples include “Handsfree Voice Recognition Using Microphone Array and Kalman Filter in An Actual Environment-Construction of Front-End System for Interactive TV” by Masakiyo Fujimoto and Yasuo Ariki; The Fourth DSPS Educators Conference; pp. 55-58; August, 2002, and “Robust Speech Detection Using Images of Portions Around Mouth” by Kazumasa Murai, Keisuke Noma, Ken-ichi Kumagai, Tomoko Matsui, and Satoshi Nakamura; Information Processing Society of Japan Research Report “Voice Language Information Processing” No. 034-01; March, 2000.
The approaches of the techniques described in these documents and of other prior-art techniques are roughly classified into two categories: one attempts to detect a speaking period from a voice signal alone, and the other attempts to detect a speaking period from a non-voice signal in addition to the voice signal.