In general, a speech recognition device used in vehicle-mounted equipment such as a car navigation device first detects a speech section and then recognizes a word sequence on the basis of speech features calculated for the detected section. When a speech section is detected erroneously, the speech recognition rate for that section is degraded. Such a speech recognition device is therefore required to detect speech sections accurately. The device also detects non-speech sections and excludes them from the target of speech recognition.
In one basic method of detecting a speech section, a section in which the power of the input speech exceeds a criterion value, obtained by adding a threshold value to the currently estimated background noise level, is treated as a speech section. With this approach, a section containing strongly non-stationary noise (e.g., noise with large power fluctuations such as a buzzer sound, the sound of a sliding wiper, or the echo of a speech prompt) is in many cases erroneously detected as a speech section. Japanese Patent Application Laid-Open No. H7-92989 discloses a technique in which a correction coefficient is calculated from the maximum speech power of the latest utterance and the speech recognition result at that time, and is then used together with the estimated background noise level to correct subsequent criterion values.
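The basic power-threshold method described above can be illustrated with a minimal Python sketch. It is not taken from the cited publication. The function name `detect_speech_sections`, the use of per-frame power in dB, the default threshold of 6 dB, and the exponential-smoothing update of the background noise estimate are all assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple


def detect_speech_sections(
    frame_power_db: Sequence[float],
    initial_noise_db: float,
    threshold_db: float = 6.0,      # assumed value, for illustration only
    noise_smoothing: float = 0.9,   # assumed smoothing factor for the noise estimate
) -> List[Tuple[int, int]]:
    """Return (start, end) frame index pairs, end exclusive, judged to be speech.

    A frame is treated as speech when its power exceeds the criterion value,
    i.e. the estimated background noise level plus the threshold value.
    """
    sections: List[Tuple[int, int]] = []
    noise_db = initial_noise_db
    start = None

    for i, power in enumerate(frame_power_db):
        criterion = noise_db + threshold_db
        if power > criterion:
            if start is None:
                start = i  # a speech section begins at this frame
        else:
            if start is not None:
                sections.append((start, i))  # the section ended at the previous frame
                start = None
            # Assumption: the noise estimate is updated only on non-speech frames.
            noise_db = noise_smoothing * noise_db + (1.0 - noise_smoothing) * power

    if start is not None:
        sections.append((start, len(frame_power_db)))
    return sections


# Frames 2-3 stand for an utterance; frame 5 stands for a short buzzer burst.
powers = [30.0, 31.0, 45.0, 46.0, 30.0, 50.0]
print(detect_speech_sections(powers, initial_noise_db=30.0))
# prints [(2, 4), (5, 6)]
```

In this toy input the single loud frame at index 5 is reported as a speech section even though it stands for a buzzer burst. This is the kind of erroneous detection that strongly non-stationary noise causes when the decision depends on power alone.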