1. Field of the Invention
The present invention relates to a voice recognition apparatus.
2. Description of the Related Art
On-vehicle navigation apparatuses have been put to practical use, which detect the current position of a vehicle using a GPS (Global Positioning System) satellite and display the detected current position together with a map including that position on a display for guidance to a desired destination.
Further, on-vehicle navigation apparatuses equipped with a vocal-manipulation function, which can allow a user to execute various operations based on voices uttered by the user, have appeared today. Using the vocal-manipulation function, the user needs only to utter a phrase for a vocal-manipulation (hereinafter called "vocal-manipulation phrase"), such as "Zoom up the map", to execute a process according to the manipulation. Such an on-vehicle navigation apparatus is equipped with a voice recognition apparatus to manage the vocal-manipulation function.
The voice recognition apparatus first recognizes a vocal-manipulation phrase uttered by a user on the basis of the waveform of the voice of the user that is acquired through a microphone, and generates an operation code indicating an operation item corresponding to the vocal-manipulation phrase. The on-vehicle navigation apparatus executes an operation indicated by the operation code. When the user utters "Zoom up the map", for example, the voice recognition apparatus recognizes, based on the voice waveform corresponding to the uttered phrase, that the phrase is a vocal-manipulation phrase which requests an operation to magnify the map and generates an operation code to zoom up the map. In accordance with the operation code, the on-vehicle navigation apparatus executes an operation (which will not be elaborated) to zoom up the map shown on the display.
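The step from a recognized vocal-manipulation phrase to an operation code can be illustrated with a minimal sketch. The table name OPERATION_CODES, the numeric code values, the phrase "zoom down the map", and the function phrase_to_operation_code are all hypothetical; the text above does not specify how operation codes are represented.

```python
from typing import Optional

# Hypothetical operation codes; the actual values are not given in the text.
OPERATION_CODES = {
    "zoom up the map": 0x01,
    "zoom down the map": 0x02,  # assumed companion phrase, for illustration only
}


def phrase_to_operation_code(phrase: str) -> Optional[int]:
    """Return the operation code for a recognized phrase, or None if unknown."""
    # Normalize case and whitespace so that "Zoom up the map" matches the table key.
    normalized = " ".join(phrase.lower().split())
    return OPERATION_CODES.get(normalized)
```

In this sketch an unrecognized phrase yields None, so that the navigation apparatus would execute no operation rather than a wrong one.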
However, there are various kinds of noise, such as driving noise and environmental noise, in a vehicle during driving. The noise itself may be recognized as a part of a voice uttered by a user. This makes it hard for the voice recognition apparatus to accurately recognize a vocal-manipulation phrase uttered by the user. Such erroneous voice recognition leads to an erroneous operation which is unintended by the user.
The present invention was accomplished with a view to solving the problems described above, and it is an object of the invention to provide a voice recognition apparatus which can prevent an erroneous manipulation from being carried out due to erroneous voice recognition even under a noise environment.
According to one aspect of the invention, there is provided a voice recognition apparatus for recognizing voice uttered by an operator, comprising: a portion for performing a voice recognition process on a voice signal corresponding to the voice to thereby acquire vocal phrase data indicating the uttered phrase; a portion for detecting a point of time when the operator has started uttering the voice and a point of time when the operator has ended uttering the voice on the basis of a signal level of the voice signal to thereby generate first utterance duration information; a portion for capturing a mouth of the operator to acquire mouth image data; a portion for detecting a point of time when the operator has started uttering the voice and a point of time when the operator has ended uttering the voice on the basis of the mouth image data to thereby generate second utterance duration information; and an output portion for outputting the vocal phrase data as long as the first utterance duration information is approximate to the second utterance duration information.
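The relationship between the first and second utterance duration information can be pictured with the following minimal sketch. It assumes that the utterance start and end are found by thresholding a per-frame signal level, and that "approximate" means the start times and the end times each agree within a tolerance; the threshold, the tolerance value, and all function names are assumptions for illustration and are not taken from the text. The second interval is taken as given here; the same thresholding could be applied to a hypothetical per-frame mouth-opening measure derived from the mouth image data.

```python
from typing import Optional, Sequence, Tuple

Interval = Tuple[float, float]  # (start_time, end_time) in seconds


def detect_utterance(levels: Sequence[float], threshold: float,
                     frame_period: float) -> Optional[Interval]:
    """Return the interval spanned by frames whose level reaches the threshold."""
    active = [i for i, level in enumerate(levels) if level >= threshold]
    if not active:
        return None  # no utterance detected
    # Start at the first active frame; end at the end of the last active frame.
    return (active[0] * frame_period, (active[-1] + 1) * frame_period)


def is_approximate(first: Interval, second: Interval, tolerance: float) -> bool:
    """Assumed criterion: start and end times each differ by at most tolerance."""
    return (abs(first[0] - second[0]) <= tolerance
            and abs(first[1] - second[1]) <= tolerance)


def gate_output(phrase: str, first: Optional[Interval],
                second: Optional[Interval],
                tolerance: float = 0.2) -> Optional[str]:
    """Output the phrase only when both durations exist and are approximate."""
    if first is None or second is None:
        return None
    return phrase if is_approximate(first, second, tolerance) else None
```

Under these assumptions, a noise burst that raises the signal level while the mouth stays closed produces a first interval with no matching second interval, and the phrase is withheld.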
According to another aspect of the invention, there is provided a voice recognition apparatus for recognizing voice uttered by an operator and acquiring vocal phrase data representing a phrase indicated by the voice, comprising: a portion for performing a voice recognition process on a voice signal corresponding to the voice to thereby acquire a plurality of vocal phrase data candidates; a portion for detecting a point of time when the operator has started uttering the voice and a point of time when the operator has ended uttering the voice on the basis of a signal level of the voice signal to thereby generate first utterance duration information; a portion for capturing a mouth of the operator to acquire mouth image data; a portion for detecting a point of time when the operator has started uttering the voice and a point of time when the operator has ended uttering the voice on the basis of the mouth image data to thereby generate second utterance duration information; a portion for counting the number of changes in a shape of the mouth in a duration of utterance indicated by the second utterance duration information on the basis of the mouth image data to thereby generate number-of-mouth-shape-changes information; and a portion for selecting that one of the vocal phrase data candidates which has a count of changes in the shape of the mouth equal to the count indicated by the number-of-mouth-shape-changes information and outputting the selected vocal phrase data candidate as the vocal phrase data, as long as the first utterance duration information is approximate to the second utterance duration information.
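The candidate selection in this aspect can be illustrated with a minimal sketch. It assumes that the mouth image data has already been reduced to one mouth-shape label per video frame, and that each vocal phrase data candidate carries an expected count of mouth shape changes; the labels, the counts, and the function names are hypothetical and are not specified in the text.

```python
from typing import Optional, Sequence, Tuple


def count_shape_changes(shapes: Sequence[str]) -> int:
    """Count transitions between differing consecutive mouth-shape labels."""
    return sum(1 for prev, cur in zip(shapes, shapes[1:]) if prev != cur)


def select_candidate(candidates: Sequence[Tuple[str, int]],
                     observed_changes: int,
                     durations_match: bool) -> Optional[str]:
    """Return the first candidate whose expected count equals the observed count.

    Each candidate is a (phrase, expected_change_count) pair. Nothing is output
    unless the two utterance durations were judged approximate (durations_match).
    """
    if not durations_match:
        return None
    for phrase, expected in candidates:
        if expected == observed_changes:
            return phrase
    return None
```

For example, with the hypothetical candidates ("zoom up the map", 5) and ("zoom down the map", 6), an observed count of 6 selects the second candidate, while an observed count matching neither candidate yields no output.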