1. Field of the Invention
The present invention relates to voice recognition systems and methods for recognizing voice commands issued by users so as to control devices, and more particularly, to a voice recognition system having a talk-back function of feeding back the recognized voice to a user.
2. Description of the Related Art
The presently preferred embodiments relate to a voice recognition system that allows a user to input his/her voice to operate a device such as a navigation system, hands-free device, or personal computer mounted in a vehicle. Such a voice recognition system may be used in addition to or instead of a remote control, a touch panel, a keyboard, or a mouse.
In this type of voice recognition system, when a user presses a speech button provided on the system, the system enters a voice recognition mode, recognizes the user's input voice, and executes the corresponding voice command. There are two approaches to voice input. In the first approach, a single press of the speech button puts the system into the voice recognition mode, after which the system prompts the user for voice input as needed so that the user and the system communicate interactively. In the second approach, each press of the speech button allows the user to input his/her voice only for a predetermined time period.
Most voice recognition systems have a talk-back function that feeds the recognized voice back to the user through a speaker. The user listens to the talk-back voice to check whether his/her voice has been recognized correctly. If the recognition result is wrong, the user inputs his/her voice again; if it is correct, the user informs the system accordingly. In response to the user's instruction, the voice recognition system then performs the corresponding control operations.
Voice commands used in a voice recognition system are normally divided into a plurality of levels according to the type of operation to be performed on the controlled device. For example, to specify a destination in a navigation system by its address, the user speaks the address level by level, such as “prefecture→city, town (or village)→the rest of the address”.
In this case, the input voice for each level is spoken back every time the user inputs it, so entering the complete address of the destination takes a long time. To overcome this drawback, attempts have been made to shorten the voice recognition time. For example, Japanese Unexamined Patent Application Publication No. 6-149287 discloses a system that shortens the voice recognition time by reducing the amount of computation needed to generate a talk-back voice.
In known voice recognition systems, however, the next voice input is not accepted while a talk-back voice is being output, because if the talk-back voice were mixed with the user's input voice, the input voice would likely be recognized incorrectly. FIG. 4A is a timing chart showing when voice input is enabled in a known voice recognition system that adopts the first approach described above.
As shown in FIG. 4A, in the first approach, when the user first presses the speech button, the system enters the voice recognition state and accepts voice input for a predetermined time period, during which the user inputs the desired voice commands. The voice recognition system then recognizes the input voice and outputs a talk-back voice; voice input is not accepted while the talk-back voice is being output. After the talk-back operation, the system returns to the voice input enable state so that the user can input his/her voice again.
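The input-enable behavior of the known first approach can be sketched as a small state machine. This is a minimal, hypothetical Python model, not the implementation of any known system: the names `State`, `FirstApproachRecognizer`, `speak`, and `talkback_finished` are illustrative, recognition itself is stubbed out, and the timeout on the input period is omitted. It shows only the behavior described above: an utterance that arrives while the talk-back voice is playing is dropped.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()       # speech button not yet pressed
    LISTENING = auto()  # voice input enabled
    TALKBACK = auto()   # recognized result being spoken back; input disabled


class FirstApproachRecognizer:
    """Hypothetical model of the known first approach (cf. FIG. 4A)."""

    def __init__(self) -> None:
        self.state = State.IDLE
        self.accepted: list[str] = []  # utterances passed to the recognizer
        self.ignored: list[str] = []   # utterances arriving while input is disabled

    def press_speech_button(self) -> None:
        # A single press starts the interactive session.
        if self.state is State.IDLE:
            self.state = State.LISTENING

    def speak(self, utterance: str) -> bool:
        """Return True if the utterance was accepted for recognition."""
        if self.state is not State.LISTENING:
            self.ignored.append(utterance)
            return False
        self.accepted.append(utterance)  # stand-in for actual recognition
        self.state = State.TALKBACK      # talk-back starts; input is now blocked
        return True

    def talkback_finished(self) -> None:
        # Input is re-enabled only after the talk-back voice ends.
        if self.state is State.TALKBACK:
            self.state = State.LISTENING
```

For example, if the user says the city name while the prefecture is still being spoken back, that utterance ends up in `ignored` and must be repeated after `talkback_finished()` is called.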
Accordingly, in the first approach, the user cannot input his/her voice during the talk-back operation. The user has to wait until the talk-back operation is finished, which lengthens the time required to complete voice input.
In the second approach, the user can press the speech button to interrupt the talk-back operation and continue inputting his/her voice. In this case, however, when inputting voice for a plurality of levels, the user has to press the speech button each time he/she speaks the entry for a level, which makes the operation cumbersome.
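The known second approach can be modeled in the same spirit. Again, this is a hypothetical Python sketch with illustrative names (`SecondApproachRecognizer`, `press_speech_button`, `speak`); the 5-second window length is an assumed value, not one taken from any known system. Each button press opens a fixed input window and cuts off any talk-back in progress, and the window closes once an utterance is taken, so every level of an address costs one more press.

```python
from dataclasses import dataclass, field


@dataclass
class SecondApproachRecognizer:
    """Hypothetical model of the known second approach."""

    window_s: float = 5.0                  # assumed input window length, seconds
    presses: int = 0                       # speech-button presses so far
    talkback_playing: bool = False
    window_end: float = float("-inf")      # time at which the open window closes
    accepted: list[str] = field(default_factory=list)

    def press_speech_button(self, now: float) -> None:
        self.presses += 1
        self.talkback_playing = False      # a press interrupts any talk-back
        self.window_end = now + self.window_s

    def speak(self, now: float, utterance: str) -> bool:
        """Return True if the utterance falls inside an open input window."""
        if now > self.window_end:
            return False                   # no window open: a press is required first
        self.accepted.append(utterance)    # stand-in for actual recognition
        self.window_end = float("-inf")    # window closes once input is taken
        self.talkback_playing = True       # recognized result is spoken back
        return True
```

Under this model, entering the prefecture, the city, and the rest of the address takes three separate presses, which is the extra operation burden the paragraph above points out.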