Field of the Invention
The present invention relates to a speech communications system for a vehicle and to a method of operating a speech communications system for a vehicle.
Car manufacturers increasingly provide cars with in-car electronic systems that have speech recognition and text-to-speech functions. Examples of such in-car electronic systems are navigation systems, stereo systems, and telephone systems that can recognize speech spoken by the driver. In the case of a navigation system, these speech recognition and text-to-speech capabilities allow the driver to ask for directions to a specific street address and to receive voice guidance on how to reach that address. In the case of a stereo system, the driver may, for example, ask the stereo system to play a favorite song or tune to a favorite radio station.
Typical conventional in-car speech recognition systems require the driver or user to press a so-called push-to-talk button in order to start a voice interaction. After the push-to-talk button has been pressed, the in-car speech recognition system provides an audio signal, usually a “beep” sound played over the car speakers, to let the driver know that the microphone for speech recognition is now open and that the driver can now talk. When the speech recognition system provides a speech response back to the driver, the driver generally has to wait until the speech recognition system has finished its speech response before saying anything to the speech recognition system. Alternatively, the driver has to wait until the speech recognition system indicates with a “beep” sound that the microphone is open and that the driver can talk to it. If the driver wants to interrupt the reply coming from the speech recognition system, the driver has to press the push-to-talk button and may have to wait for the “beep” before saying anything.
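The conventional push-to-talk flow described above can be modeled, as a minimal sketch, as a small state machine in which speech is accepted only while the microphone is open. The class, state, and method names (`PushToTalkDialog`, `MicState`, `press_push_to_talk`, `hear`) are hypothetical and do not describe any particular in-car system; the assumption that one open microphone accepts exactly one utterance is likewise a simplification.

```python
from enum import Enum, auto
from typing import Optional


class MicState(Enum):
    CLOSED = auto()     # microphone closed: speech is ignored
    PROMPTING = auto()  # system speech output playing: microphone still closed
    OPEN = auto()       # beep has played: microphone open for one utterance


class PushToTalkDialog:
    """Hypothetical model of a conventional push-to-talk voice interaction."""

    def __init__(self) -> None:
        self.state = MicState.CLOSED
        self.beeps = 0  # number of "beep" signals played so far

    def _open_microphone(self) -> None:
        self.beeps += 1  # the beep tells the driver that the system is listening
        self.state = MicState.OPEN

    def press_push_to_talk(self) -> None:
        # The button is the only way to start an interaction or to
        # interrupt an ongoing speech output.
        self._open_microphone()

    def start_speech_output(self) -> None:
        self.state = MicState.PROMPTING

    def finish_speech_output(self) -> None:
        # Alternative path: the system reopens the microphone with a beep
        # once its own reply has ended.
        if self.state is MicState.PROMPTING:
            self._open_microphone()

    def hear(self, utterance: str) -> Optional[str]:
        """Return the utterance if the system was listening, else None (speech is lost)."""
        if self.state is not MicState.OPEN:
            return None
        self.state = MicState.CLOSED  # assumption: one utterance per open microphone
        return utterance
```

In this model, a driver who says "navigate home" without pressing the button gets `None` back, and a reply spoken over the system's speech output is likewise discarded unless `press_push_to_talk` is called first, which mirrors the barge-in restriction described above.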
The above-described interaction between the driver and the speech recognition system differs significantly from a normal human-to-human interaction because the driver has to press a push-to-talk button or wait for an audio signal such as a “beep” sound before speaking to the speech recognition system. In a human-to-human voice interaction, any participant can naturally initiate the interaction by voice. Likewise, any participant can naturally interrupt the other side by voice. Since human beings are used to human-to-human interactions, they encounter problems when they have to use a push-to-talk button or wait for a “beep” sound before they can speak.
A typical problem that drivers have when they interact with a speech recognition system is that they forget to press the push-to-talk button before they talk. New users in particular, who are not yet familiar with the speech recognition system, do not remember to press the push-to-talk button before talking to it. As a result, new users often talk to the speech recognition system while the speech recognition system is not listening.
A further typical problem with the above-described speech recognition system is that drivers find it difficult to time their speech utterance to the audio signal, i.e. the beep sound, that indicates that the speech recognition system is listening. If the driver is in a hurry when pressing the push-to-talk button, the driver tends to speak before or during the beep sound. The speech recognition system then plays the beep sound in the middle of or after the driver's speech utterance. As a result, the speech recognition system either listens to only a portion of the driver's speech utterance or does not listen to what the driver said at all.
Another problem of the above-described conventional speech recognition system is that a reply coming from the speech recognition system cannot be interrupted by a voice utterance from the user. This is a disadvantage when the driver is familiar with the replies of the speech recognition system and the driver already knows what he/she has to say before the speech recognition system finishes talking. In such a case, the driver tends to reply before the speech recognition system finishes talking. Since the above-described conventional speech recognition system is not listening to the driver during the speech output, the driver will have to repeat his/her reply. The driver will have to either press the push-to-talk button and wait for the beep sound or wait until the speech recognition system is finished with its speech output and wait for the beep sound that indicates that the speech recognition system is now listening.
Some conventional speech recognition systems have tried to solve the above-described problems in part by not requiring the use of the push-to-talk button at every step of the interaction between the driver and the speech recognition system. For example, the driver has to press the push-to-talk button at the beginning of the interaction in order to start the dialog with the speech recognition system. During the interaction, the driver will normally talk to the speech recognition system only after the speech recognition system plays an audio signal, i.e. a beep sound, indicating that it is now listening. If the driver does not want to wait for the beep sound and wants to interrupt the speech output of the speech recognition system, the driver has to press the push-to-talk button. Such a speech recognition system does not require the driver to press the push-to-talk button at every interaction step; however, unless the driver presses the push-to-talk button, the driver still has to wait until the speech recognition system finishes its speech output.
A disadvantage of such a speech recognition system is that the driver may get confused because some interaction steps, such as starting and interrupting the interaction, require the use of the push-to-talk button, whereas other interaction steps do not. Another disadvantage is that if the driver cannot respond to a question from the speech recognition system because an unexpected driving situation requires the driver's full attention, the speech recognition system may repeatedly prompt the driver for a response and thus distract the driver.
Other conventional speech recognition systems for vehicles operate like the conventional speech recognition system described above, except that they do not provide audio feedback to the user, i.e. they do not play a beep sound indicating that the speech recognition system is listening. With these speech recognition systems, the driver still needs to press the push-to-talk button to start the interaction. When the speech recognition system is speaking to the driver, the driver still has to wait until the speech recognition system finishes talking or has to press the push-to-talk button in order to interrupt its speech output.
There are also conventional speech recognition systems that constantly record what the driver is saying. As soon as the driver presses the push-to-talk button, the speech recognition system sends to the speech recognizer all the voice information spoken by the driver after the button press, together with the voice information recorded during a short interval, a fraction of a second long, before the button press. By constantly recording the driver's voice, such systems alleviate some of the problems related to synchronizing the flow of information between the speech recognition system and the driver. However, the driver still has to press the push-to-talk button to start the interaction with the speech recognition system and also has to press the push-to-talk button in order to interrupt the speech output of the speech recognition system.
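The constant-recording approach described above amounts to a pre-roll buffer: a fixed-size ring buffer holds the most recent audio frames, and on a push-to-talk press those buffered frames are prepended to the utterance. The following is a minimal sketch under stated assumptions; the class and method names (`PreRollRecorder`, `push_frame`, `end_utterance`) are hypothetical, and how the end of the utterance is detected is left out because the description above does not specify it.

```python
from collections import deque
from typing import Any, Deque, List


class PreRollRecorder:
    """Hypothetical always-on recorder that keeps a short pre-roll buffer."""

    def __init__(self, preroll_frames: int) -> None:
        # Ring buffer: once full, appending drops the oldest frame automatically.
        self._preroll: Deque[Any] = deque(maxlen=preroll_frames)
        self._utterance: List[Any] = []
        self._active = False

    def push_frame(self, frame: Any) -> None:
        """Feed one audio frame from the microphone (called continuously)."""
        if self._active:
            self._utterance.append(frame)
        else:
            self._preroll.append(frame)

    def press_push_to_talk(self) -> None:
        # Start the utterance with the audio captured just before the press,
        # so speech begun slightly early is not lost.
        self._utterance = list(self._preroll)
        self._preroll.clear()
        self._active = True

    def end_utterance(self) -> List[Any]:
        """Stop capturing and return the frames to hand to the speech recognizer."""
        self._active = False
        frames, self._utterance = self._utterance, []
        return frames
```

For example, assuming 20 ms audio frames, `PreRollRecorder(15)` would keep roughly the last 300 ms of audio before the button press; both numbers are illustrative assumptions, not values taken from any particular system. Because the buffer is a `deque` with `maxlen`, memory use stays constant however long the recorder runs.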
European Patent Application No. EP 1 562 180 A1 discloses a speech dialog system for controlling an electronic device. Its speech recognition device includes a control command determining means, which is activated by a keyword, for determining a control command for controlling the electronic device. The speech dialog system therefore does not need a push-to-talk button. The speech dialog system of EP 1 562 180 A1 preferably includes noise suppression in order to filter out unwanted audio signals. Further speech recognition devices capable of detecting keywords are described in Patent Abstracts of Japan No. 2001042891 A and Patent Abstracts of Japan No. 2005157086 A. International Publication No. 2004/038697 A1 discloses a speech control unit that includes a microphone array for receiving audio signals and a keyword recognition system in order to be more selective for those parts of the audio signals that correspond to speech spoken by a given user.