Particular embodiments generally relate to speech recognition.
Speech recognition attempts to make information access easier and simpler through verbal queries and commands. These queries have historically been activated by button presses on a device, such as a smart phone. After the button press is received, a speech recognizer listens to the query and attempts to respond appropriately. Using verbal queries allows users to make queries without typing in the query. This makes information access easier when users are busy, such as when users are in cars, or when users simply would prefer not to type in the queries. Even though speaking a query is easier than typing it, having a user press a button to activate the speech recognizer is sometimes inconvenient. For example, the user may be occupied with other activities, such as driving a car, where using his/her hands to perform the button press may not be possible.
Other approaches replace button presses with hands-free approaches that activate the speech recognizer using activation words. For example, trigger phrases are used to activate the speech recognizer, which can then decipher a query and provide an appropriate response after the trigger phrase is received. However, the user must always trigger the speech recognizer. Additionally, since the user has triggered the recognizer, errors in the recognition or responses are typically not tolerated by the user.
In all these approaches, the user decides when to issue a query or command. The speech recognizer is affirmatively activated, and the user then expects a response. Because the user is expecting a response, errors in speech recognition may not be tolerated. Also, because the speech recognizer listens for content only after activation, certain contexts and important points in a conversation will be missed by the speech recognizer.
Additionally, even when a response is output to a user, the response is generic. For example, a speech recognizer may perform a web search using keywords that were recognized. The results of this keyword search would be the same regardless of which user is speaking.