Speech recognition applications have become increasingly popular with computer users. Speech recognition allows a user to talk into a microphone connected to the computer, and the computer translates the speech into text or commands that it can understand. Speech recognition has several different types of uses. In one type, speech recognition is used as an input mechanism for the user to input text into a program, such as a word processing program, in lieu of or in conjunction with a keyboard. In another type, speech recognition is used as a mechanism to convey commands to a program, for example, to save a file in a program instead of selecting a save command from a menu using a mouse.
In yet another type of use, speech recognition is used in conjunction with an on-screen agent or automated assistant. For example, the agent may ask the user whether he or she wishes to schedule an appointment in a calendar based on an electronic mail message the user is reading. The agent may pose the question by using a text-to-speech application to render it audible through a speaker, or by displaying text near the agent so that it appears that the agent is talking to the user. Speech recognition can then be used to indicate whether the user accepts or declines the agent's offer.
In these and other types of uses for speech recognition, an issue arises as to when to turn on the speech recognition engine, that is, when the computer should listen to the microphone for user speech. This is in part because speech recognition is processor-intensive; keeping speech recognition turned on all the time may slow down other applications running on the computer. In addition, keeping speech recognition turned on all the time may be undesirable because the user may inadvertently say something into the microphone that is not meant for the computer.
One solution to this problem is generally referred to as "push-to-talk." In push-to-talk systems, a user presses a button on an input device such as a mouse, or presses a key or key combination on the keyboard, to indicate to the computer that he or she is ready to speak into the microphone, so that the computer should listen for the speech. The user may then optionally be required to press another button to stop the computer from listening, or the computer may determine when to stop listening based on the user no longer speaking.
Push-to-talk systems are disadvantageous, however. A goal of speech recognition systems is to provide a more natural way for a user to communicate with a computer. Requiring a user to push a button before speaking to the computer cuts against this goal, because doing so is unnatural for the user. Furthermore, in applications where a dialog is to be maintained with the computer (for example, where an agent asks a question, the user answers, the agent asks another question, and so on), requiring the user to push a button is inconvenient and unintuitive, in addition to being unnatural.
Other prior art systems give the user an explicit, unnatural message to indicate that the system is listening. For example, in automated phone applications, a user may hear a recorded voice say "Press 1 now for choice A." While this may improve on push-to-talk systems, it is nevertheless unnatural. In everyday conversation between people, such explicit messages indicating that one party is ready to listen to the other are rarely heard.
For these and other reasons, there is a need for the present invention.