This invention relates generally to conversational dialog between a computer or other processor-based device and a user, and more particularly to such dialog without requiring push-to-talk functionality.
Speech recognition applications have become increasingly popular with computer users. Speech recognition allows a user to talk into a microphone connected to the computer, and the computer translates the speech into recognizable text or commands that it can understand. There are several different types of uses for such speech recognition. In one type, speech recognition is used as an input mechanism for the user to input text into a program, such as a word processing program, in lieu of or in conjunction with a keyboard. In another type, speech recognition is used as a mechanism to convey commands to a program, for example to save a file in a program, instead of selecting a save command from a menu using a mouse.
In yet another type of use for speech recognition, speech recognition is used in conjunction with an on-screen agent or automated assistant. For example, the agent may ask the user whether he or she wishes to schedule an appointment in a calendar based on an electronic mail message the user is reading. The agent may pose the question using a text-to-speech application that renders it audible through a speaker, or by displaying text near the agent such that it appears that the agent is talking to the user. Speech recognition can then be used to indicate the user's acceptance or declination of the agent's offer.
In these and other types of uses for speech recognition, an issue arises as to when to turn on the speech recognition engine, that is, as to when the computer should listen to the microphone for user speech. This is in part because speech recognition is a processor-intensive application; keeping speech recognition turned on all the time may slow down other applications being run on the computer. In addition, keeping speech recognition turned on all the time may not be desirable, in that the user may accidentally say something into the microphone that was not meant for the computer.
One solution to this problem is generally referred to as “push-to-talk.” In push-to-talk systems, a user presses a button on an input device such as a mouse, or presses a key or a key combination on the keyboard, to indicate to the computer that he or she is ready to speak into the microphone, such that the computer should listen to the speech. The user may optionally then be required to push another button to stop the computer from listening, or the computer may determine when to stop listening based on no more speech being spoken by the user.
Push-to-talk systems are disadvantageous, however. A goal in speech recognition systems is to provide for a more natural manner by which a user communicates with a computer. Requiring a user to push a button prior to speaking to the computer cuts against this goal, since it is unnatural for the user to do so. Furthermore, in applications where a dialog is to be maintained with the computer (for example, where an agent asks a question, the user answers, and the agent asks another question, etc.), requiring the user to push a button is inconvenient and unintuitive, in addition to being unnatural.
Other prior art systems include those that give the user an explicit, unnatural message to indicate that the system is listening. For example, in the context of automated phone applications, a user may hear a recorded voice say “Press 1 now for choice A.” While this may improve on push-to-talk systems, it nevertheless is unnatural. That is, in everyday conversation between people, such explicit messages to indicate that one party is ready to listen to the other are rarely heard.
For these and other reasons, there is a need for the present invention.
The invention relates to conversational dialog with a computer or other processor-based device without requiring push-to-talk functionality. In one embodiment, a computer-implemented method first determines that a user desires to engage in a dialog. Next, based thereon the method turns on a speech recognition functionality for a period of time referred to as a listening horizon. Upon the listening horizon expiring, the method turns off the speech recognition functionality.
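The three steps just described (determine that the user desires a dialog, turn on speech recognition for the listening horizon, turn it off on expiry) can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any particular embodiment: the `SpeechRecognizer` class, the `run_listening_horizon` function, the `poll_utterance` callback, and the choice to stop listening as soon as one utterance is heard are all hypothetical.

```python
import time
from typing import Callable, Optional


class SpeechRecognizer:
    """Hypothetical stand-in for a speech recognition engine."""

    def __init__(self) -> None:
        self.listening = False

    def start(self) -> None:
        self.listening = True

    def stop(self) -> None:
        self.listening = False


def run_listening_horizon(
    recognizer: SpeechRecognizer,
    user_wants_dialog: bool,
    horizon_seconds: float,
    poll_utterance: Callable[[], Optional[str]],
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    poll_interval: float = 0.05,
) -> Optional[str]:
    """Listen for at most horizon_seconds, without any push-to-talk action.

    Returns the first utterance heard (an assumption of this sketch), or
    None if no dialog is desired or the listening horizon expires first.
    """
    if not user_wants_dialog:
        return None  # recognition is never turned on

    recognizer.start()
    deadline = clock() + horizon_seconds
    try:
        while clock() < deadline:
            utterance = poll_utterance()
            if utterance is not None:
                return utterance
            sleep(poll_interval)
        return None  # listening horizon expired
    finally:
        # Recognition is turned off on every exit path.
        recognizer.stop()
```

The clock and sleep functions are passed in as parameters only so that the sketch can be exercised without real delays; a real system would be driven by audio events rather than polling.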
In specific embodiments, determining that a user desires to engage in a dialog includes performing a probabilistic cost-benefit analysis to determine whether engaging in a dialog is the highest expected utility action of the user. This may include, for example, initially inferring a probability that the user desires an automated service with agent assistance. Thus, in one embodiment, the length of the listening horizon can be determined as a function of at least the inferred probability that the user desires automated service, as well as a function of the acute listening history of previous dialogs.
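As a rough illustration of the cost-benefit analysis and of a horizon length that depends on the inferred probability and on recent dialog history, consider the sketch below. Everything specific in it is an assumption made for illustration: the action names, the utility values, the linear form of the horizon function, its default parameters, and the representation of listening history as a list of recent user response delays in seconds. The text above does not specify any of these.

```python
from statistics import mean
from typing import Dict, Sequence, Tuple

# Hypothetical utilities per action, as a pair:
# (utility if the user wants the service, utility if the user does not).
UTILITIES: Dict[str, Tuple[float, float]] = {
    "engage_dialog": (1.0, -0.5),  # helpful if wanted, intrusive otherwise
    "do_nothing": (-0.2, 0.0),     # missed opportunity if wanted
}


def _check_probability(p: float) -> None:
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")


def expected_utility(p_wants_service: float, outcomes: Tuple[float, float]) -> float:
    """Probability-weighted average of the two outcome utilities."""
    if_wanted, if_not_wanted = outcomes
    return p_wants_service * if_wanted + (1.0 - p_wants_service) * if_not_wanted


def best_action(
    p_wants_service: float,
    utilities: Dict[str, Tuple[float, float]] = UTILITIES,
) -> str:
    """Return the action with the highest expected utility."""
    _check_probability(p_wants_service)
    return max(
        utilities,
        key=lambda action: expected_utility(p_wants_service, utilities[action]),
    )


def listening_horizon_seconds(
    p_wants_service: float,
    recent_response_delays: Sequence[float],
    base_seconds: float = 2.0,
    max_extra_seconds: float = 8.0,
) -> float:
    """Assumed linear form: longer horizon for higher probability and for
    users who have recently taken longer to respond."""
    _check_probability(p_wants_service)
    history_term = mean(recent_response_delays) if recent_response_delays else 0.0
    return base_seconds + max_extra_seconds * p_wants_service + history_term
```

With these illustrative numbers, an inferred probability of 0.8 favors engaging in a dialog, while a probability of 0.1 favors doing nothing. Any monotone function of the probability and the history would fit the description equally well; the linear form is chosen here only for simplicity.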
Embodiments of the invention provide for advantages not found within the prior art. Primarily, the invention does not require push-to-talk functionality for the user to engage in a dialog with the computer, including engaging in a natural dialog about a failure to understand. This means that the dialog is more natural to the user, and also more convenient and intuitive to the user. Thus, in one embodiment, an agent may be displayed on the screen, ask the user a question using a text-to-speech mechanism, and then wait for the duration of the listening horizon for an appropriate response from the user. The user only has to talk after the agent asks the question, and does not have to undertake an unnatural action such as pushing a button on an input device or a key on the keyboard prior to answering the query.
The invention includes computer-implemented methods, machine-readable media, computerized systems, and computers of varying scopes. Other aspects, embodiments and advantages of the invention, beyond those described here, will become apparent by reading the detailed description and with reference to the drawings.