1. Technical Field
The present disclosure relates to speech processing and more specifically to using conversation context to determine which portions of continuously monitored speech are relevant.
2. Introduction
One of the challenges for speech-driven systems is identifying when the user's input is directed to the system as opposed to some other person in the vicinity of the system. Typically, a “push-to-talk” button on the user interface activates the microphone only when the user intends to provide input to the system. Similar approaches rely on inputs that are functionally equivalent to a button press, such as pressing a touch-sensitive screen, uttering a key phrase, or some other explicit signal or event indicating that the user intends to direct speech input to the system. Upon receiving such input, the system activates the microphone or other speech input device and begins receiving speech.
However, this approach forces such human-machine interaction systems to wait for the user's entire input before acting on the input speech. Also, in machine-mediated human-human conversations, users must take turns manually “switching on” and “switching off” the microphone, which makes the conversation tedious and cumbersome. Further, users may forget to manually activate the microphone, leading to frustration, confusion, and lost time. These difficulties hinder the widespread adoption and use of speech interfaces.