The background description provided herein is for the purpose of generally presenting the context of the disclosure. Unless otherwise indicated herein, the materials described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
With advances in integrated circuits, computing, artificial intelligence, speech recognition, and other related technologies, spoken dialogue systems have become increasingly popular. Examples of spoken dialogue systems include, but are not limited to, Siri from Apple Computer, Google Home from Google, Echo from Amazon, Cortana from Microsoft, and so forth. For portability, many of today's spoken dialogue systems are battery powered. To preserve battery life, a Low Power Always Listening (LPAL) component with a magic word strategy is typically employed. The LPAL component runs a very low power automatic speech recognition (ASR) engine that recognizes and responds to only one “magic” word or phrase. On recognition of the “magic” word or phrase, the LPAL component activates a much more capable ASR component embedded in the spoken dialogue engine to recognize and respond to user utterances. During operation, the more capable ASR component consumes more power than the LPAL component, and thus is shut down after each response to a user utterance. As a result, users of this type of system must speak the magic word or phrase before every utterance they make to the system.
However, in cooperative natural conversation, participants give each other opportunities to interject or take over the turn, and they do not need to call out each other's names every time they talk to make sure the other person is listening. Thus, today's spoken dialogue systems with an LPAL component and a magic word strategy are unnatural and annoying to many users. A solution that supports more natural conversational interactions between a machine and a user, while preserving battery life, is needed.