The increasing popularity of portable electronics demands that electronic devices become capable of handling more functions. One area of development is human-machine interaction based on voice or motion. When a user makes a request to a machine by voice command instead of touching or typing on a visual display, the user's interaction with the machine becomes more similar to human-to-human interaction, and therefore more natural and intuitive.
One of the challenges in implementing human-machine voice communication is knowing when the machine should be waiting for a user command. Because a user is seldom talking to his machine constantly and continuously, it is not efficient for the machine to listen for commands all the time. However, it is equally important that the machine not miss a communication from the user when one arrives. Existing voice interaction engines such as AMAZON ECHO® and GOOGLE NOW™ address this problem by requiring the user to say a trigger word, which signals the machine to receive the user command that follows it. This trigger-word mechanism prevents false triggering and saves processing power. However, it has the disadvantage of feeling unnatural to the user, who has to say the trigger word every time he wants to interact with his machine.
Apple's Siri voice engine does not require a trigger word but instead relies on a button press to begin waiting for a user command. While some users may prefer this touch-based initiation to trigger words, neither option is ideal, because both require the user to do something that he would not do when interacting with another human. A more natural way of initiating machine interaction, without wasting processing power or compromising accuracy, is desired.