Recent advancements in the ability of computing systems to recognize and understand human speech have led to the increased use and availability of computer-based personal assistants and other speech-interactive computing systems. In particular, certain “smart” appliances are beginning to incorporate advanced features that can respond directly to user voice requests. For example, an appliance (e.g., a refrigerator) can perform a requested action or operation in response to a voice request.
However, from the user's perspective, the increasing presence of such feature-rich appliances can undesirably increase the complexity of interacting with an appliance. Thus, a challenge presented by such recent advancements is to provide advanced appliance features without burdening the user with onerous interaction with the technology.
As an example, speech technology used in connection with a phone or in a car is typically initiated by a push-to-talk method, in which a user presses a button or provides another physical indication to the device that the user is about to give a speech command. However, in the context of appliances, push-to-talk is problematic, as the user may have their hands occupied with kitchen tasks such as stirring or chopping. In addition, a user handling food such as raw meat may find it undesirable to have hand contact with the appliance.
As another example, instead of using push-to-talk, some speech recognition systems employ a wake-up word, in which the user utters a particular word or phrase to indicate that the user is preparing to provide a speech command. However, speech recognition systems employing a wake-up word are vulnerable to false positives, in which background noise such as an ambient conversation causes the system to incorrectly wake up and attempt to respond.
The above-noted problems with wake-up words are particularly acute in the context of the home kitchen, which has long been a center of household activity. In particular, modern kitchens can be subject to many background noises, such as a television, a music player, conversation, or the mechanical operation of appliances. Thus, the use of wake-up words in the kitchen context can be undesirable.
Furthermore, background noise in the kitchen or other home environment remains a problem even after the system has recognized that the user is providing a voice command. In particular, background noise can interfere with reception and processing of the voice command. For example, if the user's voice command is not clearly audible over the background noise, then the speech recognition system can have difficulty processing the received audio signal.
Therefore, systems and methods for improving speech command identification and analysis are desirable. In particular, systems and methods that use visual cues to improve speech command recognition are desirable.