Electronic devices with voice interfaces have been widely used to collect voice inputs from users and perform different voice-activated functions according to the voice inputs. These voice-activated functions may include directing or commanding a target device to perform an operation. For example, the user may utter a voice input to a voice interface device to direct a target device to turn on or off, or to control media playback at the target device.
Typically, if a user wishes to make a voice input that directs a target device to perform an operation, the user would specify the target device in the voice input. However, having to explicitly specify the target device for all such voice inputs is tedious and burdensome to the user. It is desirable for a voice interface device to be able to identify a target device for a voice input even when the voice input does not specify a target or specifies an ambiguous target.
Further, it is useful for a voice interface device to be able to inform the user of important updates. These “proactive notifications” can be things like a taxi arriving, a food delivery arriving, a home security alert, or even that a sports team won or lost a game. However, the timeliness of the delivery of these proactive notifications can be impacted by other interactions the user has with the device. A user might be in the middle of a long interaction (e.g., a conversation, playing a game, or making a reservation) with the device, and providing the notification while the long interaction is ongoing may break the flow of the interaction, which may be disruptive to the user and may even make the user start over. On the other hand, if the notification is delayed until the interaction is over, the notification may be untimely.