Just like human personal assistants, digital assistants or virtual assistants can perform requested tasks and provide requested advice, information, or services. An assistant's ability to fulfill a user's request is dependent on the assistant's correct comprehension of the request or instructions. Recent advances in natural language processing have enabled users to interact with digital assistants using natural language, in spoken or textual forms, rather than employing a conventional user interface (e.g., menus or programmed commands). Such digital assistants can interpret the user's input to infer the user's intent, translate the inferred intent into actionable tasks and parameters, execute operations or deploy services to perform the tasks, and produce outputs that are intelligible to the user. Ideally, the outputs produced by a digital assistant should fulfill the user's intent expressed during the natural language interaction between the user and the digital assistant.
The ability of a digital assistant system to produce satisfactory responses to user requests depends on the natural language processing, knowledge base, and artificial intelligence implemented by the system. A well-designed response procedure can improve a user's experience in interacting with the system and promote the user's confidence in the system's services and capabilities.
Many digital assistants can deliver responses in the form of speech outputs. For example, in some circumstances, speech outputs include one or more turn-by-turn directions (e.g., “Turn left on Whipple Avenue”) read aloud to the user by a text-to-speech engine. These speech outputs are generally provided at a predetermined time (e.g., ¼ of a mile before reaching Whipple Avenue) or immediately (e.g., in the case of a response to a question such as “What time is it?”). A disadvantage of these digital assistant response procedures is that speech outputs may be provided at inopportune times, such as when the user is speaking into the device during a phone conversation or is issuing a new request to the digital assistant. Interruptions that provide non-urgent information are frustrating and inconvenient for users. In addition, while it is possible to listen to two different audio streams at once, it is difficult for people to listen while they are themselves talking. Therefore, when a digital assistant delivers a speech output while a user is speaking, the user's ability to understand that speech output is impaired.
Accordingly, there is a need for methods of operating a digital assistant that intelligently and intuitively determine whether to provide a speech output. In particular, there is a need for methods of operating a digital assistant that determine whether the user is speaking and whether the speech output is urgent enough to warrant an interruption.
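By way of illustration only, the two determinations identified above (whether the user is speaking, and whether the speech output is urgent enough to warrant an interruption) can be sketched as a simple decision function. The names `SpeechOutput`, `Decision`, and `decide`, the numeric urgency score, and the threshold value are hypothetical and are assumed here for the example; they are not taken from any particular implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    SPEAK_NOW = "speak_now"
    DEFER = "defer"


@dataclass
class SpeechOutput:
    text: str
    urgency: float  # hypothetical score in [0, 1]; higher means more urgent


# Assumed cutoff above which an interruption is considered warranted.
URGENCY_THRESHOLD = 0.8


def decide(output: SpeechOutput, user_is_speaking: bool,
           threshold: float = URGENCY_THRESHOLD) -> Decision:
    """Decide whether to provide a speech output now or hold it back."""
    if not user_is_speaking:
        # Nothing to interrupt, so the output can be provided immediately.
        return Decision.SPEAK_NOW
    if output.urgency >= threshold:
        # The user is speaking, but the output is urgent enough to interrupt.
        return Decision.SPEAK_NOW
    # The user is speaking and the output is not urgent: hold it back.
    return Decision.DEFER


if __name__ == "__main__":
    direction = SpeechOutput("Turn left on Whipple Avenue", urgency=0.9)
    clock = SpeechOutput("It is 3 o'clock", urgency=0.2)
    print(decide(direction, user_is_speaking=True))
    print(decide(clock, user_is_speaking=True))
    print(decide(clock, user_is_speaking=False))
```

In this sketch, how speech is detected and how urgency is scored are left open; both are supplied to the function as inputs.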