Just like human personal assistants, digital assistants or virtual assistants can perform requested tasks and provide requested advice, information, or services. An assistant's ability to fulfill a user's request depends on the assistant's correct comprehension of the request or instructions. Recent advances in natural language processing have enabled users to interact with digital assistants using natural language, in spoken or textual forms, rather than through a conventional user interface (e.g., menus or programmed commands). Such digital assistants can interpret the user's input to infer the user's intent, translate the inferred intent into actionable tasks and parameters, execute operations or deploy services to perform the tasks, and produce outputs that are intelligible to the user. Ideally, the outputs produced and the tasks performed by a digital assistant should fulfill the intent the user expressed during the natural language interaction. However, digital assistants will, from time to time, produce erroneous outputs and/or perform erroneous tasks in response to a user input, which can irritate users and can make the digital assistant appear incompetent or unsophisticated.
In addition, digital assistants that interact with users via speech inputs and outputs typically employ speech-to-text processing techniques to convert speech inputs into textual forms that can be further processed, and speech synthesis techniques to convert textual outputs into speech. In both cases, accurate conversion between speech and text is important to the usefulness of the digital assistant. For example, if the words in a speech input are incorrectly identified by a speech-to-text process, the digital assistant may be unable to properly infer the user's intent, or may provide incorrect or unhelpful responses. Similarly, if the words in a speech output are incorrectly pronounced by the digital assistant, the user may have difficulty understanding the digital assistant. Incorrect pronunciations can also make the assistant appear incompetent or unsophisticated, and may reduce users' interest in and confidence in the digital assistant.
In order to improve the quality of digital assistants, it is helpful to identify particular instances where errors have occurred, so that the source of the errors can be identified and addressed. However, it is difficult to identify errors made by a digital assistant, because there is often limited or no feedback about whether an error has occurred. Moreover, even if the occurrence of errors can be detected, it can be difficult to determine exactly what the error was or what part of an interaction or task performed by the digital assistant was perceived by the user to be in error.
Accordingly, there is a need for systems and methods to determine when errors occur in interactions with a digital assistant.