The present application relates generally to an improved data processing apparatus and method, and more specifically to mechanisms for customizing responses to users in automated dialogue systems.
A dialogue system, or conversational agent (CA), is a computer system intended to converse with a human. Dialog systems have employed text, speech, graphics, haptics, gestures, and other modes of communication on both the input and output channels. There are many different architectures for dialog systems: which components a dialog system includes, and how those components divide responsibilities, differ from system to system.
Central to any dialog system is the dialog manager, a component that manages the state of the dialog and the dialog strategy. A typical activity cycle in a dialog system contains the following phases. Initially, the user provides input, for example by speaking, and the system's input recognizer/decoder converts that input to plain text. The input recognizer/decoder may include an automatic speech recognizer (ASR), a gesture recognizer, a handwriting recognizer, or the like. The generated text is then analyzed by a natural language processing (NLP) system, which may include logic for performing proper name identification, part-of-speech tagging, syntactic/semantic parsing, and the like.
The resulting semantic information is analyzed by the dialog manager, which keeps the history and state of the dialog and manages the general flow of the conversation. Usually, the dialog manager contacts one or more task managers, which have knowledge of the specific task domain, to perform domain-specific actions based on the results of the NLP system's analysis of the natural language text. The dialog manager produces output using an output generator. The output is rendered using an output renderer, which may perform text-to-speech conversion, render a graphical representation, output a textual response, or the like.
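The activity cycle described above (recognize, analyze, manage the dialog, perform the task, generate, render) can be illustrated with a minimal Python sketch. It assumes text input, so the recognizer is a pass-through, and it uses a toy keyword-based "NLP" step; all names here (`recognize`, `analyze`, `DialogManager`, `generate_output`, `render`, `run_turn`, and the `check_balance` intent) are hypothetical and chosen only for illustration, not taken from any particular system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Analysis:
    """Hypothetical NLP output: the text, its tokens, and a crude intent label."""
    text: str
    tokens: List[str]
    intent: str


def recognize(raw_input: str) -> str:
    """Stand-in for an ASR/gesture/handwriting recognizer; input is assumed to be text already."""
    return raw_input.strip()


def analyze(text: str) -> Analysis:
    """Toy NLP step: lowercase tokenization plus keyword-based intent detection."""
    tokens = [t.strip("?.!,").lower() for t in text.split()]
    intent = "check_balance" if "balance" in tokens else "unknown"
    return Analysis(text, tokens, intent)


@dataclass
class DialogManager:
    """Keeps the dialog history and routes each turn to a domain-specific task manager."""
    task_managers: Dict[str, Callable[[Analysis], str]]
    history: List[Analysis] = field(default_factory=list)

    def handle(self, analysis: Analysis) -> str:
        self.history.append(analysis)  # record this turn in the dialog history
        task = self.task_managers.get(analysis.intent)
        if task is None:
            return "Sorry, I did not understand that."
        return task(analysis)  # delegate the domain-specific action


def generate_output(result: str) -> str:
    """Output generator: wraps the task result in a simple response template."""
    return f"Agent: {result}"


def render(output: str) -> str:
    """Output renderer: a real system might synthesize speech; this sketch returns text."""
    return output


def run_turn(dm: DialogManager, raw_input: str) -> str:
    """One pass through the activity cycle for a single user input."""
    text = recognize(raw_input)
    analysis = analyze(text)
    result = dm.handle(analysis)
    return render(generate_output(result))


dm = DialogManager(task_managers={"check_balance": lambda a: "Your balance is $100."})
print(run_turn(dm, "What is my balance?"))  # Agent: Your balance is $100.
```

Keeping the task managers in a dictionary keyed by intent mirrors the separation described above: the dialog manager owns history and flow, while domain knowledge lives in pluggable task handlers.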
In speech- or text-based dialogue systems, such as automated customer service systems, users communicate with the system through spoken utterances or short text messages provided in a natural language. Once a user input (a spoken utterance or text input) is received, the automated system attempts to process and analyze it to reduce it to a computer-understandable form. Given an unambiguous interpretation of the utterance, the system can perform tasks or produce a response, such as an answer to a question asked by the user. However, some user utterances, text inputs, or portions thereof may be ambiguous to the dialogue system. For example, the term “it” in a spoken or text input may be ambiguous as to what “it” refers to. Because of this reference ambiguity, the dialogue system may ignore or skip the ambiguous portions of the user input. This may reduce the number of possible interpretations that the dialogue system considers, which in turn may lead to an inaccurate or non-optimized response. As a result, the user may become frustrated, feeling that the responses are inaccurate and that the system is not correctly listening to them.
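The reference-ambiguity problem described above can be made concrete with a small Python sketch. It assumes a hypothetical pronoun list (`PRONOUNS`) and a hypothetical domain vocabulary (`KNOWN_ENTITIES`), and uses a toy heuristic: a pronoun is flagged as ambiguous when the dialog history mentions more than one known entity. This is not a coreference resolver and not the mechanism of the present application; it only contrasts flagging the ambiguity with a naive interpretation that silently drops the pronoun.

```python
from typing import Dict, List

# Hypothetical set of referring expressions treated as potentially ambiguous.
PRONOUNS = {"it", "this", "that", "they", "them"}

# Hypothetical domain vocabulary of entities the system can talk about.
KNOWN_ENTITIES = {"router", "modem", "phone", "laptop"}


def tokenize(utterance: str) -> List[str]:
    """Lowercase whitespace tokenization with trailing punctuation stripped."""
    return [t.strip("?.!,").lower() for t in utterance.split()]


def candidate_antecedents(history: List[str]) -> List[str]:
    """Collect distinct known entities mentioned earlier, most recent mention first."""
    seen: List[str] = []
    for utterance in reversed(history):
        for token in reversed(tokenize(utterance)):
            if token in KNOWN_ENTITIES and token not in seen:
                seen.append(token)
    return seen


def ambiguous_references(utterance: str, history: List[str]) -> Dict[str, List[str]]:
    """Map each pronoun in the utterance to its candidates when more than one exists."""
    candidates = candidate_antecedents(history)
    result: Dict[str, List[str]] = {}
    for token in tokenize(utterance):
        if token in PRONOUNS and len(candidates) > 1:
            result[token] = candidates
    return result


def naive_interpretation(utterance: str) -> List[str]:
    """What a system that skips ambiguous portions keeps: the pronouns are dropped."""
    return [t for t in tokenize(utterance) if t not in PRONOUNS]


history = ["My router keeps dropping the connection.", "I also restarted my laptop."]
print(ambiguous_references("Should I reset it?", history))  # {'it': ['laptop', 'router']}
print(naive_interpretation("Should I reset it?"))  # ['should', 'i', 'reset']
```

The naive interpretation loses the object of "reset" entirely, whereas the flagged result preserves both candidate referents, which a system could then use, for example, to ask a clarifying question.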