1. Field of the Invention
The present invention relates to spoken dialog systems and more specifically to a system and method of using dialog trajectory analysis to provide feedback for automatically detecting problems and implementing improvements.
2. Introduction
The present invention relates generally to spoken dialog systems. FIG. 1 illustrates a block diagram of the basic components of a spoken dialog system. Natural language spoken dialog system 100 may include an automatic speech recognition (ASR) module 102, a spoken language understanding (SLU) module 104, a dialog management (DM) module 106, a spoken language generation (SLG) module 108, and a text-to-speech (TTS) module 110. ASR module 102 may analyze speech input and may provide a transcription of the speech input as output. SLU module 104 may receive the transcribed input and may use a natural language understanding model to analyze the words included in the transcribed input to derive a meaning from the input. DM module 106 may receive the meaning of the speech input as input and may determine an action, such as, for example, providing a spoken response, based on the input. SLG module 108 may generate a transcription of one or more words in response to the action provided by DM module 106. TTS module 110 may receive the transcription as input and may provide generated audible speech as output based on the transcribed speech.
Thus, the modules of system 100 may recognize speech input, such as speech utterances, transcribe the speech input, identify (or understand) the meaning of the transcribed speech, determine an appropriate response to the speech input, generate text of the appropriate response, and from that text generate audible “speech” from system 100, which the user then hears. In this manner, the user can carry on a natural language dialog with system 100. Those of ordinary skill in the art will understand the programming languages and means for generating and training ASR module 102 or any of the other modules in the spoken dialog system. Further, the modules of system 100 may operate independently of a full dialog system. For example, a computing device such as a smartphone (or any processing device having a phone capability) may have an ASR module wherein a user may say “call mom” and the smartphone may act on the instruction without a “spoken dialog.” Therefore, a spoken dialog “system” may include one or more of the modules to carry out the necessary functions for the particular application that uses speech technologies.
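By way of illustration only, the flow of data through the five modules described above may be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation of system 100: the class name DialogPipeline and the toy_* functions are hypothetical, the "audio" is simply UTF-8 text standing in for a speech signal, and each stub replaces what would in practice be a trained model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DialogPipeline:
    """Hypothetical container chaining ASR -> SLU -> DM -> SLG -> TTS."""
    asr: Callable[[bytes], str]   # speech input -> transcription
    slu: Callable[[str], str]     # transcription -> meaning
    dm: Callable[[str], str]      # meaning -> action
    slg: Callable[[str], str]     # action -> response text
    tts: Callable[[str], bytes]   # response text -> audible output

    def respond(self, audio: bytes) -> bytes:
        text = self.asr(audio)
        meaning = self.slu(text)
        action = self.dm(meaning)
        reply_text = self.slg(action)
        return self.tts(reply_text)


# Toy stand-ins (assumptions for illustration, not real recognizers).
def toy_asr(audio: bytes) -> str:
    # Pretend the audio bytes are already text.
    return audio.decode("utf-8").lower().strip()


def toy_slu(text: str) -> str:
    return "check_balance" if "balance" in text else "unknown"


def toy_dm(meaning: str) -> str:
    return "lookup_balance" if meaning == "check_balance" else "ask_repeat"


def toy_slg(action: str) -> str:
    replies = {
        "lookup_balance": "Let me look up your balance.",
        "ask_repeat": "Sorry, could you repeat that?",
    }
    return replies[action]


def toy_tts(text: str) -> bytes:
    # Pretend the encoded text is synthesized speech.
    return text.encode("utf-8")


pipeline = DialogPipeline(toy_asr, toy_slu, toy_dm, toy_slg, toy_tts)
```

Because each stage is passed in as a separate callable, a single stage can also be used on its own, mirroring the “call mom” example in which only an ASR module is needed.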
Dialog systems exist in a variety of instantiations, each allowing the user a different interaction medium. Touch-tone dialog systems exist that accept only keypad input, requiring a user to select from a predefined set of options that may or may not reflect the user's problem. Directed dialog systems also exist that allow speech input but greatly constrain what the user can say. Quite often these systems do not differ greatly from traditional touch-tone systems. The most flexible of dialog systems enable user initiative, allowing the user to describe their problem in unconstrained fluent speech, shifting the burden from the user to the system. See, e.g., A. L. Gorin, G. Riccardi and J. H. Wright, “How May I Help You?”, Speech Communication, 23: 113-127, 1997; and A. L. Gorin, A. Abella, T. Alonso, G. Riccardi and J. H. Wright, “Automated natural spoken dialog”, IEEE Computer Magazine, 35(4):51-56, 2002, both incorporated herein by reference. A method for monitoring the user-system interaction is required regardless of the type of dialog system.
There are two traditional ways to monitor deployed spoken dialog systems. First, call monitoring enables operations personnel to dial in and listen to a series of calls made to the system. This has the advantage that the listeners can hear the customers' speech and assess the experience from the customers' viewpoint. However, the sample of calls is bound to be very small, it may be unrepresentative because of the timing of dial-in sessions, and the listeners' assessments will be subjective. Second, summary reports are normally available on a daily or weekly basis. Typically these give a breakdown of the overall outcomes of all the calls into the system, including the numbers of service completions, transfers to agents, and hang-ups. But they are usually too coarse-grained to be of diagnostic value when some parts of the system are not performing well. The same is true of the usability measures widely used for spoken dialog systems.
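As a rough illustration of why such summary reports are coarse-grained, the following sketch tallies per-call outcomes into counts and fractions. The function name summarize_outcomes and the outcome labels are hypothetical, chosen only to mirror the categories named above; no particular reporting tool is implied.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def summarize_outcomes(outcomes: Iterable[str]) -> Dict[str, Tuple[int, float]]:
    """Map each outcome label to (count, fraction of all calls)."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: (n, n / total) for label, n in counts.items()}


# Hypothetical outcomes for four calls in one reporting period.
calls = ["completed", "transfer_to_agent", "hang_up", "completed"]
report = summarize_outcomes(calls)
```

The report records only how each call ended. It retains nothing about which prompt or dialog turn preceded a transfer or hang-up, which is the diagnostic detail the summary lacks.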
Therefore, call monitoring, with its small samples and subjective assessments, and summary reports, which are too coarse-grained to be of diagnostic value, provide little real data on the efficiency and productivity of the dialog system.
What is needed in the art is an improved method of obtaining feedback on how well a spoken dialog system is operating and using that feedback to improve the performance of the system.