1. Field of the Invention
The present invention relates generally to spoken language understanding systems and more particularly to improving spoken language understanding systems using word confusion networks.
2. Introduction
Voice-based natural dialog systems enable customers to express what they want in spoken natural language. Such systems automatically extract the meaning from speech input and act upon what people actually say, in contrast to what one would like them to say, thereby shifting the burden from users to the machine.
In a natural language spoken dialog system, robustness to automatic speech recognition (ASR) errors is essential, since all communication takes place in natural spoken language. For telephone speech in particular, the typical word error rate (WER) is around 30%. To illustrate the potential impact of ASR errors, consider a 1-best ASR output, in which the ASR system outputs a single best hypothesis. Suppose that hypothesis is “I have a question about my will,” where the recognizer has erroneously selected the word “will” instead of “bill”. Failing to recognize such a salient word would cause the whole utterance to be misunderstood, even though all the other words were recognized correctly. Moreover, the spoken language understanding system should tolerate some amount of orthographic variability, since people say the same thing in different ways.
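As a rough illustration of the word error rate mentioned above, the following sketch computes WER as the word-level edit distance (substitutions, insertions, and deletions) between a reference transcript and an ASR hypothesis, divided by the number of reference words. This is the conventional definition of WER and is not taken from the present disclosure; the function name `word_error_rate` and the lowercase, whitespace-based tokenization are assumptions made only for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (hypothetical name); assumes simple
    lowercase, whitespace-separated tokens.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


reference = "I have a question about my bill"
hypothesis = "I have a question about my will"
print(f"{word_error_rate(reference, hypothesis):.3f}")  # prints 0.143
```

In this example only one of seven words is wrong, so the utterance-level WER is well below the 30% figure cited above, yet the single substituted word is the one that carries the caller's intent.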