Interactive Voice Response (IVR) applications accept caller input through either DTMF (dual-tone multi-frequency, i.e., telephone keypad) tones or speech recognition. DTMF applications are invariably organized as a hierarchical collection of menus, each presenting a small set of options from which the user selects one. Speech applications may mimic DTMF menus or form-filling dialogues (an organizing architecture known as directed dialogue), or they may adopt a newer and more sophisticated interface design paradigm known as natural language (NL).
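To make the hierarchical DTMF menu organization concrete, here is a minimal Python sketch of a menu tree in which each node holds a prompt and a small map from keypad digits to submenus. The names `MenuNode`, `navigate`, and the menu contents are hypothetical and chosen only for illustration; a real IVR platform would attach audio prompts, timeouts, and retry logic to each node.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class MenuNode:
    """One IVR menu: a prompt plus a small set of DTMF options (hypothetical structure)."""
    prompt: str
    # Maps a pressed key (e.g. "1") to the submenu or leaf it selects.
    options: Dict[str, "MenuNode"] = field(default_factory=dict)
    # Set only on leaf nodes, naming the task the caller has reached.
    action: Optional[str] = None


def navigate(root: MenuNode, keys: str) -> MenuNode:
    """Follow a sequence of DTMF digits down the menu hierarchy.

    Raises KeyError on an invalid key; this is the point where a real
    application would re-prompt the caller.
    """
    node = root
    for key in keys:
        if key not in node.options:
            raise KeyError(f"invalid option {key!r} at menu: {node.prompt}")
        node = node.options[key]
    return node


# A two-level example menu tree (contents are illustrative only).
main_menu = MenuNode(
    prompt="For billing press 1. For technical support press 2.",
    options={
        "1": MenuNode(
            prompt="For your balance press 1. To make a payment press 2.",
            options={
                "1": MenuNode(prompt="Reading your balance.", action="read_balance"),
                "2": MenuNode(prompt="Connecting to payments.", action="take_payment"),
            },
        ),
        "2": MenuNode(prompt="Connecting to support.", action="transfer_support"),
    },
)

if __name__ == "__main__":
    leaf = navigate(main_menu, "12")
    print(leaf.action)  # take_payment
```

Because every menu exposes only a few fixed keys, any input outside `options` is unambiguous and easy to reject, which is one reason keypad dialogues are comparatively stable.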
One problem automatic speech recognition (ASR) faces in supporting these dialogues is distinguishing deliberate user speech from distracting acoustic events, including intermittent noises, user mumbling, side conversations, false starts, and similar occurrences. These events destabilize the dialogue, and the error-recovery routines needed to repair the damage complicate the design and development of ASR applications.