Virtually all major corporations, financial institutions, technical support centers, hospitals, airlines, call centers, and government agencies use automated interactive voice response systems to route telephone inquiries to appropriate human agents. Although these systems are widely used, the problems of dealing with them are familiar to most callers from first-hand experience and are also widely recognized in the human-computer interaction research literature. The public often refers to these problems as “touchtone hell”.
It is well known in the research community that the current generation of telephone interfaces is frustrating to use, in part because callers must wait through long recited prompts to find the options that interest them. Researchers have long studied how to design voice menus that ease these frustrations and have proposed various voice menu design guidelines. For example, some researchers have suggested that long touchtone menus route callers more efficiently than short menus because long menus reduce the number of menu layers a caller must navigate.
Conversely, others have proposed easing the limitations of auditory menus by using shorter menus with a deeper hierarchy. Researchers have also proposed alternative touchtone interface styles in which callers issue explicit commands to skip items, inspired by the way people shift their gaze to skip uninteresting items and scan through large bodies of text.
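The breadth-versus-depth trade-off between long menus and short, deep menus can be made concrete with a small calculation. The Python sketch that follows is illustrative only and rests on simplifying assumptions that are not part of the source: all N destinations sit at the same depth of a balanced menu tree, every menu offers the same number of options (its breadth), and in the worst case the caller listens to every option at every layer. The function names `menu_depth` and `worst_case_prompts` are hypothetical.

```python
def menu_depth(num_targets: int, breadth: int) -> int:
    """Return the number of menu layers needed to reach num_targets
    destinations when every menu offers `breadth` options, i.e. the
    smallest d with breadth**d >= num_targets (simplified model)."""
    if num_targets < 1 or breadth < 2:
        raise ValueError("need num_targets >= 1 and breadth >= 2")
    depth, reachable = 0, 1
    while reachable < num_targets:
        reachable *= breadth  # each extra layer multiplies the reachable destinations
        depth += 1
    return depth


def worst_case_prompts(num_targets: int, breadth: int) -> int:
    """Worst case under the simplified model: the caller hears every
    option at every layer before choosing."""
    return menu_depth(num_targets, breadth) * breadth


if __name__ == "__main__":
    # Hypothetical system with 64 destinations (e.g. departments or agents).
    for b in (2, 4, 8, 64):
        print(f"breadth={b:2d} layers={menu_depth(64, b)} "
              f"worst-case options heard={worst_case_prompts(64, b)}")
```

Under this model, 64 destinations need 6 layers at a breadth of 2 and only 2 layers at a breadth of 8, while the worst-case number of options heard rises from 12 to 16; a single 64-option menu needs just one layer but asks the caller to weigh all 64 options at once. Fewer layers are bought with more options per layer, and each of those options must be held in memory.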
Despite these research efforts, conventional voice-menu-based interactive voice response systems remain the state of the art. The difficulty of navigating voice menus is rooted in the nature of auditory information: visual and auditory stimuli have fundamentally different characteristics. Unlike graphical and textual menus, voice menus are presented sequentially at a fixed pace, which is often too fast for the caller when the information is critical and too slow when the information is uninteresting. A long voice menu frustrates the caller because it requires the caller to hold many choices in memory in order to compare them and select the most suitable one.
On the other hand, short voice menus comprising broad categories can also be difficult because the caller is often unsure which category leads to the desired destination. A caller often cannot tell whether a particular category of functions suits his or her need until hearing the items at a lower level of the hierarchical menu. If an impatient caller misses or forgets a particular choice, he or she often has to start over.
Compared with visual menus, voice menus impose a greater cognitive burden on the user. To navigate an interactive voice response system, the caller must understand the spoken words and hold them in memory, which leaves less cognitive capacity for finding the choice that best matches the caller's goal.
In contrast, if a graphical menu is available, a caller can scan a text menu displayed on a screen and select from it at his or her own pace. The caller can scan and compare menu items without having to commit them to memory. A visual menu also makes it easier for the caller to jump between different levels of a hierarchical menu structure.
It would therefore be advantageous for an interactive voice response system to display its voice menu visually to the caller. One conventional approach displays the text of the voice menu on a screen built into the phone set. This approach requires specially designed, advanced telephone sets equipped with large screens and CPUs to render the visual menu, and it significantly complicates the communication mechanisms and protocols between the interactive voice response system and the telephone handset.
Another approach uses a computer to display the visual content of a voice menu. The difficulty with this approach has been that it requires a direct physical connection between the phone set and the local computer, as well as coordination between these devices, the telephone network, and the interactive voice response (IVR) system. Reference is made, for example, to U.S. Pat. No. 6,091,805.
Conventional methods further require changing or enhancing the protocols and functions of widely deployed telephone switching circuits and interactive voice response systems so that voice and text data can be transmitted simultaneously to the same telephone set. The cost of upgrading both the phone sets and the infrastructure presents a significant obstacle to these conventional methods.
What is therefore needed is a system, a computer program product, and an associated method for seamlessly integrating an interactive visual menu with the audio menu of an interactive voice response system without requiring complicated changes to hardware, circuits, or communication mechanisms. The need for such a solution has heretofore remained unsatisfied.