Interactive Voice Response (IVR) is a software application that accepts a combination of voice telephone input and touch-tone keypad selection and provides appropriate responses in the form of voice, fax, callback, e-mail, and possibly other media. The accelerating adoption of speech solutions in the IVR industry is currently driven by improvements in speech algorithms, natural language processing, vocabulary management, and language modeling.
IVR and transaction-processing applications allow self-service access to automated banking, stock portfolios, account information, airline schedules, movie times, etc. Callers may also place orders, track order status, or use a directory to contact a department or individual. Automated speech recognition enhances the flexibility and power of IVR applications.
A speech recognition system typically includes an input device, a voice board that performs analog-to-digital conversion of the speech signal, and a signal processing module that converts the digitized samples into a series of patterns. These patterns are then compared with a set of stored models constructed from knowledge of acoustics, language, and dictionaries. The technology may be speaker dependent (trained), speaker adaptive (improving with use), or fully speaker independent. In addition, features such as “barge-in” capability, which allows the user to speak at any time, including while a prompt is playing, and key word spotting, which picks out key words from a sentence containing extraneous words, enable the development of more advanced applications.
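The pattern-comparison step can be illustrated with dynamic time warping (DTW), a classic technique for speaker-dependent, template-based recognition. This is a swapped-in illustration, not the method of any particular product; modern recognizers typically compare patterns against statistical acoustic and language models instead. The sketch assumes the signal processing module has already produced one feature vector per frame, and the vocabulary, templates, and feature values are made-up toy data.

```python
import math
from typing import Dict, List, Sequence

# One feature vector per frame (assumed output of the signal processing module).
Frame = Sequence[float]


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(pattern: List[Frame], template: List[Frame]) -> float:
    """Minimum cumulative cost of time-aligning an input pattern to a stored template."""
    n, m = len(pattern), len(template)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(pattern[i - 1], template[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(pattern: List[Frame], templates: Dict[str, List[Frame]]) -> str:
    """Return the vocabulary word whose stored template aligns at the lowest cost."""
    return min(templates, key=lambda word: dtw_distance(pattern, templates[word]))


# Hypothetical two-word vocabulary with toy two-dimensional features.
templates = {
    "yes": [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]],
    "no": [[0.0, 0.0], [0.5, 0.0], [1.0, 0.0]],
}
# A slightly slower, noisier rendition of "yes" (four frames instead of three).
utterance = [[1.0, 0.1], [1.0, 0.1], [0.9, 1.0], [0.1, 1.0]]
print(recognize(utterance, templates))  # prints: yes
```

Because DTW stretches or compresses the time axis, the four-frame utterance still matches the three-frame "yes" template closely; this tolerance to speaking rate is why template matching was practical for small, trained vocabularies.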
The main goal of speech recognition applications is to mimic human listeners. When a human listener hears a word sequence, the listener automatically attributes a confidence level to the utterance; for example, when the noise level is high, the probability of confusion is high, and a human listener will probably ask for the utterance to be repeated. A speech recognizer likewise computes a confidence measure for each recognized sequence, and the resulting “confidence level” is used to make further decisions about that sequence and to validate the speech recognition results.
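A minimal sketch of how a recognizer's confidence level might drive further decisions, mirroring the human listener who asks for a repeat when unsure. The three-way accept/confirm/reject policy and both threshold values are assumptions chosen for illustration; in practice thresholds are tuned per application, grammar, and recognizer.

```python
from enum import Enum


class Action(Enum):
    ACCEPT = "accept"    # use the recognized result directly
    CONFIRM = "confirm"  # ask the caller to confirm, e.g. "Did you say ...?"
    REJECT = "reject"    # reprompt, as a human listener would ask for a repeat


# Hypothetical thresholds on a confidence score in [0, 1].
ACCEPT_THRESHOLD = 0.80
CONFIRM_THRESHOLD = 0.45


def decide(confidence: float) -> Action:
    """Map a recognizer confidence score to a dialog action."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if confidence >= ACCEPT_THRESHOLD:
        return Action.ACCEPT
    if confidence >= CONFIRM_THRESHOLD:
        return Action.CONFIRM
    return Action.REJECT


for score in (0.92, 0.60, 0.20):
    print(score, decide(score).value)
```

The middle "confirm" band trades an extra dialog turn for fewer silent misrecognitions; collapsing it into a single threshold gives the simpler accept/reject behavior.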
The functionalities that can be delivered by today's IVR speech solutions vary widely and range from recognition of spoken letters and numbers to more complex phrases and sentences. Some applications simply replace touch-tone interfaces with speech-enabled interfaces that recognize a very limited set of spoken letters and numbers, mostly those that correspond to the touch-tone keypad. More advanced applications employ directed dialogue, or system prompts that guide users to respond with fairly simple spoken words that can be accurately recognized. The most advanced natural language applications enable recognition of more complex phrases and sentences spoken in a conversational manner at a natural speed.
Speech solutions are now enabling the development of IVR applications that go beyond rigid touch-tone interface models to exploit the navigational flexibility offered by natural language processing. Natural language recognition and advanced user interfaces that conduct interactive dialogues with users in order to complete transactions are driving the creation of the most versatile and robust applications ever developed for the IVR industry.
The main factor driving the emergence of speech as the IVR user interface is rising labor costs. The cost of employing live customer service agents is increasing at the same time that organizations face growing pressure to reduce the cost of serving customers. Where an automated call-processing solution is used, a speech-enabled IVR application increases caller acceptance because it offers the friendliest and fastest self-service alternative to speaking with a customer service agent. Speech solutions also create new opportunities to automate transactions that are too cumbersome to complete using a DTMF (touch-tone) interface, such as bill payment or stock trading. The higher the call volume, the more cost-effective the addition of speech recognition becomes. Speech solutions therefore offer the potential for dramatic reductions in operational costs. They also improve the productivity of customer service personnel, because a higher percentage of customer calls can be fully or partially automated. Increased automation frees customer service agents from many routine administrative tasks and reduces customer service staffing costs, because fewer agents can serve more customers.
However, to preserve the cost savings provided by speech applications, the caller must remain in the automated call-processing transaction. When a caller opts out of the automated system to talk to a live operator, the company incurs an associated charge. The opt-out rate is the percentage of callers who choose to talk to a live agent.
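The opt-out rate is simple arithmetic; the sketch below also estimates the agent cost those opt-outs represent. The call counts and the per-call agent cost are hypothetical figures, not data from any deployment.

```python
def opt_out_rate(total_calls: int, opted_out_calls: int) -> float:
    """Percentage of callers who left the automated system for a live agent."""
    if total_calls <= 0:
        raise ValueError("total_calls must be positive")
    if not 0 <= opted_out_calls <= total_calls:
        raise ValueError("opted_out_calls must be between 0 and total_calls")
    return 100.0 * opted_out_calls / total_calls


def opt_out_cost(opted_out_calls: int, cost_per_agent_call: float) -> float:
    """Estimated charge for calls handled by live agents (assumed flat per-call cost)."""
    return opted_out_calls * cost_per_agent_call


# Hypothetical figures: 10,000 calls, 1,250 of which reached a live agent.
rate = opt_out_rate(10_000, 1_250)
print(f"{rate:.1f}%")              # 12.5%
print(opt_out_cost(1_250, 4.00))   # 5000.0
```

Each percentage point of opt-out rate removed saves, under this flat-cost assumption, one percent of call volume times the per-call agent cost.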
One reason a caller might choose to speak to a live agent is a rejection error by the IVR system. A rejection error occurs when a spoken word or phrase is not recognized, or is recognized incorrectly, by the system. A caller can opt out verbally or by pressing a key. The caller might also be timed out of the application and automatically opted out when no speech is received. No-speech occurs when the user does not say anything while the recognizer is waiting for speech. Other reasons a caller might opt out of the automated system include unclear instructions, the speech recognizer failing to recognize unexpected spoken utterances, an application flow so cumbersome that the caller becomes frustrated, and the caller being unable to find the desired feature.
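The sketch below replays the recognition events for a single prompt and reports whether the caller completed it, opted out on request, or was opted out automatically after repeated rejection or no-speech events. The event names and the limit of three consecutive failures are assumptions for illustration; VoiceXML platforms, for instance, report the analogous conditions as `nomatch` and `noinput` events.

```python
from typing import Iterable, Tuple

# Hypothetical names for the result of one recognition attempt.
MATCH, NOMATCH, NOSPEECH, OPERATOR = "match", "nomatch", "nospeech", "operator"
MAX_FAILURES = 3  # assumed retry limit before an automatic transfer to an agent


def run_prompt(attempts: Iterable[str]) -> Tuple[str, str]:
    """Replay the recognition events for one prompt and report how it ended.

    Returns (outcome, reason); outcome is "completed", "opt_out", or
    "abandoned" (the events ran out, e.g. the caller hung up).
    """
    failures = 0
    for event in attempts:
        if event == MATCH:
            return "completed", "recognized"
        if event == OPERATOR:
            # Caller asked for an agent by voice or by pressing a key.
            return "opt_out", "caller_request"
        if event in (NOMATCH, NOSPEECH):
            failures += 1
            if failures >= MAX_FAILURES:
                # Too many rejections or silent timeouts: transfer automatically.
                return "opt_out", "max_" + event
        else:
            raise ValueError(f"unknown event: {event}")
    return "abandoned", "no_more_input"


print(run_prompt([NOSPEECH, NOMATCH, MATCH]))     # ('completed', 'recognized')
print(run_prompt([NOSPEECH, NOSPEECH, NOSPEECH]))  # ('opt_out', 'max_nospeech')
```

Recording the reason alongside the outcome is what later lets an administrator separate caller-requested opt-outs from those forced by recognition failures.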
Conventional speech recognition applications save only the recognition audio, which makes it difficult to determine where in the application a problem occurs. Currently, this determination can be made, if at all, only through some type of logging facility. However, reviewing logs to determine where an error occurred is difficult and cumbersome. Moreover, if an application lacks the required logging, there is no way to recreate the scenario in which a problem occurred.
To optimize a speech application, a need therefore exists for a diagnostic tool that enables an administrator to save the execution of one or more application sessions and to compare opt-outs, giving a better understanding of how the application is operating.
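A diagnostic tool of this kind might record each step of a session (dialog state, prompt, and recognition result) rather than only the audio, so that a session can be saved, reloaded later, and grouped with other opted-out sessions by the state where the caller left. This is a minimal sketch; the `Session` and `Step` classes, their fields, the state names, and the JSON layout are all hypothetical.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Step:
    state: str           # hypothetical dialog-state name, e.g. "main_menu"
    prompt: str          # prompt played to the caller
    result: str          # "match", "nomatch", "nospeech" or "operator"
    utterance: str = ""  # recognized text, if any


@dataclass
class Session:
    session_id: str
    steps: List[Step] = field(default_factory=list)
    opted_out: bool = False

    def record(self, state: str, prompt: str, result: str, utterance: str = "") -> None:
        self.steps.append(Step(state, prompt, result, utterance))

    def to_json(self) -> str:
        """Serialize the whole session so it can be saved and replayed later."""
        return json.dumps(asdict(self))

    @staticmethod
    def from_json(text: str) -> "Session":
        data = json.loads(text)
        steps = [Step(**s) for s in data["steps"]]
        return Session(data["session_id"], steps, data["opted_out"])


def opt_outs_by_state(sessions: List[Session]) -> Counter:
    """Count, for each dialog state, how many opted-out sessions ended there."""
    counts: Counter = Counter()
    for s in sessions:
        if s.opted_out and s.steps:
            counts[s.steps[-1].state] += 1
    return counts


a = Session("call-001")
a.record("main_menu", "Say billing or orders.", "match", "billing")
a.record("pay_bill", "Say the amount to pay.", "nomatch")
a.record("pay_bill", "Say the amount to pay.", "operator", "agent")
a.opted_out = True

b = Session("call-002")
b.record("main_menu", "Say billing or orders.", "nospeech")
b.record("main_menu", "Say billing or orders.", "match", "orders")
b.record("order_status", "Say your order number.", "match", "4 2 1 7")

sessions = [a, b]
print(opt_outs_by_state(sessions))  # Counter({'pay_bill': 1})
```

Counting opt-outs per dialog state across many saved sessions, and comparing those counts between two versions of an application, points the administrator at the part of the call flow where callers leave; the saved steps then show what the caller heard and said just before opting out.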