An interactive voice response (IVR) system is a computer system integrated with a telephone system so that a caller can dial in over a telephone line and access a service running on the computer. The caller may then interact with and receive voice information from the service. Typically, the interactive service has a range of services for the caller to choose from and presents options at a prompt menu, expecting the caller to select one. After the service option has been chosen, further information is requested from the caller and input to the IVR. The service gathers the relevant information, processes it to obtain a result, and from the result creates a prompt for delivery to the caller.
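The call flow just described (menu prompt, option selection, information gathering, processing, result prompt) can be sketched roughly as follows. This is only an illustration: the names `SERVICES`, `MENU_PROMPT`, `balance_prompt` and `handle_call`, the menu wording, and the sample account data are all hypothetical, and the telephony layer is reduced to two callables supplied by the caller of the function.

```python
from typing import Callable

# Hypothetical back-end handler: turns gathered input into a result prompt.
def balance_prompt(account_id: str) -> str:
    balances = {"1234": "250.00"}  # placeholder data, not a real back end
    amount = balances.get(account_id)
    if amount is None:
        return "Sorry, that account was not found."
    return f"Your balance is {amount}."

# Hypothetical service table: menu digit -> (question to ask, handler).
SERVICES = {
    "1": ("Please enter your account number.", balance_prompt),
}

MENU_PROMPT = "Press 1 for account balance."

def handle_call(read_input: Callable[[], str],
                play_prompt: Callable[[str], None]) -> None:
    """One pass through the menu: select, gather, process, respond."""
    play_prompt(MENU_PROMPT)
    choice = read_input()
    if choice not in SERVICES:
        play_prompt("Sorry, that option is not available.")
        return
    question, handler = SERVICES[choice]
    play_prompt(question)                # ask for further information
    play_prompt(handler(read_input()))   # process it and deliver the result
```

In a real system `read_input` would collect keypad or speech input and `play_prompt` would play a recorded segment or synthesized speech; here they can be any functions, which keeps the sketch testable without telephony hardware.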
The interaction between the users and the system comprises various voice prompts output by the system and responses thereto input by the user via the telephone keypad. Voice response systems are used by service providers, such as banks, to fully or partially automate telephone call answering or responses to queries. Typically, a voice response system provides the capability to play voice prompts comprising recorded voice segments or speech synthesized from text and to receive responses thereto. The prompts are generally played as part of a voice menu invoked by the call flow logic. A state table can access and play a voice segment or synthesize speech from given text. The prompts are usually part of a voice application which is designed to, for example, allow a customer to query information associated with their various bank accounts.
As the users of such a system may not be familiar with its use, it is necessary to ensure that the instructions or voice prompts are sufficiently comprehensive to allow a novice user to interact successfully with the system. However, the more competent IVR users become, the more they begin to anticipate the various voice prompts, and it becomes increasingly tedious for them to have to listen to such comprehensive instructions when more succinct instructions would suffice. “Expert” or fastpath methods are often provided, usually on explicit user selection. These allow the caller to enter multiple pieces of information at one time and to hear shorter, more succinct prompts.
European patent publication 0697780 discloses a system for varying the voice menus and segments presented to the user of a voice response system according to the competence of the user. The response time of a user to voice prompts is measured, and an average response time is determined. It is assumed that the lower the average response time, the greater the competence of the user. The average response time is used as an index to a table of ranges of response times. Each range has respective voice segments associated therewith. The voice segments comprise oral instructions or queries for the user and vary according to the anticipated competence of the user. If the average response time changes such that the voice segments indexed are different to the current voice segments, then a database containing information relating to user competence is updated to reflect such a change. Accordingly, when the user next interacts with the voice response system, a new set of voice segments more appropriate to the user's competence will be played.
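The table-indexing scheme attributed to that publication can be sketched as follows. This is a minimal illustration rather than the published implementation: the range boundaries, the three prompt-set names, and the `CallerProfile` class are assumptions introduced here, since the description above gives no concrete values.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import List

# Hypothetical upper bounds, in seconds, separating the response-time ranges.
RANGE_BOUNDS = [2.0, 5.0]
# One (hypothetical) set of voice segments per range; faster responders
# are assumed to be more competent and get the tersest prompts.
PROMPT_SETS = ["terse", "standard", "verbose"]

@dataclass
class CallerProfile:
    """Stands in for the database record holding a user's competence data."""
    response_times: List[float] = field(default_factory=list)
    prompt_set: str = "verbose"  # set to be played on the next interaction

    def record_response(self, seconds: float) -> None:
        self.response_times.append(seconds)

    def update_prompt_set(self) -> bool:
        """Re-index the table by average response time.

        Returns True if the stored prompt set changed.
        """
        if not self.response_times:
            return False
        average = sum(self.response_times) / len(self.response_times)
        # The average acts as the index into the table of ranges.
        selected = PROMPT_SETS[bisect_right(RANGE_BOUNDS, average)]
        changed = selected != self.prompt_set
        self.prompt_set = selected
        return changed
```

For example, a caller whose recorded response times average under the first assumed boundary would be moved from the default verbose set to the terse set, and that choice would apply the next time the caller uses the system.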
Using response times as a gauge of caller competence is only a first approximation and can be incorrect. The above publication concentrates on dual tone multi-frequency (DTMF) input to the IVR, which is accurate but limited to a sometimes tedious closed menu structure and set sequences. A more flexible but less accurate approach to caller interaction uses speech converted into text as input to a service. For instance, instead of being presented with an audible menu, the caller can be asked a more open question as to the nature of their business. An automatic speech recognition (ASR) component translates the speech into text, and the IVR interprets the text in the light of the services offered. Response times have only limited effectiveness as an estimate of the competence of the caller in such circumstances. Thus, there is a need in the art for an improved method of estimating the competence of a user of a speech recognition IVR system.
With the advent of advanced language processing techniques, such as Natural Language Understanding and Dialogue Management, the potential for both “expert” and less-experienced users to benefit from fastpaths and task switching is increasing. However, there are two crucial factors which cannot be catered for easily. First, it is necessary to introduce an explicit method, such as menu selection or even Caller Line Identification, to switch between expert and novice versions of a service. Second, this selection is made on a service-wide basis and does not change without either redialling or returning to a point in the service where the selection may be made. Caller Line Identification (CLID) (or Automatic Number Identification (ANI) in the United States) can be used to retrieve caller records in which the caller's preference for expert or novice prompts is stored. When given the choice, many callers will wrongly identify themselves as “expert” and discover that the service is not responding well, because the service has changed or because of environmental factors. There is a common assumption that experts will automatically use or want to use barge-in, and that novices will not. But again, for environmental reasons, each group is better served by flexibility.
The selection of expert or novice prompts, however, is not a generically applicable distinction that callers themselves can necessarily judge. In some circumstances (for example, background or channel noise), the caller would be better served by being treated as a novice. Competence may change within the same application, which compounds the problem of how to define “expert.” Expertise might be regarded as some level of competence in achieving a given task in the most efficient manner.