The present invention relates to automatic voice interaction systems such as call centers and computer systems which include voice recognition means and voice response means and to methods of their use.
Highly precise and inexpensive voice recognition means, such as voice recognition software that runs on computers, for example "ViaVoice" developed by IBM, Ltd., have recently become available, and automatic voice response systems using such voice recognition means have been suggested. An example of such an automatic voice interaction system is a call center system that deals with customers who call the system to perform telephone shopping. Instead of speaking with a human operator, the shopping customer interacts with a computer through a voice recognition means and a voice response means to place an order for goods and/or services using voice communication.
In such telephone shopping systems in particular, large numbers of telephone calls are often made to the call center on the same day, and hence a large number of operators are required. By the use of the apparatus, computer program and method of the instant invention, the number of operators required for telephone shopping can be advantageously reduced. Generally, in such an automatic voice interaction system, a telephone shopping customer, for example, hereinafter referred to as the caller, responds to an order question generated by a voice response means of a computer that took the call from the caller. The caller's response to the system order question is provided to a voice recognition means, and a recognition result is generated as text or an equivalent form of order information in the computer. This text is then synthesized into voice form by the voice response means and sent to the caller in the form of an order confirmation question. The caller is then allowed to confirm the order to the computer through the voice recognition means.
The voice recognition steps in this method may fail for various reasons, such as the level of precision of the voice recognition means, the tone quality and intonation of the caller, or a voice interpretation mistake by either the caller or the voice recognition means, perhaps due to noise generated around the caller. In this case, the caller is requested to utter the order information again. When, in the example voice interaction system, voice recognition fails repeatedly a predetermined number of times, for example three times, a private branch exchange switches the caller's telephone connection so that the caller can speak with a human operator. Because presently employed voice recognition means are not 100% reliable, and because some callers cannot interact effectively without the amenities of personal human conversation, sometimes called small talk, a human operator who can pick up a telephone shopping call to an automatic voice interaction system is essential.
Current voice recognition means show reasonable recognition accuracy in obtaining one item of order information from a caller when the machine has the opportunity to perform repeated recognition operations interleaved with voice response confirmation questions to the caller. When a larger number of order information items is required, for example four different items, the information capture accuracy is degraded. For example, when a caller must enter four items, including the name of a bank to which money is transferred, the name of the bank's branch, a sum of money and a transfer date, all by telephone voice communication to voice recognition means, and each item is captured with a probability of 0.9, the probability of completing the information input without switching from the automatic voice interaction system to a human operator is 0.9 × 0.9 × 0.9 × 0.9 = 0.6561. In other words, nearly 35% of the callers drop out from the automatic voice interaction process before they reach the completion of the order processing, and a human operator must deal with the caller's order instead of the automatic system. Thus, although there is some advantage to be obtained in adopting an interactive automatic voice response system, the automatic voice response system cannot achieve totally unmanned operation. In other words, with the accuracy assumed above, about 100 lines can be supported by 34 operators. One reason that a better automation rate cannot be achieved is that once an operator intervenes in a call being inadequately processed by an automated system, it is difficult for the operator to return the call to the automated system without creating caller dissatisfaction. Customer satisfaction seems to require that once an operator starts dealing with a caller, the call must continue to be handled on a manual basis, and that requires a certain amount of time.
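The arithmetic above can be checked with a short calculation. The following is only a sketch: it assumes, as the example in the text does, that each item is captured independently with the same accuracy of 0.9, and the function names are illustrative and not part of the described system.

```python
def completion_probability(per_item_accuracy: float, items: int) -> float:
    """Probability that every item is captured without an operator hand-off,
    assuming each item succeeds independently with the same accuracy."""
    return per_item_accuracy ** items


def operators_needed(lines: int, per_item_accuracy: float, items: int) -> int:
    """Approximate number of calls, out of `lines`, that fall through to an operator."""
    drop_out = 1.0 - completion_probability(per_item_accuracy, items)
    return int(lines * drop_out)  # truncated, matching the figure of 34 in the text


p = completion_probability(0.9, 4)
print(round(p, 4))                    # 0.6561
print(operators_needed(100, 0.9, 4))  # 34
```

Under the same assumption, a single item would complete with probability 0.9, which illustrates why accuracy degrades as the number of required items grows.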
Intervention by an operator who takes over a call when the caller is having difficulty with an automated system only seems to confirm in the caller's mind that the automated system does not or cannot work properly, and further reduces the caller's patience when requested to deal with repeated confirmation questions asked by an automated system using voice response means.
U.S. Pat. No. 6,044,142 issued to Hammerstrom et al. attempts to solve this problem by switching a call to an operator for special assistance and then returning the call to their "intelligent network services". This sounds good, but the caller in Hammerstrom is aware of the operator intervention and will resist being shunted back into the computer system.
An advantage of the present invention is that it reduces the number of times that an operator must take over a call being inefficiently handled by an automated system while at the same time reducing the number of times that a caller is asked to repeat uttering information needed by the automatic system.
Another advantage of the present invention is that it allows human intervention into the automated information gathering process without alerting a caller that such intervention is occurring. Accordingly the caller is not in a position to expect further manual communication with a human operator and caller satisfaction is not degraded. Actually, caller satisfaction is more likely to be enhanced because an order for example may be expeditiously completed without the delay required for the call to be transferred to a human operator.
These and other advantages of the present invention are obtained by a novel automatic interactive voice communication system apparatus, computer program and method of operation that provides for automatic screening of voice recognition accuracy and for limited human intervention in the automatic interactive voice communication process without transferring the call. The accuracy of the automatic process is thereby improved without the need to abort the automatic process and transfer the call to a human operator.
An automatic voice system of a first aspect of the present invention for receiving a voice input from a caller and for transmitting a voice message to the caller includes:
automatic means for receiving a call and storing a caller's voice input;
voice recognition means for analyzing the voice input and generating a voice recognition result;
screening means for recognizing when the voice recognition means has difficulty in recognizing intelligible information in the caller's voice input, the screening means including screener interface means for reproducing the voice request when the voice recognition means has difficulty in recognizing an intelligible request in the caller's voice request and for receiving a recognition result entered by a human voice inspector (hereinafter referred to as a screener) of the caller's voice input; and switching means for switching the call from the automatic means for receiving a call to an operator interface means by which an operator and the caller directly talk with each other.
In one embodiment of such a system, the automatic voice response means transfers the received voice of the caller to the voice recognition means, and the voice recognition means generates the voice recognition result and transmits the voice recognition result to the automatic voice response means. The automatic voice response means transmits a confirmation message based on the voice recognition result to the caller. The confirmation message may be in the form of a question capable of being answered by a binary answer of yes or no. When a response by the caller to the confirmation message is negative, the automatic voice response means supplies the voice of the caller already received to the screener interface means, where it is heard by a screening person. The screener provides an input at the screener interface representative of the intelligible content of the caller's voice. The system uses the screener's input in place of the previous voice recognition result and generates a new confirmation message question therefrom. The new confirmation message based on the input from the screener is sent to the caller. When a response by the caller to the new confirmation message is negative, the line exchanging means switches the voice connection of the caller from the automatic voice response means to the operator interface means. In the above described embodiment, the screening means recognizes that the voice recognition means is having difficulty in recognizing an intelligible request in the caller's voice input by monitoring the caller's response to the confirmation question.
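The escalation order described in this embodiment (automatic recognition, then the screener, then the operator) can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function `handle_item` and its callable parameters are hypothetical stand-ins for the voice recognition means, the screener interface means and the caller's yes/no answer.

```python
from typing import Callable, Optional, Tuple


def handle_item(
    voice: bytes,
    recognize: Callable[[bytes], str],            # stand-in for the voice recognition means
    screener_transcribe: Callable[[bytes], str],  # stand-in for the screener interface means
    caller_confirms: Callable[[str], bool],       # stand-in for the caller's yes/no answer
) -> Tuple[str, Optional[str]]:
    """Return (handled_by, value); handled_by is 'recognizer', 'screener' or 'operator'."""
    # 1. Automatic recognition followed by a yes/no confirmation question.
    result = recognize(voice)
    if caller_confirms(result):
        return ("recognizer", result)

    # 2. Negative answer: replay the voice already received to the screener,
    #    without asking the caller to repeat the utterance.
    result = screener_transcribe(voice)
    if caller_confirms(result):
        return ("screener", result)

    # 3. Second negative answer: switch the call to the operator interface.
    return ("operator", None)


outcome = handle_item(
    b"recorded audio",
    recognize=lambda v: "Tokyo branch",
    screener_transcribe=lambda v: "Kyoto branch",
    caller_confirms=lambda text: text == "Kyoto branch",
)
print(outcome)  # ('screener', 'Kyoto branch')
```

In this sketch the stored voice is passed to the screener unchanged, which reflects the point that the caller is not asked to speak again before the second confirmation question.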
Thus, even if the voice recognition means should fail to recognize an intelligible request in the voice input of the caller, the operator does not need to deal with the caller at this step in the method of the invention. Instead, a screener who is specialized in separating background noise from the voice utterance of a caller may recognize an intelligible request in the voice of the caller already received. Accordingly, processing by the automatic voice response system appears to the caller to be continuous, and reliability of the automatic voice response system appears high to the caller. Furthermore, since the screener merely hears a replay of the voice request of the caller and enters a recognition result into the screener interface, the screener is not required to enter into a conversation with the caller and need not have the special conversational skills that are needed to satisfy customers. Accordingly, the invention can be expected to reduce total labor costs.
As a matter of course, the operator may also serve as the screener. The voice input of the caller is then replayed at the workstation of the operator, and the operator enters the intelligible information discerned in the caller's voice into the screener interface, which may also be the same means as used by the operator when entering information after a call has been transferred to the operator. So long as the number of times that a caller is asked to repeat a voice request is reduced, even though the number of cases in which the voice request of the caller is replayed to the screener increases, the caller's satisfaction with the automatic voice response system will be increased.
In another aspect of the invention, instead of transferring the voice of the caller to the screener as an error in response to a negative response from the caller, the voice of the caller may be transferred to the screener as an error in that the voice recognition means is able to itself determine that it fails to recognize intelligible information in a voice request of the caller.
In an embodiment of this aspect, the voice recognition means includes a word list of words to be recognized. This word list includes proposed words corresponding to expected voice input, such as catalog order items. The voice recognition means accesses the word list while generating a recognition result. Because recognition is limited to a smaller vocabulary, recognition precision is improved.
The automatic interactive voice system according to a still further embodiment of the present invention further comprises a list of words to be recognized, which is referred to by the recognition means and also made available to the screener at the screener interface means. When the voice recognition means recognizes intelligible information in the voice of the caller for the predetermined question item, the voice recognition means refers to the word list and selects a word or a word phrase from proposed words corresponding to the question item while generating a recognition result. In this way, the recognition precision is further increased. Likewise, when the voice of a caller is replayed to a screener, the screener can enter a recognition result into the screener interface simply by referring to the list of words to be recognized and then selecting a word from the proposed words in the word list. Thus, the need for advanced training of the screener and the screener's uncertainty about a caller's response can be significantly reduced.
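The idea of restricting both the recognizer and the screener to a per-question word list can be illustrated with a small sketch. Everything here is an assumption made for illustration: the word list contents, the function names, and in particular the use of text similarity from Python's `difflib` as a stand-in for acoustic matching, which the source does not describe.

```python
import difflib
from typing import Dict, List

# Hypothetical word list: question item -> proposed words (for example, catalog entries).
WORD_LIST: Dict[str, List[str]] = {
    "branch": ["Tokyo branch", "Kyoto branch", "Osaka branch"],
    "date": ["today", "tomorrow", "next Monday"],
}


def select_from_word_list(item: str, raw_hypothesis: str) -> str:
    """Map a raw hypothesis onto the closest proposed word for the question item.
    Text similarity is used here only as a stand-in for acoustic scoring."""
    proposed = WORD_LIST[item]
    # cutoff=0.0 means the best-scoring proposed word is always returned.
    best = difflib.get_close_matches(raw_hypothesis, proposed, n=1, cutoff=0.0)
    return best[0]


def screener_choices(item: str) -> List[str]:
    """The same proposed words, offered to the screener as a pick list."""
    return list(WORD_LIST[item])


print(select_from_word_list("branch", "Kyoto brunch"))  # Kyoto branch
print(screener_choices("branch"))
```

Because the screener selects from the same short list, the entry step reduces to choosing one of a few options rather than typing free text.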
A method is known from Japanese Patent Laid-Open No. 9(1997)-82688 in which, when voice recognition attempts concerning the same voice input item are iterated, a word denied by the caller as the intended one based on an erroneous recognition result is excluded from the list of proposed words in voice recognition to be subsequently performed. This method can be applied to a system for dealing with a caller according to the present invention. In an automatic voice response system according to another embodiment of the present invention, when the voice recognition means repeats voice recognition steps relating to a predetermined question item, the voice recognition means excludes a word previously denied in a response from the caller from the proposed words corresponding to the question item. When the caller responds negatively a predetermined number of times, the automatic voice response means supplies the voice item of the caller already received to the screener interface means, together with the history of words denied by the caller. In this method, if the screener recognizes any intelligible information in the voice of the caller, the screener can select a word from a reduced list of proposed words remaining after the denied words are excluded. Accordingly, recognition precision, or in other words accuracy, can be increased and recognition difficulty for the screener can be further reduced.
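The exclusion of denied words can be sketched as below. This is an illustrative simplification: a fixed ranked list stands in for the recognizer's output on each repeated attempt, and the function name and the default of three denials are assumptions rather than details taken from the cited publication.

```python
from typing import Callable, List, Optional, Tuple


def recognize_with_exclusion(
    ranked_words: List[str],                 # stand-in for the recognizer's ranked proposals
    caller_confirms: Callable[[str], bool],  # stand-in for the caller's yes/no answer
    max_denials: int = 3,                    # assumed "predetermined number of times"
) -> Tuple[Optional[str], List[str], List[str]]:
    """Return (confirmed_word, denied_history, remaining_words_for_screener)."""
    denied: List[str] = []
    while len(denied) < max_denials:
        # Words already denied by the caller are never proposed again.
        candidates = [w for w in ranked_words if w not in denied]
        if not candidates:
            break
        guess = candidates[0]
        if caller_confirms(guess):
            return guess, denied, []
        denied.append(guess)
    # Hand-off to the screener: the denial history plus the reduced word list.
    remaining = [w for w in ranked_words if w not in denied]
    return None, denied, remaining


words = ["Tokyo", "Kyoto", "Osaka", "Kobe", "Nara"]
confirmed, history, remaining = recognize_with_exclusion(words, lambda w: w == "Nara")
print(confirmed)  # None
print(history)    # ['Tokyo', 'Kyoto', 'Osaka']
print(remaining)  # ['Kobe', 'Nara']
```

In the example the screener is left with two proposed words instead of five, which is the reduction in difficulty that the paragraph above describes.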
A method is also known in which, when a recognition result is selected from several proposed words during automated voice recognition, the recognition is made based on recognition probabilities relating to the proposed words. This method can also be applied to a system for dealing with a caller according to the present invention. Another aspect of the unmanned system for dealing with a caller according to the present invention comprises automatic voice response means for receiving a voice input from a caller and for transmitting a message to the caller; voice recognition means for analyzing the voice and for generating a voice recognition result; screener interface means for reproducing the voice and for entry of the voice recognition result by a screener thereof; operator interface means by which an operator and the caller directly talk with each other; line exchanging means for switching a conversation with the caller between the automatic voice response means and the operator interface means; and a word list to be recognized, which is referred to by the voice recognition means and the screener interface means.
In this embodiment, the automatic voice response means transfers the voice input received from the caller to the voice recognition means, and the voice recognition means generates a predetermined number of recognition results, in descending order of recognition probability, from proposed words relating to the received voice of the caller and transfers the recognition results to the automatic voice response means; when a recognition probability of a certain recognition result is higher than predetermined values of recognition probabilities of other recognition results, the automatic voice response means transmits a confirmation message to the caller based on the recognition result; when a response of the caller to the confirmation message is negative, the automatic voice response means supplies the voice of the caller already received to the screener interface means; when the recognition probabilities of a plurality of recognition results are higher than predetermined values of recognition probabilities of other recognition results, a confirmation message based on the recognition result of the highest recognition probability among the plurality of recognition results is transmitted to the caller; when a response of the caller to the confirmation message is negative, a confirmation message based on the recognition result of the second highest recognition probability is transmitted to the caller; when all of the responses of the caller to the confirmation messages based on the plurality of recognition results are negative, the automatic voice response means supplies the voice of the caller already received to the screener interface means; when all of the recognition probabilities of the recognition results are lower than predetermined values of recognition probabilities of other recognition results, the automatic voice response means supplies the voice of the caller already received to the screener interface means; the automatic voice response means receives the voice recognition result from the screener interface means, and the automatic voice response means transmits a confirmation message based on this voice recognition result to the caller; and when a response of the caller to this confirmation message is negative, the line switching means switches a conversation with the caller from the automatic voice response means to the operator interface means.
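The probability-based branching of this embodiment can be sketched as follows. The sketch makes a simplifying assumption that should be read as such: the "predetermined values" are modeled as a single absolute threshold, and the function and parameter names are hypothetical.

```python
from typing import Callable, List, Optional, Tuple


def route_by_probability(
    candidates: List[Tuple[str, float]],     # (proposed word, recognition probability)
    threshold: float,                        # assumed single "predetermined value"
    caller_confirms: Callable[[str], bool],  # stand-in for the caller's yes/no answer
    screener_pick: Callable[[], str],        # stand-in for the screener's entry
) -> Tuple[str, Optional[str]]:
    """Return (handled_by, value); handled_by is 'recognizer', 'screener' or 'operator'."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    confident = [word for word, prob in ranked if prob >= threshold]

    # One or several confident results: confirm them in order of probability.
    for word in confident:
        if caller_confirms(word):
            return ("recognizer", word)

    # No confident result, or every confident result was denied: ask the screener.
    word = screener_pick()
    if caller_confirms(word):
        return ("screener", word)

    # The screener's result was denied as well: switch to the operator.
    return ("operator", None)


result = route_by_probability(
    [("Kyoto", 0.48), ("Tokyo", 0.45), ("Osaka", 0.05)],
    threshold=0.4,
    caller_confirms=lambda w: w == "Tokyo",
    screener_pick=lambda: "Nara",
)
print(result)  # ('recognizer', 'Tokyo')
```

When no candidate reaches the threshold, the loop body never runs and the voice goes straight to the screener, which corresponds to the case in which all recognition probabilities are low.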
An automatic voice response system of still another embodiment of the present invention has a feature wherein the screener interface means commands the line switching means to switch the conversation with the caller from the automatic voice response means to the operator interface means. In this way, when the screener decides that erroneous recognition is likely to persist, because other voices are superimposed on the caller's voice, for example from a TV turned on behind the caller, or because of the poor voice quality of the caller, the screener can act to switch the conversation with the caller from the automatic voice response means to the operator interface means. On the other hand, when the screener determines that the erroneous recognition is a temporary error due to momentary background noise or the caller clearing his or her throat, the screener can allow another recognition attempt to proceed.
An automatic voice response method of the present invention comprises a receiving step for receiving a voice from a caller; an automatic recognition step for analyzing the voice and for generating a voice recognition result; a confirmation message transmitting step for transmitting a confirmation message based on the recognition result to the caller; a response receiving step for receiving a response relating to the confirmation message from the caller; a voice transfer step for returning to the receiving step for receiving a voice from a caller when the response is negative, and for supplying the voice of the caller to a screener when an affirmative response is not obtained by iterating the receiving step for a predetermined number of times; another confirmation message transmitting step for transmitting another confirmation message based on a recognition result by the screener; another response receiving step for receiving another response relating to another confirmation message from the caller; and a switching step for switching a conversation with the caller from the automatic voice response means to an operator of the screener interface means when another response is negative.
An automatic voice response method of an embodiment of the present invention comprises: a receiving step for receiving a voice from a caller; a recognition result generating step for generating a predetermined number of recognition results, in descending order of recognition probability, from proposed words relating to the received voice of the caller; an automatic recognition step for transmitting a confirmation message based on this recognition result to the caller, for receiving a response of the caller relating to the confirmation message, for supplying the voice of the caller to a screener interface section when the response of the caller is negative, for transmitting a confirmation message based on the recognition result of the highest recognition probability among a plurality of specified recognition results to the caller when recognition probabilities of the plurality of specified recognition results are higher than predetermined values of recognition probabilities of recognition results other than the specified recognition results, for receiving a response relating to the confirmation message from the caller, for transmitting a confirmation message based on the recognition result having the second highest recognition probability when the response is negative, for receiving a response from the caller relating to the confirmation message, for supplying the voice of the caller to the screener interface means when all of the responses of the caller to the confirmation messages based on the plurality of recognition results are negative, and for supplying the voice of the caller to the screener interface means when all of the recognition probabilities of the recognition results are lower than predetermined values of the other recognition probabilities; and a screener recognition step for transmitting a confirmation message based on a recognition result by the screener, for receiving a response of the caller relating to the confirmation message, and for switching the caller's conversation to one in which the caller speaks directly to an operator when the response is negative.