Many companies interact with their customers via electronic means (most commonly via telephone, e-mail, SMS, social media (such as Twitter), and online chat). Such electronic systems save the companies a large amount of money by limiting the number of customer service or support agents needed. These electronic systems, however, generally provide a less than satisfactory customer experience. The customer experience may be acceptable for simple transactions, but is frequently inconsistent or downright frustrating if the customer is not adept at talking to or interacting with a computer.
Such interactive response systems are well known in the art. One example is an interactive voice response (IVR) system that provides customer service via telephone. An example of a customer service system utilizing IVR technology is described in U.S. Pat. No. 6,411,686. An IVR system typically communicates with customers using a set of prerecorded phrases, responds to some spoken input and touch-tone signals, and can route or transfer calls. A drawback to such IVR systems is that they are normally built around a “menu” structure, which presents callers with just a few valid options at a time and requires a narrow range of responses from callers.
Many of these IVR systems now incorporate speech recognition technology. An example of a system incorporating speech recognition technology is described in U.S. Pat. No. 6,499,013. The robustness of the speech recognition technology used by IVR systems varies, but such systems often have a predetermined range of responses that they listen for and can understand, which limits the ability of the end user to interact with the system in everyday language. Therefore, callers will often feel that they are being forced to speak to the system “as though they are talking to a computer.” Moreover, even when interacting with a system that utilizes speech recognition, customer input is often either not recognized or incorrectly determined, causing the customer to seek a connection to a human customer service agent as soon as possible.
Human customer service agents continue to be used for more involved customer service requests. These agents may speak to the customer over the phone, respond to customer e-mails, SMS messages, and Tweets, and chat with customers online. Agents normally answer customer questions or respond to customer requests. Companies have customer service groups, which are sometimes outsourced to businesses that specialize in “customer relations management.” Such businesses run centers staffed by hundreds of agents who spend their entire working day on the phone or otherwise interacting with customers. An example of such a system is described in U.S. Pat. No. 5,987,116.
The typical model of customer service interaction is for one agent to assist a customer for the duration of the customer's interaction. At times, one agent (for example, a technical support representative) may transfer the customer to another agent (such as a sales representative) if the customer needs help with multiple requests. But in general, one agent spends his or her time assisting that one customer for the full duration of the customer's interaction (call, text, or chat session), or is occupied resolving the customer's issue via e-mail. Most call centers also expect the agent to take the time to log (document) the call. Deficiencies in this heavy agent interface model are that (1) there is a high agent turnover rate and (2) a great deal of initial and ongoing agent training is usually required, both of which add up to making customer service a significant expense for these customer service providers.
In order to alleviate some of the expenses associated with agents, some organizations outsource their customer service needs. One trend in the United States in recent years, as high-speed fiber optic voice and data networks have proliferated, is to locate customer service centers overseas to take advantage of lower labor costs. Such outsourcing requires that the overseas customer service agents be fluent in English. In cases where these agents are used for telephone-based support, the agent's ability to understand and speak clearly in English is often an issue. An unfortunate result of offshore outsourcing is misunderstanding and a less than satisfactory customer service experience for the person seeking service.
Improved interactive response systems blend computer-implemented speech recognition with intermittent use of human agents. For example, U.S. Pat. No. 7,606,718 discloses a system in which a human agent is presented with only portions of a call requiring human interpretation of a user's utterance. The contents of U.S. Pat. No. 7,606,718, as well as all other art referred to herein, are hereby incorporated by reference as if fully set forth herein. Interest in such systems is enhanced if they are relatively low in cost, which generally calls for limited human interaction. To achieve such limited human interaction, it would be desirable to have a system that required minimal initial training and for which results continued to improve over time. In particular, a learning/training system that quickly provides “day-one” performance suitable for production use and that improves in efficiency over time would be particularly valuable.
Many existing automated speech recognition (ASR) systems suffer from serious training constraints such as the need to be trained to recognize the voice of each particular user of the system or the need to severely limit recognized vocabulary in order to provide reasonable results. Such systems are readily recognizable by users as being artificial. Consider the difference between the typical human prompt, “How can I help you?” and the artificial prompt, “Say MAKE if you want to make a reservation, STATUS if you would like to check on status of a reservation, or CANCEL to cancel a reservation.”
Systems that are more ambitious, such as Natural Language Understanding (NLU) systems, require extensive, labor-intensive, and complex handcrafting and/or machine learning periods in order to get usable results from larger grammars and vocabularies. Particularly in environments in which vocabulary may be dynamic (such as a system to take ticket orders for a new play or for a concert by a new musical group), the learning period may be far too long to provide satisfactory results. Inclusion of accents, dialects, regional differences and the like in grammar further complicates the task of teaching such systems so that they can achieve reasonable thresholds of recognition accuracy.
Therefore, there remains a need in the art for an interactive system that provides a consistently high-quality experience without the expense of a large staff of dedicated, highly trained agents or long and complicated training of constituent ASR, Machine Vision, and/or Natural Language Processing components.