This invention relates to classifiers.
Speech and language processing technologies have the potential to automate a variety of customer care services in large industry sectors such as telecommunications, insurance, and finance. In an effort to reduce the cost of customer care services, many of these industries have depended heavily on complex interactive voice response (IVR) menus, either for automating an entire transaction or for routing callers to an appropriate agent or department. Several studies have shown that the “unnatural” and poor user interface of these long touch-tone menus tends to confuse and frustrate callers, preventing them from accessing information and obtaining the service they expect. A recent study revealed that over 53% of surveyed consumers say that automated IVR systems are the most frustrating part of customer service. In this survey, 46% of consumers dropped their credit card provider because of perceived poor customer care.
Advances in speech and language technologies have the potential to improve customer care, not only by cutting the huge cost of running call centers but also by providing a more natural communication mode for interacting with users, without requiring them to navigate through a laborious touch-tone menu. This has the effect of improving customer satisfaction and increasing the customer retention rate. These benefits, which collectively form the foundation for an excellent customer care experience, have been evident in the AT&T Call Routing “How May I Help You” service, which is currently deployed nationally for consumer services, as reported by A. L. Gorin et al. in “How May I Help You,” Speech Communication, pp. 113–127, 1997.
It is expected that over the next few years, speech and language technologies will play a more vital role not only in customer care services but also in general “Help Desk” applications, where the objective is not only to route calls or access information but also to solve technical problems, answer sales inquiries, supply requested recommendations, and troubleshoot. Many computing and telecommunication companies today provide some form of Help Desk service, either through the World Wide Web or through a human agent.
Several technology requirements exist for voice-enabling Help Desk applications, including a speech recognizer that is capable of recognizing large-vocabulary spontaneous speech and supporting barge-in, a spoken language understanding (SLU) unit that parses the natural language input into relevant information, a dialog manager that operates in a mixed-initiative mode, and a text-to-speech synthesizer that is able to generate high-quality synthesized voice statements to the user.
A large number of speech recognizers are known in the art including, for example, the one described in U.S. Pat. No. 6,246,986, issued to Ammicht et al. on Jun. 12, 2001 and assigned to the assignee of this invention. The objective of the speech recognizer is to convert the speech utterances to text, for use by the SLU unit that follows.
As for the spoken-language-understanding (SLU) module, a need exists for an application-specific corpus of speech data that may be used for designing a classifier for that application, a set of classes for that application, and an annotation relative to the classes of the speech data in the corpus.
The speech data comprises a collection of entries, each entry being an utterance (also converted to text) of a word, a phrase, a sentence, or a number of sentences, where such utterances have been collected from users, or may be expected from the user of the designed application. The designers of the application determine the set of classes. The annotation of an entry is a set of one or more labels that attach to the entry, meaning that the entry is related to the attached labels. In a telecommunications application, for example, if the corpus of training utterances contains the entry “I wish to speak with someone regarding my July Statement” and if the label “billing” is included in the set of classes, then the “billing” label ought to be attached to this entry.
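The relationship among entries, classes, and annotations described above can be illustrated with a minimal sketch. It is offered only as an illustration under stated assumptions and is not part of the described system: the names CLASSES, Entry, and annotate are hypothetical, and every class label other than “billing” is invented for the example.

```python
from dataclasses import dataclass, field

# Hypothetical set of classes chosen by the application designers.
# Only "billing" comes from the text; the others are invented for illustration.
CLASSES = {"billing", "repair", "sales"}


@dataclass
class Entry:
    """One corpus entry: an utterance (as text) plus its annotation."""
    text: str
    labels: set = field(default_factory=set)

    def annotate(self, label: str) -> None:
        # An annotation may only use labels drawn from the set of classes.
        if label not in CLASSES:
            raise ValueError(f"unknown class label: {label!r}")
        self.labels.add(label)


# A one-entry corpus built from the example utterance in the text.
corpus = [Entry("I wish to speak with someone regarding my July Statement")]
corpus[0].annotate("billing")
```

In this sketch an entry may carry more than one label, which matches the statement that an annotation is a set of one or more labels.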
The process of creating annotations for the entries of the corpus of training data conventionally relies on information that comes with the corpus of speech data, or on people who are familiar with the application that is being designed. Collecting, transcribing, and labeling speech data is a resource-intensive and time-consuming process. This process does not form a part of this invention.
As indicated above, many situations exist in today's commercial environment where a natural language interaction with a customer would be very useful. A need exists to create a system that is able to interact naturally and effectively with customers, and there are great commercial incentives for creating such systems quickly, dispensing with drawn-out design processes that are carefully tailored to the applications. Additionally, there is a need to create such systems without the benefit of a large corpus of speech data, which takes a long time to create and which is seldom available at the beginning of the development cycle.