Call centers are used by many industries to provide information by voice communication to a large number of customers or other interested parties. Telemarketing companies, for example, use call centers to process both inbound and outbound calls, mostly concerning offers of goods and services, but also to provide other information for company clients. Banks and financial institutions also use call centers, as do manufacturing companies, travel companies (e.g., airlines, auto rental companies, etc.), and virtually any other business having the need to contact a large number of customers, or to provide a contact point for those customers.
Telemarketing is a well-known form of remote commerce, that is, commerce wherein the person making the sale or taking the sales data is not in the actual physical presence of the potential purchaser or customer. In general operation, a prospective purchaser typically calls a toll-free telephone number, such as an 800 number. The number dialed is determined by the carrier as being associated with the telemarketer, and the call is delivered to the telemarketer's call center. A typical call center will have a front end with one or more voice response units (VRUs), call switching equipment, an automatic call distributor (ACD), and several work stations, each having a telephone and computer terminal at which a live operator processes the call. The dialed number, typically taken automatically from the carrier (long distance) through use of the dialed number identification service (DNIS), is utilized to effect a database access resulting in a “screen pop” of a script on the operator's computer terminal, utilizing a computer telephone integration (CTI) network. In this way, when a prospective purchaser calls a given telephone number, a telemarketing operator may immediately respond with a script keyed to the goods or services offered. The response may be at various levels of specificity, ranging from a proffer of a single product, e.g., a particular audio recording, to various categories of goods or services, e.g., where the dialed number is answered on behalf of an entire supplier. Typically, the prospective purchaser is responding to an advertisement or other solicitation, such as a mail order catalog or the like, from which the telephone number is obtained.
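The DNIS-keyed database access described above may be illustrated by the following minimal sketch. It is offered only as an illustration under stated assumptions: the `Script` record, the `SCRIPTS_BY_DNIS` table, the telephone numbers, and the function names are all hypothetical, and an in-memory dictionary stands in for whatever database and CTI network a particular call center actually uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Script:
    """Hypothetical record for the script shown to the operator."""
    campaign: str
    opening_line: str


# Hypothetical lookup table standing in for the call center database.
# One number is keyed to a single product, the other to an entire supplier.
SCRIPTS_BY_DNIS = {
    "8005550100": Script("single-audio-recording", "Thank you for calling about our featured recording."),
    "8005550199": Script("supplier-catalog", "Thank you for calling. Which catalog item interests you?"),
}


def normalize_dnis(raw: str) -> str:
    """Keep only the digits so that '800-555-0100' and '8005550100' match."""
    return "".join(ch for ch in raw if ch.isdigit())


def screen_pop(raw_dnis: str) -> Optional[Script]:
    """Return the script keyed to the dialed number, or None if unknown."""
    return SCRIPTS_BY_DNIS.get(normalize_dnis(raw_dnis))


if __name__ == "__main__":
    script = screen_pop("800-555-0100")
    print(script.campaign if script else "no script for this number")
```

In this sketch an unrecognized dialed number simply yields no script; how a real system handles that case (for example, a default greeting) is not stated in the description above.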
In a typical telemarketing or customer service campaign, scripts are prepared for use by the call center agents handling incoming and/or outgoing telephone calls. Script preparation is a highly developed skill, and scripts are usually constructed to obtain optimum results and tested to confirm that such optimization is achieved. It is, therefore, potentially extremely damaging to a telemarketing campaign when the scripts are not followed by the call center agents, either in whole or in part. As a result, call center management typically includes one or more methods for overseeing script compliance, such as assigning call center managers the responsibility for ensuring such compliance by randomly sampling calls or investigating under-performance by specific agents. Commercial recording and monitoring products are available, such as NiceLog® produced by NICE Systems Ltd. (Tel Aviv, Israel) or recording and analysis products produced by Witness Systems, Inc. (Roswell, Ga.). These products operate by recording call center voice interactions and capturing the agent's computer desktop activities, which are then available for review, either in real time or in recorded form. These systems and methods are very labor intensive, inefficient, and non-comprehensive, and a need therefore exists for improved methods and apparatus for verifying script compliance in these situations.
The use of telephonic systems to effect commercial transactions is now well known. For example, in Katz U.S. Pat. No. 4,792,968, filed Feb. 24, 1987, and issued Dec. 20, 1988, entitled “Statistical Analysis System for Use With Public Communication Facility”, an interactive telephone system for merchandising is disclosed. In one aspect of the disclosure, a caller may interact with an interactive voice response (IVR or VRU) system to effectuate a commercial transaction. For example, the caller may be prompted to identify themselves, such as through entry of a customer number as it may appear on a mail order catalog. In an interactive manner, the caller may be prompted to enter an item number for purchase, utilizing an item number designation from the catalog or otherwise interact with the system to identify the good or service desired. Provision is made for user entry of payment information, such as the entry of a credit card number and type identifier, e.g., VISA, American Express, etc. Options are provided for voice recording of certain information, such as name, address, etc., which is recorded for later processing, or in certain modes of operation, connecting the customer to a live operator for assistance. More recent applications for electronic commerce are described in Katz PCT Publication No. WO94/21084, entitled “Interactive System for Telephone and Video Communication Including Capabilities for Remote Monitoring”, published Sep. 15, 1994. In certain aspects, the application provides systems and methods for conduct of electronic commerce over communication networks, such as through the accessing of such resources via an on-line computer service, wherein the commercial transaction may be effected including some or all of dynamic video, audio and text data. 
Optionally, the system contemplates the interchange of electronic commerce data, e.g., electronic data interchange (EDI) data, where at least certain of the potential purchasers interface with the system through on-line computer services, such as those used to access the Internet.
Automatic speech recognition (ASR) is a technology well known in the art, and several examples of applications of ASR technology are described in a number of United States patents. For example, in Boggs U.S. Pat. No. 4,860,360, filed Apr. 6, 1987, and issued Aug. 22, 1989, entitled “Method of Evaluating Speech,” a speech quality evaluation process is described. The process incorporates models of human auditory processing and subjective judgement derived from psychoacoustic research literature, rather than the prior art use of statistical models that did not reflect the underlying processes of the auditory system.
Watanabe U.S. Pat. No. 5,287,429, filed Nov. 29, 1991, and issued Feb. 15, 1994, entitled “High Speed Recognition of a String of Words Connected According to a Regular Grammar by DP Matching,” describes a speech recognition method using an input string of words represented by an input sequence of input pattern feature vectors. The input string is selected from a word set of first through n-th words and substantially continuously uttered in compliance with a regular grammar.
In Jeong U.S. Pat. No. 5,434,949, filed Aug. 13, 1993, and issued Jul. 18, 1995, entitled “Score Evaluation Display Device for an Electronic Song Accompaniment Apparatus,” the described device has an audio signal processing unit to evaluate a user's singing. A sampling processor samples the difference between an input song signal from a microphone and reference song signal to generate an evaluation score.
In Lee U.S. Pat. No. 5,504,805, filed Apr. 5, 1993, and issued Apr. 2, 1996, entitled “Calling Number Identification Using Speech Recognition,” a caller's telephone number is extracted from a recorded message using voice recognition. The called party initiates automatic dialing of the calling party's number after confirming that the number was correctly recognized by the system.
McDonough et al. U.S. Pat. No. 5,625,748, filed Apr. 18, 1994, and issued Apr. 29, 1997, entitled “Topic Discriminator Using Posterior Probability or Confidence Scores,” describes an improved topic discriminator including an integrated speech recognizer or word and phrase spotter as part of a speech event detector, and a topic classifier trained on topic-dependent event frequencies. The phrase spotter is used to detect the presence of phrases without the need of parsing the output of a speech recognizer's hypothesized transcription.
In Rtischev et al. U.S. Pat. No. 5,634,086, filed Sep. 18, 1995, and issued May 27, 1997, entitled “Method and Apparatus for Voice-Interactive Language Instruction,” a spoken-language apparatus is described having context-based speech recognition for instruction and evaluation, particularly language instruction and language fluency evaluation. The system administers a lesson, and particularly a language lesson, and evaluates performance in a natural interactive manner while tolerating strong foreign accents, and produces as an output a reading quality score.
Lyberg U.S. Pat. No. 5,664,050, filed Mar. 21, 1996, and issued Sep. 2, 1997, entitled “Process for Evaluating Speech Quality in Speech Synthesis,” describes a process for using a speech recognition system programmed using a number of persons. The system receives synthetic or natural speech and displays the differing speech quality.
Kallman et al. U.S. Pat. No. 5,742,929, filed May 28, 1996, and issued Apr. 21, 1998, entitled “Arrangement for Comparing Subjective Dialogue Quality in Mobile Telephone Systems,” describes a system including a transmitter for transmitting a signal representing a correct dialogue quality and a speech recognition device for receiving and evaluating the received signal.
Weintraub U.S. Pat. No. 5,842,163, filed Jun. 7, 1996, and issued Nov. 24, 1998, entitled “Method and Apparatus for Computing Likelihood and Hypothesizing Keyword Appearance in Speech,” describes a method using a scoring technique wherein a confidence score is computed as a probability of observing the keyword in a sequence of words given the observations. The method involves hypothesizing a keyword whenever it appears in any of the “N-best” word lists with a confidence score that is computed by summing the likelihoods for all hypotheses that contain the keyword.
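The N-best scoring idea summarized above may be sketched as follows. This is a simplified illustration and not the method of the cited patent: the function name, the input format (word lists paired with log-likelihoods), and the normalization by the total likelihood of the list are assumptions made here so that the result falls between 0 and 1.

```python
import math
from typing import List, Sequence, Tuple

# Each hypothesis is (list of words, log-likelihood); this format is assumed.
Hypothesis = Tuple[List[str], float]


def keyword_confidence(nbest: Sequence[Hypothesis], keyword: str) -> float:
    """Share of the N-best likelihood mass held by hypotheses containing keyword."""
    if not nbest:
        return 0.0
    # Subtract the largest log-likelihood before exponentiating to avoid underflow.
    max_ll = max(ll for _, ll in nbest)
    weights = [math.exp(ll - max_ll) for _, ll in nbest]
    total = sum(weights)
    # Sum the likelihoods of every hypothesis in which the keyword appears.
    hit = sum(w for (words, _), w in zip(nbest, weights) if keyword in words)
    return hit / total


if __name__ == "__main__":
    nbest = [
        (["please", "cancel", "my", "order"], math.log(0.5)),
        (["please", "counsel", "my", "order"], math.log(0.3)),
        (["lease", "cancel", "my", "order"], math.log(0.2)),
    ]
    print(round(keyword_confidence(nbest, "cancel"), 3))
```

A keyword would then be hypothesized whenever this score exceeds a chosen threshold; the threshold value is application dependent and is not specified in the summary above.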
In Ittycheriah et al. U.S. Pat. No. 5,895,447, filed Jan. 28, 1997, and issued Apr. 20, 1999, entitled “Speech Recognition Using Thresholded Speaker Class Model Selection or Model Adaptation”, a speaker recognition system is provided including an arrangement for clustering information values representing respective frames of utterances of a plurality of speakers by speaker class in accordance with a threshold value to provide speaker class specific clusters of information, an arrangement for comparing information representing frames of an utterance of a speaker with respective clusters of speaker class specific clusters of information to identify a speaker class, and an arrangement for processing speech information with a speaker class dependent model selected in accordance with an identified speaker class.
Mostow et al. U.S. Pat. No. 5,920,838, filed Jun. 2, 1997, and issued Jul. 6, 1999, entitled “Reading and Pronunciation Tutor,” describes a computer implemented reading tutor. A player outputs a response, and an input block implements a plurality of functions such as silence detection, speech recognition, etc. The tutor compares the output of the speech recognizer to the text which was supposed to have been read and generates a response, as needed, based on information in a knowledge base and an optional student model. The response is output to the user through the player.
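The comparison of recognizer output against the text that was supposed to have been read may be illustrated by the sketch below. It is not taken from the cited patent: the tokenizer, the function names, and the use of the Python standard library `difflib.SequenceMatcher` for word alignment are assumptions chosen for brevity, and the knowledge base and student model mentioned above are omitted.

```python
import difflib
import re
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def compare_reading(expected_text: str, recognized_text: str) -> Tuple[float, List[str]]:
    """Align recognized words to the expected text.

    Returns (coverage, missed), where coverage is the fraction of expected
    words matched in order and missed lists expected words that were
    skipped or replaced.
    """
    expected = tokenize(expected_text)
    recognized = tokenize(recognized_text)
    matcher = difflib.SequenceMatcher(a=expected, b=recognized, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    missed: List[str] = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            missed.extend(expected[i1:i2])
    coverage = matched / len(expected) if expected else 1.0
    return coverage, missed


if __name__ == "__main__":
    coverage, missed = compare_reading("The cat sat on the mat.", "the cat sat on mat")
    print(f"coverage={coverage:.2f}, missed={missed}")
```

Recognition errors and genuine reading errors are indistinguishable in this sketch; a practical tutor would need some further means of telling them apart.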
Ramalingam U.S. Pat. No. 6,058,363, filed Dec. 29, 1997, and issued May 2, 2000, entitled “Method and System for Speaker-Independent Recognition of User-Defined Phrases,” describes a method comprising enrolling a user-defined phrase with a set of speaker-independent recognition models using an enrollment grammar. An enrollment grammar score of the spoken phrase may be determined by comparing features of the spoken phrase to the speaker-independent recognition models using the enrollment grammar.
Gainsboro U.S. Pat. No. 6,064,963, filed Dec. 17, 1997, and issued May 16, 2000, entitled “Automatic Key Word or Phrase Speech Recognition for the Corrections Industry,” describes an automatic speech recognition (ASR) apparatus integrated into a call control system such that the ASR apparatus identifies key words in real-time or from a recording. The system is particularly applicable to the corrections industry for the purpose of spotting key words or phrases for investigative purposes or inmate control purposes which then can alert or trigger remedial action.
In Sherwood et al. U.S. Pat. No. 6,163,768, filed Jun. 15, 1998, and issued Dec. 19, 2000, entitled “Non-Interactive Enrollment in Speech Recognition,” a computer enrolls a user in a speech recognition system by obtaining data representing a user's speech, the speech including multiple user utterances and generally corresponding to an enrollment text, and analyzing acoustic content of data corresponding to a user utterance. The computer determines, based on the analysis, whether the user utterance matches a portion of the enrollment text.
None of these patents, however, describes a system or method for using automatic speech recognition to analyze a voice interaction and verify compliance of an agent reading from a script to a client during the voice interaction. Further, none of these patents describes a system or method for using automatic speech recognition to provide a quality assurance tool or for any other purpose in a call center environment.