In telecommunications systems there has been a steady trend toward automating what was originally operator assistance traffic. Much current activity is directed to answering directory assistance calls by processing voice frequency instructions from the caller without operator intervention. An automatic speech recognition unit uses these instructions to generate data signals corresponding to the recognized voice frequency signals. The data signals are then used to search a database for a directory listing and so derive the desired directory number. A system of this type is described in U.S. Pat. No. 4,979,206, issued Dec. 18, 1990.
According to that patent, such automated service is supplied by a switching system equipped with an automatic speech recognition facility for interpreting a spoken or keyed customer request containing data that identifies a directory listing. When the data conveyed by the request are recognized, the system searches a database to locate the directory number listing corresponding to the request. This listing is then automatically announced to the requesting customer. In this system the calling customer, or caller, receives a prompting announcement asking the caller to provide the zip code or to spell the name of the community of the desired directory number. The caller is also prompted to spell the last name of the customer corresponding to the desired directory number. If further data are required, the caller may be prompted to spell the first name and street address of the desired party. After the caller responds to the prompting announcements, a search is made to determine whether only one listing corresponds to the data supplied by the caller. When this occurs, the directory number is announced to the caller. The aim of such a system has been to minimize the recognition capability required of the speech recognition facility: it need recognize only letters of the alphabet and numbers.
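The prompt-and-narrow search described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Listing` record, the field names, and the rule that community and last name are always required while first name and street are consulted only when the result is still ambiguous are assumptions drawn from the prompting order in the text. The zip-code alternative is omitted for brevity, and all numbers in the usage note are fictional.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    """Hypothetical directory record; fields mirror the prompted data."""
    community: str
    last_name: str
    first_name: str
    street: str
    number: str


# Always prompted for (assumption based on the text's prompting order).
REQUIRED = ("community", "last_name")
# Prompted for only while more than one listing still matches.
OPTIONAL = ("first_name", "street")


def matches(listing, field, spelled):
    """Compare a spelled response with a listing field, ignoring case."""
    return getattr(listing, field).upper() == spelled.upper()


def find_listing(listings, answers):
    """Return the directory number once exactly one listing remains.

    `answers` maps field names to the caller's recognized (spelled)
    responses. Returns None if no unique listing can be found, in which
    case a real system might hand the call to an operator.
    """
    candidates = [
        l for l in listings
        if all(matches(l, f, answers[f]) for f in REQUIRED)
    ]
    for field in OPTIONAL:
        if len(candidates) <= 1 or field not in answers:
            break  # already unique (or empty), or no further data supplied
        candidates = [l for l in candidates if matches(l, field, answers[field])]
    return candidates[0].number if len(candidates) == 1 else None
```

For example, with two SMITH listings in the same community, community and last name alone return None; adding a spelled first name resolves the ambiguity.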
A typical public switched telephone network (PSTN) arrangement proposed to implement such a system is illustrated in block diagram form in FIG. 1 of the aforementioned patent (PRIOR ART). The network of FIG. 1 is described here in some detail as a typical environment in which the method and apparatus of the invention may be used. In FIG. 1, block 1 represents a telecommunications switching system, or switch, operating under stored program control. Switch 1 may be a switch such as the 5ESS switch manufactured by AT&T Technologies, Inc., arranged to offer the Operator Services Position System (OSPS) features.
Shown within switch 1 are various blocks for carrying out the functions of a program controlled switch. Control 10 is a distributed control system operating under a group of data and call processing programs to control various sections or elements of switch 1. Element 12 is a voice and data switching network, frequently referred to as the switch fabric or network, capable of switching voice and/or data between the inputs connected to it. Connected to network 12 is a Voice Processing Unit (VPU) 14. Network 12 and VPU 14 operate under the control of control 10. Trunks 31 and 33, customer line 44, data link 35, and operator access facility 26 are connected to network 12 at input ports 31a, 33a, 44a, 35a, and 26a, respectively, and control 10 is connected to network 12 via data channel 11 at input port 11a.
VPU 14 receives speech or customer-keyed information from callers at calling terminals 40 or 42 and processes the voice signals or keyed tone signals from a customer station, using well known automatic speech recognition techniques, to generate data corresponding to the speech or keyed information. These data are used by Directory Assistance Computer (DAS/C) 56 in searching for a desired telephone or directory number listing. When a directory assistance request comes from customer terminal 42 via customer line 44, port 44a, and switching network 12 to VPU 14, VPU 14 analyzes the voice input signals to recognize individual elements from a predetermined list of spoken responses.
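Recognizing "individual ones of various elements corresponding to a predetermined list of spoken responses" amounts to choosing among a small, fixed vocabulary. The sketch below shows only that selection step and assumes the acoustic scoring has already been done by some recognizer front end; the vocabulary (letters and digits, matching the prior-art aim stated earlier), the score scale, and the rejection threshold are all hypothetical.

```python
import string

# Hypothetical predetermined vocabulary: letters of the alphabet and digits.
VOCABULARY = frozenset(string.ascii_uppercase) | frozenset(string.digits)


def recognize_element(scores, threshold=0.6):
    """Pick the best-scoring element of the predetermined vocabulary.

    `scores` maps candidate items to match scores in [0, 1], assumed to be
    produced by an upstream recognizer. Items outside the vocabulary are
    ignored. Returns None (a rejection) when even the best in-vocabulary
    score falls below `threshold`.
    """
    in_vocab = {item: s for item, s in scores.items() if item in VOCABULARY}
    if not in_vocab:
        return None
    best = max(in_vocab, key=in_vocab.get)
    return best if in_vocab[best] >= threshold else None
```

Restricting the search to a closed list is what keeps the recognition task small; a rejection is one natural point at which a system could fall back to an operator.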
VPU 14 also generates voice messages or announcements that prompt a caller to speak information into the system for subsequent recognition by the voice processing unit. VPU 14 generates output data signals representing the results of the voice processing. These output signals are sent to control 10, from which they may be transmitted via data link 59 to DAS/C computer 56, or used within control 10 as input to its program for controlling the establishment of connections in switching network 12 or for requesting further announcements by VPU 14. VPU 14 includes announcement circuits 13 and detection circuits, i.e., automatic speech recognition circuits 15, both controlled by a controller of VPU 14. A Conversant 1 Voice System, Model 80, manufactured by AT&T Technologies, Inc., may be used to carry out the functions of VPU 14.
When DAS/C computer 56 completes its data search and locates the requested directory listing, it reports the directory number via data link 58 to Audio Response Unit (ARU) 60, which is connected to voice and data switching network 12 and announces the number to the caller. Computer Consoles, Inc. (CCI) manufactures an audio response unit suitable for ARU 60, as well as DAS/C terminal 52, for use in this environment. As shown, DAS/C computer 56 is directly connected to control 10 by data link 59, but it could instead be connected to control 10 via a link to network 12 and a connection through network 12 via port 11a.
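The final step, announcing the located number, can be pictured as turning the directory number into the sequence of spoken digits an audio response unit plays. This is an illustrative sketch only: the word list, the `<pause>` marker, and the grouping rule (a pause wherever the stored number has a separator) are assumptions, not details of ARU 60 or of any CCI product.

```python
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}
PAUSE = "<pause>"  # hypothetical marker for a short silence between groups


def announcement_phrases(directory_number):
    """Split a directory number into the word sequence to be announced.

    Any non-digit character (hyphen, space, parenthesis) becomes a single
    pause, so the number is read in its usual groups. Leading and trailing
    pauses are dropped.
    """
    phrases = []
    for ch in directory_number:
        if ch in DIGIT_WORDS:
            phrases.append(DIGIT_WORDS[ch])
        elif phrases and phrases[-1] != PAUSE:
            phrases.append(PAUSE)
    if phrases and phrases[-1] == PAUSE:
        phrases.pop()
    return phrases
```

For example, `announcement_phrases("555-0101")` yields the three "five" words, one pause, then "zero one zero one".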
Directory assistance calls can also be processed with the help of an operator if the VPU fails to recognize enough of the caller's spoken information.
Trunks 31 and 33 connect switch 1 to local switch 30 and interconnection network 32, respectively. Local switch 30 is connected to calling customer terminal 40, and interconnection network 32 is connected to called customer terminal 46. Switch 30 and network 32 carry signals from customer terminals to switch 1. Also connected to switch 1 are customer lines, including customer line 44, which connects customer terminal 42 to switch 1.
In an alternate connection, calling terminal 40 is connected via local switch 30 to switch 1. In a more general case, other switches forming part of a larger public telephone network, such as interconnection network 32, would be required to connect calling terminal 40 to switch 1. Generally speaking, calls are connected to switch 1 via communication links such as trunks 31 and 33 and customer line 44. In the alternate connection, calling terminal 40 is connected by a customer line to a 1AESS switch 30, manufactured by AT&T Technologies, Inc., and used here as a local switch or end office. That switch is connected to trunk 31, which is connected to switch 1. Local switch 30 is also connected to switch 1 by data link 35, which conveys common channel signaling messages between the two switches. Such common channel signaling messages are used herein to request switch 30 to initiate the setup of a connection, for example, between customer terminals 40 and 46. In this example, switch 1 completes the terminating connection to called terminal 46 via interconnection network 32. If the calling terminal is not directly connected to switch 1, the directory number of the calling terminal, identified, for example, by Automatic Number Identification (ANI), is transmitted from the switch serving the calling terminal to switch 1.
Operator position terminal 24, connected to switch 1, is used by an operator to provide operator assistance. Data displays for operator position terminal 24 are generated by control 10. Operator position terminal 24 is connected to switching network 12 by operator access facility 26, which may include carrier facilities that allow the operator position to be located far from switching network 12, or may be a simple voice and data access facility if the operator positions are located close to the switching network.
To handle directory assistance services, the directory assistance operator has access to two separate operator terminals: terminal 24 for communicating with the caller and switch 1, and terminal 52 for communicating via data link 54 with DAS/C computer 56. The operator at terminals 24 and 52 communicates orally with a caller and, on the basis of these communications, keys information into DAS/C terminal 52 for transmission to DAS/C computer 56. DAS/C computer 56 responds to such keyed information by generating displays on DAS/C terminal 52, which may include the desired directory number. Until the caller provides enough information to locate a valid listing, the caller is not connected to an audio response unit, since there is nothing to announce. Further details of the operation of the system of FIG. 1 are set forth in U.S. Pat. No. 4,979,206.
Further examples of the use of voice recognition in automating telephone operator assistance calls are found in U.S. Pat. Nos. 5,163,083, issued Nov. 10, 1992; 5,185,781, issued Feb. 9, 1993; and 5,181,237, issued Jan. 19, 1993, to Dowden et al.
Another proposed use for speech recognition in a telecommunications network is voice verification: the process of verifying a person's claimed identity by analyzing a sample of that person's voice. This form of security is based on the premise that each person can be uniquely identified by his or her voice. The degree of security a verification technique affords depends on how well the verification algorithm discriminates the voice of an authorized user from those of all unauthorized users. It would be desirable to use voice verification to verify the identity of a telephone caller. To date, however, such schemes have not been implemented in a fully satisfactory manner. One proposal for implementing voice verification is described in U.S. Pat. No. 5,297,194, issued Mar. 22, 1994, to Hunt et al. In an embodiment described in that patent, a caller attempting to obtain access to services via a telephone network is prompted to enter a spoken password having a plurality of digits. Preferably, the caller is prompted to speak the password beginning with the first digit and ending with the last digit. Each spoken digit of the password is then recognized using a speaker-independent voice recognition algorithm. Following entry of the last digit, a determination is made whether the password is valid. If so, the caller's identity is verified using a voice verification algorithm.
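The two-stage flow attributed to Hunt et al. (recognize each digit independently of the speaker, test the password's validity, and only then verify the voice) can be outlined as below. The recognizer and verifier are passed in as callables because the patent's algorithms are not reproduced here; `recognize_digit`, `verify_voice`, and the `enrolled` table are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, Optional, Sequence


def verify_caller(
    utterances: Sequence[bytes],
    recognize_digit: Callable[[bytes], str],
    enrolled: Dict[str, object],
    verify_voice: Callable[[Sequence[bytes], object], bool],
) -> Optional[str]:
    """Return the claimed password if the caller is verified, else None.

    `utterances` holds one audio segment per spoken digit, first to last.
    `enrolled` maps each valid password to that user's reference data.
    """
    # Stage 1: speaker-independent recognition of each digit in order.
    password = "".join(recognize_digit(u) for u in utterances)
    # Validity test: an unknown password ends the attempt before verification.
    reference = enrolled.get(password)
    if reference is None:
        return None
    # Stage 2: speaker verification against the reference for that password.
    return password if verify_voice(utterances, reference) else None
```

Ordering the stages this way means the cheaper, speaker-independent step establishes who the caller claims to be, so the verifier only has to compare against one enrolled reference.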
According to that patent, this method is implemented by a system comprising a digital processor, which prompts the caller to speak the password, and speech processing means controlled by the digital processor, which carry out a multi-stage data reduction process and generate the resulting voice recognition and voice verification parameter data. The system also includes voice recognition and voice verification routines.
Following the digit-based voice recognition step, the voice verification routine, controlled by the digital processor and responsive to a determination that the password is valid, determines whether the caller is an authorized user. This routine includes transformation means that receives the speech feature data generated for each digit and the voice verification feature transformation data and, in response, generates voice verification parameter data for each digit. A verifier routine receives the voice verification parameter data and the speaker-relative voice verification class reference data and, in response, generates an output indicating whether the caller is an authorized user.
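The data flow of the verification routine (per-digit features, a feature transformation, per-digit parameter data, then a comparison with the enrolled reference) can be illustrated with a deliberately simple stand-in. The sketch assumes a linear transformation and a mean Euclidean distance tested against a threshold; these are substitutes chosen for clarity, not the transformation or classifier of U.S. Pat. No. 5,297,194, and all names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def transform(features: Vector, matrix: Sequence[Vector]) -> List[float]:
    """Apply a linear feature transformation to one digit's features,
    standing in for the 'voice verification feature transformation data'."""
    return [sum(w * x for w, x in zip(row, features)) for row in matrix]


def is_authorized(
    digit_features: Sequence[Tuple[str, Vector]],
    matrix: Sequence[Vector],
    reference: Dict[str, Vector],
    threshold: float,
) -> bool:
    """Verifier stand-in: accept if the transformed per-digit parameters lie,
    on average, within `threshold` of the caller's enrolled reference vectors.

    `digit_features` is a list of (digit, features) pairs, so a password
    with repeated digits is handled naturally.
    """
    if not digit_features:
        return False
    distances = [
        math.dist(transform(feats, matrix), reference[digit])
        for digit, feats in digit_features
    ]
    return sum(distances) / len(distances) <= threshold
```

Keeping the reference data per digit mirrors the "speaker-relative" reference the text describes: each enrolled user has their own targets for each digit of the vocabulary.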
In operation, a caller places a call from a conventional calling station telephone to a financial institution or card verification company in order to access account information. The caller has previously enrolled in the voice verification database, which includes his or her voice verification class reference data. The financial institution includes suitable input/output devices, connected to the system or integral with it, to interface signals to and from the telephone lines. Once the call has been set up, the digital processor controls the prompt means to prompt the caller to begin digit-by-digit entry of the caller's preassigned password. The voice recognition algorithm processes each digit and uses a statistical recognition strategy to determine which digit (0-9 or "oh") was spoken. After all digits have been recognized, a test is made to determine whether the entered password is valid for the system. If so, the caller is conditionally accepted. In other words, if the password is valid, the system "knows" who the caller claims to be and where the account information is stored.
Thereafter, the system performs voice verification on the caller to determine whether the entered password was spoken by a voice previously enrolled in the voice verification reference database and assigned to that password. If the verification algorithm establishes a "match," access to the data is provided. If the algorithm substantially matches the voice to the stored version but not within a predetermined acceptance criterion, the system prompts the caller to input additional personal information to further test the identity of the claimed owner of the password. If the caller cannot provide such information, the system rejects the access inquiry and the call is terminated.
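The accept / ask-for-more / reject behaviour described above is a three-way decision on a verification score. The sketch assumes a similarity score where higher means a closer match and uses two hypothetical thresholds; it also assumes that a score below the "substantial match" band is rejected outright, which the text implies but does not state.

```python
from enum import Enum
from typing import Callable


class Decision(Enum):
    ACCEPT = "accept"      # match within the acceptance criterion: grant access
    ASK_MORE = "ask_more"  # substantial match, outside the criterion
    REJECT = "reject"      # reject the inquiry and terminate the call


def decide(score: float, accept_at: float = 0.9, near_at: float = 0.7) -> Decision:
    """Map a similarity score to an action; thresholds are hypothetical."""
    if score >= accept_at:
        return Decision.ACCEPT
    if score >= near_at:
        return Decision.ASK_MORE
    return Decision.REJECT


def handle(score: float, answers_personal_question: Callable[[], bool]) -> Decision:
    """Resolve the middle band by testing additional personal information."""
    decision = decide(score)
    if decision is Decision.ASK_MORE:
        return Decision.ACCEPT if answers_personal_question() else Decision.REJECT
    return decision
```

The middle band trades a little convenience for fewer false rejections: a genuine caller with a slightly off sample gets a second chance instead of being cut off.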
Existing approaches to deploying speech recognition technology for universal application rely on speech models built from "average" voice features. This averaging tends to exclude persons whose voice characteristics lie beyond the boundaries it creates. The averages depend on the training set used when the models are created. For example, if the models are created using speech samples from New Englanders, they will tend to exclude voices with Southern or Hispanic accents. If the models instead try to average an all-inclusive population, performance deteriorates across the entire spectrum.
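The effect of averaging can be seen in a toy geometric picture. Real recognizers use statistical models rather than a single mean vector, so this is only an analogy: the "model" is the centroid of training feature vectors, a speaker counts as covered if within a fixed radius of it, and the two-dimensional vectors in the usage note are made-up points, not measured speech data.

```python
import math
from statistics import fmean


def centroid(samples):
    """'Average' model: the per-dimension mean of the training vectors."""
    return [fmean(dim) for dim in zip(*samples)]


def covered(model, sample, radius):
    """Treat a speaker as covered if their features lie within `radius`
    of the averaged model (a toy stand-in for recognition accuracy)."""
    return math.dist(model, sample) <= radius
```

With group A trained at points near (1, 1) and group B near (3, 1), a model averaged over A covers A speakers but not B speakers; a model averaged over both lands near (2, 1) and, at the same radius, covers neither group well, which is the deterioration "for the entire spectrum" described above.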