Speech systems must perform a variety of tasks. Speech recognition is the problem of an automated system listening to speech and determining the words or message spoken, regardless of the speaker. Speaker identification is the problem of listening to speech and determining which one of a group of known speakers is generating the speech. In speaker verification, the user claims to be a particular person and the system determines whether the user is indeed that person.
In earlier systems, a user entered a password through a keypad recognition system and numeric processing modules. The user gained access to the voice system by keying a predetermined string of digits, a code, on the telephone keypad. The code length may vary, depending on the system configuration. A numeric processing module in the telephonic voice processing system identifies the user from the code. Each user of the telephonic voice processing system has a separate and distinct code that uniquely identifies that user to the system. This type of configuration suffers from several well-known drawbacks. For example, such systems are not intuitive and require the user to remember a numeric code.
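The keypad lookup described above can be sketched as follows. This is a minimal illustration only: the registry contents, the user names, and the function name `identify_user` are hypothetical and do not come from any particular system.

```python
# Hypothetical registry mapping keypad codes to users; the codes and names
# are illustrative only. Code length may vary from user to user.
CODE_TO_USER = {"4821": "alice", "730915": "bob"}


def identify_user(keystrokes, registry=CODE_TO_USER):
    """Return the user whose code matches the keyed digits, or None."""
    # Reject anything that is not a pure digit string (for example '#' or '*').
    if not keystrokes.isdigit():
        return None
    return registry.get(keystrokes)


print(identify_user("4821"))
print(identify_user("0000"))
```

Because each code maps to exactly one user, a single dictionary lookup both identifies and admits the caller, which is also why the scheme depends entirely on the user remembering the code.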
More recently, users have gained access to such systems by means of voice processing and verification. FIG. 1 shows a conventional voice processing and verification system. Telephone lines 100 are coupled with one or more voice processing modules 101, each of which includes a voice processing server 102. Each of the voice processing modules 101 is linked to a common memory 103. An incoming telephone call is either from a new user or from a current user. In some systems, a user who is new to the system is prompted by the voice processing server 102 to indicate that fact by pushing a particular digit on the touchtone telephone keypad. This sends a newuser signal to the voice processing server 102, identifying the caller as a new user to the system. If the voice processing server 102 detects a newuser signal, the user's voice is then recorded by the voice processing server 102, converted to a digital signal, and digitally stored in memory 103. This is sometimes referred to as the enrollment process.
The enrollment process involves taking a sample of the user's voice over a set interval of time. This enrollment and verification process is exemplary only; other processes may be present in the prior art. Telephonic voice processing and verification systems typically involve an enrollment process whereby a new user initially gains entry to the system by recording a model of an enrollment voice sample. This enrollment voice sample may consist of a single word but preferably is a group of words. The model of the enrollment voice sample is digitally processed and recorded in the memory 103. Models of enrollment voice samples are also stored for the other users of the system. On subsequent occasions, a user gains access to the system through a comparison of the user's incoming voice with each of the models of enrollment voice samples stored in memory 103.
If the user is a current user, and not a new user to the telephonic voice processing system, the user does not enter any digits on the telephone keypad when prompted by the system. The user is first prompted by the voice processing server 102 to identify himself or herself. The user's incoming voice is then digitally processed by the voice processing server 102 and stored in a buffer 104. The telephonic voice verification system then compares the stored incoming voice sample with each of the enrollment voice models stored in memory 103. If the stored incoming voice signal matches an enrollment voice model retrieved from the memory 103, within a predetermined threshold, the user gains access to the system. If the user is not known to the system, a newuser signal is generated.
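The comparison step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any particular system: it assumes that each voice model is a fixed-length list of numeric features and that "matching within a predetermined threshold" means a Euclidean distance no greater than the threshold. The names `distance`, `verify`, and `memory`, and all feature values, are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def verify(incoming, enrolled_models, threshold):
    """Compare an incoming sample against every enrolled model.

    Returns (user, True) when the closest model lies within the threshold.
    Otherwise returns (None, False), the case in which the system described
    above would generate a newuser signal.
    """
    best_user, best_dist = None, float("inf")
    for user, model in enrolled_models.items():
        d = distance(incoming, model)
        if d < best_dist:
            best_user, best_dist = user, d
    if best_dist <= threshold:
        return best_user, True
    return None, False


# Stand-in for memory 103: illustrative enrollment models for two users.
memory = {"alice": [1.0, 2.0, 3.0], "bob": [4.0, 0.5, 1.5]}

print(verify([1.1, 2.0, 2.9], memory, threshold=0.5))
print(verify([9.0, 9.0, 9.0], memory, threshold=0.5))
```

The first call falls close to one enrolled model and is accepted; the second is far from every model and is treated as an unknown caller.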
Often, in a telephonic voice verification system with multiple users, a comparison results in a false rejection or a false acceptance. A false rejection occurs when a user is denied access that should have been granted. A false acceptance occurs when a user is allowed access that should have been denied. One common cause of false rejection and false acceptance is variation in the stored incoming voice signal attributable to noise and/or signal variations caused by differing telephonic equipment. For example, an enrollment voice model recorded from an initial incoming telephone call made over a carbon button telephone is likely to differ significantly from a subsequent incoming voice signal from a cellular telephone or an electret telephone.
Common telephone types include carbon button, cellular and electret. Each of these types of telephones introduces a different type of noise or other signal modification. It is well known that users sound different over these different types of telephony equipment. A person receiving a call from another person they know well will recognize differences in the sound of the caller's voice when the call is made from different types of equipment. Such changes to the received signal can cause an automated system to reject a known user. For example, consider a user who provides the enrollment voice sample from a carbon button type phone at their desk. If the same user calls back later from a cellular phone, the user might be rejected because of variances introduced by the equipment differences. This problem could be overcome by changing the threshold levels required for a match in verification; however, such a course of action would lead to increased occurrences of false acceptances. Therefore, what is needed is an improved voice processing and verification system which can account for these variations.
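The trade-off between false rejections and false acceptances can be illustrated numerically. The sketch below assumes that each verification attempt yields a distance between the incoming sample and an enrollment model, and that access is granted when the distance does not exceed the threshold. The distance values are invented for illustration and are not measured data; the function name `error_rates` is hypothetical.

```python
def error_rates(genuine, impostor, threshold):
    """Return (false_rejection_rate, false_acceptance_rate) at a threshold."""
    # A genuine attempt is falsely rejected when its distance exceeds the threshold.
    frr = sum(d > threshold for d in genuine) / len(genuine)
    # An impostor attempt is falsely accepted when its distance is within the threshold.
    far = sum(d <= threshold for d in impostor) / len(impostor)
    return frr, far


# Illustrative distances only. The last two genuine attempts stand for the
# same user calling from a different type of telephone than at enrollment.
genuine = [0.2, 0.3, 0.4, 0.9, 1.1]
impostor = [0.8, 1.2, 1.5, 1.8, 2.0]

for threshold in (0.5, 1.0, 1.3):
    frr, far = error_rates(genuine, impostor, threshold)
    print(f"threshold={threshold}: FRR={frr:.1f}, FAR={far:.1f}")
```

With these illustrative numbers, loosening the threshold from 0.5 to 1.3 removes the false rejections caused by the handset change but admits two of the five impostor attempts, which is the drawback of simply relaxing the match criterion.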