This invention relates generally to a speaker verification system and, more particularly, to an apparatus and method for passive voice monitoring in a telephone network.
Long distance credit card services must identify a user to ensure that an impostor does not use the service under another person's identity. It has been estimated that the aggregate losses to the long distance services due to unauthorized use are in the one to four billion dollar range. Because of the magnitude of these losses, telephone companies are investigating methods of verifying the identity of the caller each time a call is placed. Prior art systems typically provide a lengthy identification number (calling card number) which must be entered via the phone's keypad to initiate the long distance service. Unfortunately, this approach is prone to abuse because the identification number may be easily appropriated by theft, or by simply observing another person entering it. Accordingly, a biometric technique, as opposed to a method based solely on the knowledge of a password or possession of a key, is preferable. Voice is an ideal medium because every consumer already has the required equipment, a telephone.
A number of recognition techniques have been proposed for identifying a speaker on the basis of prerecorded samples of his speech. As is known in the art, it is possible to represent a voice pattern with a sequence of P-dimensional feature vectors. In accordance with the pattern to be represented, the number P may be from 1 to 10 or more. Speech utterances may be represented as collections of these vectors. In certain conventional speaker verification systems, the password speech pattern uttered by a registered speaker is stored as a reference pattern, and at the time of verification, a code specifying the speaker (hereinafter the "registered speaker number") and the password spoken by a speaker to be verified are input. The reference pattern specified by the registered speaker number and the uttered speech pattern of the password (hereinafter the "input pattern") are compared with each other to calculate an evaluation value of dissimilarity therebetween. If the dissimilarity is smaller than a predetermined threshold value, the speaker is recognized as the registered person, and if the dissimilarity is greater, the speaker is judged to be an impostor.
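By way of illustration only, the conventional threshold comparison described above may be sketched as follows. The sketch is not taken from any particular prior art system and rests on several assumptions: the dissimilarity is computed as the mean Euclidean distance between frame-aligned feature vectors, the reference pattern and the input pattern contain the same number of vectors (practical systems generally require some form of time alignment), and a dissimilarity exactly equal to the threshold is treated as a rejection. The names `dissimilarity`, `verify`, `references`, and `threshold` are hypothetical.

```python
import math
from typing import Dict, List, Sequence

# A pattern is a sequence of P-dimensional feature vectors.
FeatureVector = Sequence[float]
Pattern = List[FeatureVector]


def dissimilarity(reference: Pattern, observed: Pattern) -> float:
    """Mean Euclidean distance between frame-aligned feature vectors.

    Assumption: both patterns have the same number of vectors. This
    measure is chosen for simplicity and is only one of many possible
    evaluation values of dissimilarity.
    """
    if not reference or len(reference) != len(observed):
        raise ValueError("patterns must be non-empty and of equal length")
    total = 0.0
    for ref_vec, obs_vec in zip(reference, observed):
        total += math.dist(ref_vec, obs_vec)
    return total / len(reference)


def verify(
    speaker_number: str,
    input_pattern: Pattern,
    references: Dict[str, Pattern],
    threshold: float,
) -> bool:
    """Return True if the speaker is accepted as the registered person.

    The reference pattern is looked up by the registered speaker number
    and compared with the input pattern. The speaker is accepted only if
    the dissimilarity is smaller than the threshold (equality is treated
    as a rejection, which is an assumption of this sketch).
    """
    reference = references[speaker_number]
    return dissimilarity(reference, input_pattern) < threshold


if __name__ == "__main__":
    # Hypothetical enrollment data with P = 2 and two vectors per pattern.
    stored = {"1001": [[1.0, 2.0], [3.0, 4.0]]}
    close_pattern = [[1.1, 2.0], [3.0, 4.1]]
    distant_pattern = [[4.0, 6.0], [6.0, 8.0]]
    print(verify("1001", close_pattern, stored, threshold=0.5))
    print(verify("1001", distant_pattern, stored, threshold=0.5))
```

In such a scheme the choice of threshold governs the trade-off between false rejections of registered speakers and false acceptances of impostors.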
Voice verification methods currently being tested by telephone companies prompt the user to speak one or more predetermined, short authorization phrases before a connection is made with the called party. The interactive session in which phrases are prompted and spoken takes about ten seconds. Even without considering the cost of such systems themselves, any savings in fraudulent charges realized by such voice verification systems may be easily offset or negated by other costs associated therewith. Such costs include the additional telephone line connection charges, the additional time consumers must spend to make a call, and the loss of business due to false rejections. Moreover, prior art speaker verification systems have not provided the necessary discrimination between true speakers and impostors to be commercially acceptable in applications where the speaking environment is unfavorable.
Speaker verification over long distance telephone networks presents challenges not previously overcome. Variations in handset microphones result in severe mismatches between speech data collected from different handsets for the same speaker. Further, the telephone channels introduce signal distortions which reduce the accuracy of the speaker verification system. Also, there is little control over the speaking conditions. Finally, the need, associated with prior art techniques, to prompt the customer to recite a predetermined speech sample and to pre-analyze that sample imposes additional costs in the form of additional telephone line connection charges and the additional time customers must spend to place a call.
Accordingly, a need exists for a voice verification system to prevent calling card abuse over telephone lines. Further, a need has arisen to provide a speaker verification system which effectively, yet passively, discriminates between true speakers and impostors, particularly in a long distance network setting.