This invention relates to a speaker recognition system and more particularly to a system which is capable of identifying an unknown talker or speaker as being one of a finite number of speakers.
As one will understand, the art of speech recognition in general has been vastly developed within the last few years and speech recognition systems have been employed in many forms. Speech recognition rests on the premise that the information contained in the spoken sound can be utilized directly to activate a computer or other means.
Essentially, the prior art understood that a key element in recognizing information in a spoken sound is the distribution of the energy with frequency. The formant frequencies, which are those at which the energy peaks, are particularly important. The formant frequencies are the acoustic resonances of the mouth cavity and are controlled by the tongue, jaw and lips. For a human listener the determination of the first two or three formant frequencies is usually enough to characterize the sound. Accordingly, machine recognizers of the prior art included some means of determining the amplitude spectrum of the incoming speech signal. This first step in speech recognition is referred to as preprocessing, as it transforms the speech signal into features or parameters that are recognizable and reduces the data flow to manageable proportions.
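By way of illustration only, the determination of an amplitude spectrum and of the frequencies at which the energy peaks can be sketched as follows. This is a minimal sketch and not the preprocessing of any particular prior art recognizer: the function names `amplitude_spectrum` and `spectral_peaks`, the 8000 Hz sampling rate and the synthetic two-tone frame are all assumptions made for the example. A practical system would estimate the peaks from a smoothed spectral envelope rather than from raw transform bins.

```python
import cmath
import math


def amplitude_spectrum(frame, sample_rate):
    """Return (frequencies in Hz, magnitudes) for the non-negative DFT bins."""
    n = len(frame)
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        # Direct evaluation of the discrete Fourier transform at bin k.
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        freqs.append(k * sample_rate / n)
        mags.append(abs(acc))
    return freqs, mags


def spectral_peaks(freqs, mags, count=3):
    """Return the `count` strongest local maxima, in ascending frequency."""
    peaks = [(mags[i], freqs[i]) for i in range(1, len(mags) - 1)
             if mags[i] > mags[i - 1] and mags[i] > mags[i + 1]]
    strongest = sorted(peaks, reverse=True)[:count]
    return sorted(f for _, f in strongest)


# Hypothetical 20 ms frame: two sinusoids standing in for two resonances.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 500 * t / sample_rate)
         + 0.5 * math.sin(2 * math.pi * 1500 * t / sample_rate)
         for t in range(160)]

freqs, mags = amplitude_spectrum(frame, sample_rate)
peaks = spectral_peaks(freqs, mags, count=2)
print(peaks)
```

The two peak frequencies reduce a 160-sample frame to two numbers, which is the sense in which preprocessing reduces the data flow.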
One means of accomplishing this is the measurement of the zero crossing rate of the signal in several broad frequency bands to give an estimate of the formant frequencies in these bands. Another means is representing the speech signal in terms of the parameters of the filter whose spectrum best fits that of the input speech signal. This technique is known as linear predictive coding (LPC). LPC has gained popularity because of its efficiency, accuracy and simplicity. The recognition features extracted from speech are typically averaged over 10 to 20 milliseconds and then sampled 50 to 100 times per second.
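The two preprocessing techniques just described can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation of any cited system: the helper names, the 20 millisecond frame with a 10 millisecond step (giving 100 feature vectors per second), the predictor order of 4 and the synthetic input signal are all chosen for the example. The zero crossing rate is measured here on the full-band signal rather than in several frequency bands, and the LPC parameters are obtained by the autocorrelation method with the Levinson-Durbin recursion, which is one common way of fitting the filter.

```python
import math


def split_frames(signal, sample_rate, frame_ms=20, hop_ms=10):
    """Cut the signal into overlapping frames (assumed 20 ms every 10 ms)."""
    size = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    return [signal[s:s + size] for s in range(0, len(signal) - size + 1, hop)]


def zero_crossing_rate(frame, sample_rate):
    """Sign changes per second between consecutive samples of one frame."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings * sample_rate / (len(frame) - 1)


def autocorrelation(frame, max_lag):
    """Autocorrelation values r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[t] * frame[t + lag] for t in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(frame, order):
    """Levinson-Durbin recursion.

    Returns (a, err) where x[t] is predicted as sum(a[k-1] * x[t-k]) for
    k = 1..order, and err is the remaining prediction error energy.
    """
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


# Hypothetical input: 0.1 s of two mixed tones sampled at 8000 Hz.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 500 * t / sample_rate + 0.5)
          + 0.3 * math.sin(2 * math.pi * 1700 * t / sample_rate)
          for t in range(800)]

frame_list = split_frames(signal, sample_rate)
features = [(zero_crossing_rate(f, sample_rate), lpc(f, 4)[0]) for f in frame_list]
print(len(features), "feature vectors")
```

Each frame of 160 samples is thereby reduced to one zero crossing rate and four predictor coefficients, which is what the ensuing recognition steps would operate on.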
At this point, the data is digitized and the ensuing recognition steps are performed by a programmable digital processor. In any event, there are many problems associated with recognizing speech in regard to its information content, and the general problem of speech recognition has been described in many articles and patents. Apart from the problem of recognizing speech in general, another major concern is to recognize or verify a speaker. Speaker recognition is a generic term which refers to a system which discriminates between speakers according to their voice characteristics. Speaker recognition can involve speaker identification or speaker verification. Speaker identification classifies an unlabeled voice as belonging to one of a set of N reference speakers. Speaker verification implies the determination that an unlabeled voice belongs to a specific reference speaker. For a description of both speaker recognition systems and speech recognition systems, reference is made to the November, 1985 issue of the Proceedings of the I.E.E.E., Volume 73, No. 11, pages 1537-1696, and in particular to an article entitled "Speaker Recognition-Identifying People By Their Voices", by G. R. Doddington. See also Linear Prediction of Speech, Springer-Verlag (1976) by J. D. Markel and A. H. Gray for additional background. In this respect a system which can identify unknown speakers in real time using a small sample of their speech has great applicability.
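The distinction between speaker identification and speaker verification can be illustrated with the following sketch. It is not the method of the present invention or of the cited references: the Euclidean distance measure, the three-element feature vectors, the speaker labels and the threshold value are all hypothetical and are chosen only to show that identification selects one of N reference speakers while verification accepts or rejects a single claimed identity.

```python
import math


def distance(a, b):
    """Euclidean distance between two feature vectors (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(features, references):
    """Identification: label of the closest of the N reference speakers."""
    return min(references, key=lambda name: distance(features, references[name]))


def verify(features, references, claimed, threshold):
    """Verification: accept only if close enough to the claimed speaker."""
    return distance(features, references[claimed]) <= threshold


# Hypothetical reference models, one averaged feature vector per speaker.
references = {
    "speaker_a": [1.0, 0.2, -0.5],
    "speaker_b": [-0.3, 0.9, 0.4],
    "speaker_c": [0.6, -0.7, 0.1],
}
unknown = [0.9, 0.1, -0.4]

print(identify(unknown, references))
print(verify(unknown, references, "speaker_a", threshold=0.5))
print(verify(unknown, references, "speaker_b", threshold=0.5))
```

Identification always returns one of the N labels, whereas verification depends on a threshold that trades false acceptances against false rejections.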
Essentially, the applicability or usefulness of such a system should be apparent in regard to military systems whereby only authorized or identified speakers would be allowed to communicate with certain other authorized or identified individuals. In such a system an operator will be able to specify those speakers who are of interest at a particular time. Such a system could then route to the operator only speech that it identifies as spoken by specified talkers.
Such systems may also be used in security applications, such as recognizing certain individuals' voices to grant access to premises, for identification and so on. As one can ascertain, any such system, prior to executing a recognition task, will have to obtain samples of speech from each of the talkers that may later be recognized.
A major specification for any such system is that it shall correctly identify speakers whose training data has been preprocessed, and that it shall use only a small percentage of time in order to accomplish such recognition. Speaker recognition thus has application in many different systems that attempt to identify the users of the system by their voices. In certain applications a system which can identify particular speakers would identify the speakers currently using a communications channel and would therefore selectively route speech from selected authorized talkers to the user.
In this manner the system will serve to automatically identify and recognize individual speakers and therefore, under certain considerations, to indicate either that the speaker is authorized to use a certain communication channel or that the speaker is one whose presence in a conference or conversation is authorized. Hence, as one can ascertain, there are many uses for speaker recognition systems. As one can also ascertain, individual speaker recognition is a substantial problem, and while there have been many attempts to achieve it in the prior art, none of these attempts has been successful, in that such systems have been extremely complicated and are associated with low accuracy.
It is therefore an object of the present invention to provide an improved multiple parameter speaker recognition system which exhibits a high accuracy and which is capable of identifying any one of a finite plurality of authorized speakers to thereby afford speaker recognition to authorized system users.
A further object of this invention is to provide apparatus and methods used to identify an unknown talker as one of a finite number of speakers. The apparatus and methods allow a speaker to be modeled and recognized from any examples of his or her speech, as the speakers do not have to repeat a particular phrase in order to achieve recognition.
Hence a further object of the present invention is to provide a text independent speaker recognition system.