The invention relates to speaker identification.
Speaker identification systems identify a person by analyzing the person's speech. In general, there are three kinds of speaker identification: speaker verification, closed set identification, and open set identification.
A speaker verification system compares a sample of speech from a person who claims to be a particular known speaker with previous samples (or models) of that known speaker's speech. The system verifies the claimed identity by determining whether the sample matches the previous samples.
A closed set identification system analyzes a sample of speech in relation to the speech of each of a set of known speakers. The system then determines that the speech was produced by the known speaker whose speech most closely matches the sample of speech. Thus, a closed set identification system identifies the single known speaker who is most likely to have produced the sample of speech.
An open set identification system analyzes a sample of speech in relation to the speech of each of a set of known speakers. The system determines for each known speaker whether the sample of speech was likely to have come from that speaker. The system may determine that the sample of speech was likely to have come from multiple speakers.
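The three kinds of speaker identification differ mainly in the decision made once the sample has been compared with the known speakers. The following is a minimal sketch of those decisions, not an implementation taken from any particular system. It assumes that a match score has already been computed for each known speaker, with a higher score meaning a closer match; the function names, the score scale, and the thresholds are all hypothetical.

```python
from typing import Dict, List


def verify(scores: Dict[str, float], claimed: str, threshold: float) -> bool:
    """Speaker verification: accept or reject one claimed identity."""
    # Assumption: a score at or above the threshold counts as a match.
    return scores[claimed] >= threshold


def closed_set_identify(scores: Dict[str, float]) -> str:
    """Closed set identification: always return the single best match."""
    return max(scores, key=scores.get)


def open_set_identify(scores: Dict[str, float], threshold: float) -> List[str]:
    """Open set identification: decide separately for each known speaker.

    The result may name several speakers, or none at all.
    """
    return sorted(name for name, score in scores.items() if score >= threshold)


if __name__ == "__main__":
    # Hypothetical scores for one speech sample against three known speakers.
    scores = {"alice": 0.9, "bob": 0.7, "carol": 0.2}
    print(verify(scores, claimed="bob", threshold=0.8))
    print(closed_set_identify(scores))
    print(open_set_identify(scores, threshold=0.6))
```

In this sketch the closed set decision never rejects a sample, while the open set decision can return several speakers or an empty list, which mirrors the distinction drawn above.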
In one approach to speaker identification, speech recognition is used to identify the words spoken by the person as the first step in the identification process. Speech recognition systems analyze a person's speech to determine what the person said. Typically, a processor divides a signal that represents the speech into a series of digital frames that each represent a small time increment of the speech. The processor then compares the digital frames to a set of speech models. Each speech model may represent a word from a vocabulary of words, and may represent how that word is spoken by a variety of speakers. A speech model also may represent a sound, or phoneme, that corresponds to a portion of a word. The processor determines what the person said by finding the speech models that correspond best to the digital frames that represent the person's speech. Speech recognition is discussed in U.S. Pat. No. 4,805,218, entitled "Method for Speech Analysis and Speech Recognition", which is incorporated by reference.
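The framing and model comparison steps described above can be illustrated with a short sketch. It is a simplification under stated assumptions: each frame is reduced to a hypothetical feature vector, each speech model is a single template vector, and the best match is the template with the smallest squared Euclidean distance. Practical recognizers typically use statistical models and search over sequences of frames; the names split_into_frames and best_model, and the frame length and hop parameters, are illustrative only.

```python
from typing import Dict, List, Sequence


def split_into_frames(samples: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Divide a digitized signal into frames covering small time increments.

    frame_len is the number of samples per frame and hop is the number of
    samples between the starts of consecutive frames. Trailing samples that
    do not fill a whole frame are dropped.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(list(samples[start:start + frame_len]))
    return frames


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_model(features: Sequence[float], models: Dict[str, Sequence[float]]) -> str:
    """Return the name of the model template closest to the frame features."""
    return min(models, key=lambda name: squared_distance(models[name], features))


if __name__ == "__main__":
    signal = [float(i) for i in range(10)]
    print(split_into_frames(signal, frame_len=4, hop=2))
    # Hypothetical two-dimensional templates for two phoneme models.
    models = {"ah": [0.9, 0.1], "ee": [0.0, 1.0]}
    print(best_model([1.0, 0.0], models))
```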
After using speech recognition to determine the content of the speech, the speaker identification system determines the source of the speech by comparing the recognized speech to stored samples of speech produced by different known speakers. The stored samples of speech may be produced, for example, by having a known speaker read from a list of words, or by having the known speaker respond to prompts that ask the speaker to recite certain words. As the known speaker reads from the list of words or responds to the prompts, the known speaker's speech is sampled and the samples are stored along with the identity of the known speaker.
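The enrollment and comparison steps described above can be sketched as follows. This is an illustrative outline rather than the method of any particular system: it assumes each stored sample is a hypothetical feature vector keyed by the word that was spoken, and it compares samples with a squared Euclidean distance. The EnrollmentStore class and the closest_speaker function are names introduced only for this example.

```python
from collections import defaultdict
from typing import DefaultDict, List, Optional, Sequence, Tuple


class EnrollmentStore:
    """Stores speech samples together with the identity of the known speaker."""

    def __init__(self) -> None:
        # Maps a word to the (speaker identity, features) pairs recorded for it.
        self._samples: DefaultDict[str, List[Tuple[str, List[float]]]] = defaultdict(list)

    def enroll(self, speaker_id: str, word: str, features: Sequence[float]) -> None:
        """Record one sample produced while the known speaker said the word."""
        self._samples[word].append((speaker_id, list(features)))

    def samples_for(self, word: str) -> List[Tuple[str, List[float]]]:
        """Return all stored samples of the given word, across speakers."""
        return list(self._samples.get(word, []))


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest_speaker(store: EnrollmentStore, word: str, features: Sequence[float]) -> Optional[str]:
    """Compare a recognized word with stored samples of the same word.

    Returns the identity attached to the nearest stored sample, or None if
    no known speaker has a stored sample of that word.
    """
    candidates = store.samples_for(word)
    if not candidates:
        return None
    return min(candidates, key=lambda c: squared_distance(c[1], features))[0]


if __name__ == "__main__":
    store = EnrollmentStore()
    store.enroll("alice", "open", [1.0, 0.0])
    store.enroll("bob", "open", [0.0, 1.0])
    print(closest_speaker(store, "open", [0.9, 0.2]))
```

Keying the store by word reflects the role of the recognition step: once the content of the speech is known, only stored samples of the same words need to be compared.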