1. Technical Field
This disclosure relates to systems and methods for automatic classification of acoustic (sound) sources, including text-independent speaker identification.
2. Related Art
Several fields of research study acoustic signal classification. Each field has adopted its own approaches, with some overlap among them. At present, the main applications for automatic sound source classification are: speaker verification; speaker identification; passive sonar classification; and machine noise monitoring or diagnostics.
Speaker verification aims at verifying that a given speaker is indeed who he or she claims to be. In most speaker verification systems, a speaker cooperates by saying a keyword, and the system matches the way that keyword was said by the putative speaker against training samples of the same keyword. If the match is poor, the speaker is rejected or denied service (e.g., computer or premises access). A disadvantage of such methods is that the same keyword must be used at testing time as at training time, which limits their application to access control. Such methods cannot be used, for example, to label the speakers in a back-and-forth conversation.
Speaker identification aims at determining which among a set of voices best matches a given test utterance. Text-independent speaker identification tries to make such a determination without the use of particular keywords.
Passive sonar classification involves identifying a vessel according to the sound it radiates underwater. Machine noise monitoring and diagnostics involves determining the state of a piece of machinery through the sound it makes.
In all of the above applications, a model of each sound source is first obtained by training a system with a set of example sounds from each source. A test sample is then compared to the stored models to determine a sound source category for the test sample. Known methods require relatively long training times and relatively long test samples, which makes such methods inappropriate in many cases. Further, such methods tend to require a large amount of memory storage and computational resources. Finally, these methods often are not robust to the presence of noise in the test signal, which prevents their use in many tasks. (“Signal” means a signal of interest; background and distracting sounds are referred to as “noise.”)
Therefore, a need exists for a way to classify a noisy acoustic signal while requiring a minimal amount of training and testing.
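For illustration only, the generic train-then-compare scheme described above can be sketched in Python as follows. The feature vectors, the source names, the averaging-based model, and the Euclidean distance measure are all assumptions chosen for brevity; they are not drawn from any particular known system and do not represent the method of this disclosure.

```python
import math

def train_models(examples_by_source):
    """Build one model per source by averaging its example feature vectors.

    Averaging is an assumed, deliberately simple stand-in for training.
    """
    models = {}
    for source, examples in examples_by_source.items():
        dim = len(examples[0])
        models[source] = [
            sum(vec[i] for vec in examples) / len(examples) for i in range(dim)
        ]
    return models

def classify(models, test_vector):
    """Return (source, distance) for the stored model closest to the test sample."""
    best = min(models, key=lambda s: math.dist(models[s], test_vector))
    return best, math.dist(models[best], test_vector)

# Hypothetical three-band spectral features for two sound sources.
training = {
    "speaker_a": [[1.0, 0.2, 0.1], [0.8, 0.4, 0.1]],
    "speaker_b": [[0.1, 0.9, 0.7], [0.3, 0.7, 0.9]],
}
models = train_models(training)
label, distance = classify(models, [0.9, 0.25, 0.15])
print(label, round(distance, 3))
```

In this sketch, identification selects the closest stored model, whereas a verification-style decision would additionally compare the returned distance against an acceptance threshold. The sketch makes no attempt to handle noise in the test sample, which is the limitation noted above.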