The present invention is directed to a system and method for processing signals for signature detection. More specifically, the system and method are directed to the processing of unconstrained acoustic signals such as audible speech and other sounds emitted by various sources. In certain embodiments and applications, the system and method provide for such processing in a context-agnostic manner to distinguish the sources for identification and classification purposes. In certain speech applications, for instance, the subject system and method provide for the identification and classification of speech segments and/or speakers in a context-agnostic manner.
Exemplary embodiments of the present invention utilize certain aspects of methods and systems previously disclosed in U.S. patent application Ser. No. 10/748,182 (now U.S. Pat. No. 7,079,986), entitled “Greedy Adaptive Signature Discrimination System and Method,” referred to herein as reference [1], as well as certain aspects of methods and systems previously disclosed in U.S. patent application Ser. No. 11/387,034, entitled “System and Method For Acoustic Signature Extraction, Detection, Discrimination, and Localization,” referred to herein as reference [2]. The techniques and measures disclosed by these references are collectively and generally referred to herein as [GAD].
Autonomous machine organization of collections of natural speech has proven to be a difficult problem to address. The challenge of selecting a robust feature space is complicated by variations in the words spoken, recording conditions, background noise, etc. Yet the human ear is remarkably adept at recognizing and clustering speakers. Human listeners effortlessly distinguish unknown voices in a recorded conversation and can generally decide if two speech segments come from the same speaker with only a few seconds of exposure. Human listeners can often make this distinction even in cases where they are not natively familiar with the speaker's language or accent.
Both voice recognition and voice-print biometric technologies are comparatively well developed. Hence, many researchers have addressed the problem of sorting natural speech by applying voice recognition to capture key phonemes or words, then attempting to establish a signature for each speaker's pronunciation of these key words. This is a natural approach to engineering a system from component parts; however, it is limited by language, accents, speaking conditions, and the probability of encountering key signature words.
Attempts at using these and other technologies to even approach, much less exceed, the human ear's capability to distinguish different speakers from their speech samples alone have proven to be woefully lacking. This is especially so where the speech samples are unconstrained by any cooperative restrictions, and the speaker is to be distinguished without regard to the language or other substantive content of the speech. There is therefore a need for a system and method for use in speech and other applications, whereby the source of unconstrained acoustic signals may be accurately distinguished from those signals in a context-agnostic manner.