The present invention is directed to a system and method for processing signal data for signature detection. More specifically, the system and method are directed to the taxonomic processing of unconstrained signal data captured for/from various sources in numerous applications, such as audible speech and other sound signals emitted by certain beings, relief data from certain textured surfaces, and image data of certain subjects, among others. In various embodiments and applications, the system and method provide for such processing in a context-agnostic manner to distinguish the sources for identification and classification purposes. In various speech applications, for instance, the subject system and method provide for the identification and classification of speech segments and/or speakers in a context-agnostic manner.
Exemplary embodiments of the present invention utilize certain aspects of methods and systems previously disclosed in U.S. patent application Ser. No. 10/748,182 (now U.S. Pat. No. 7,079,986), entitled “Greedy Adaptive Signature Discrimination System and Method,” referred to herein as reference [1], as well as certain aspects of methods and systems previously disclosed in U.S. patent application Ser. No. 11/387,034 (now U.S. Pat. No. 8,271,200), entitled “System and Method For Acoustic Signature Extraction, Detection, Discrimination, and Localization,” referred to herein as reference [2]. The techniques and measures disclosed by these references are collectively and generally referred to herein as [GAD].
Autonomous machine organization of captured signals having an unknown source has proven to be a difficult problem to address. One notable example is in the context of natural speech, where the challenge of selecting a robust feature space for collections of speech is complicated by variations in the words spoken, recording conditions, background noise, etc. Yet the human ear is remarkably adept at recognizing and clustering speakers. Human listeners effortlessly distinguish unknown voices in a recorded conversation and can generally decide if two speech segments come from the same speaker with only a few seconds of exposure. Human listeners can often make this distinction even in cases where they are not natively familiar with the speaker's language or accent.
Both voice recognition and voice-print biometric technologies are comparatively well developed. Hence, many researchers have addressed the problem of sorting natural speech by applying voice recognition to capture key phonemes or words, then attempting to establish a signature for each speaker's pronunciation of these key words. This is a natural approach to engineering a system from component parts; however, it is limited by language, accents, speaking conditions, and the probability of hitting key signature words.
Attempts at using these and other technologies to even approach, much less exceed, the human ear's capability to distinguish different speakers from their speech samples alone have proven to be woefully lacking. This is especially so where the speech samples are unconstrained by any cooperative restrictions, and the speaker is to be distinguished without regard to the language or other substantive content of the speech. Similar deficiencies are encountered in other contexts, such as in the identification and classification of geography type from captured terrain mapping data, and in the identification and classification of species from a collection of anatomic image data. There is therefore a need to provide a system and method for use in various applications, whereby the source of certain unconstrained captured signals may be reliably distinguished by taxonomic evaluation of the captured signals in a context-agnostic manner.