Current voice identification systems are limited in their ability to efficiently process audio from environments that include high noise content and multiple speakers. Consider the following prior art.
U.S. Pat. No. 4,852,170, entitled “Real Time Computer Speech Recognition System,” discloses a system that determines the frequency content of successive segments of speech. While U.S. Pat. No. 4,852,170 disclosed a system that is capable of analyzing speech digitally, the present invention offers a system that allows speech to be analyzed digitally in a way that is distinguishable from the device taught in U.S. Pat. No. 4,852,170. U.S. Pat. No. 4,852,170 is hereby incorporated by reference into the specification of the present invention.
U.S. Pat. No. 5,758,023, entitled “Multi-Language Speech Recognition System,” discloses a system that considers the frequency spectrum of speech. While U.S. Pat. No. 5,758,023 disclosed a system that is capable of analyzing speech digitally, the present invention offers a system that allows speech to be analyzed digitally in a way that is distinguishable from the device taught in U.S. Pat. No. 5,758,023. U.S. Pat. No. 5,758,023 is hereby incorporated by reference into the specification of the present invention.
U.S. Pat. No. 6,067,517, entitled “Transcription of Speech Data with Segments from Acoustically Dissimilar Environments,” discloses a technique for improving recognition accuracy when transcribing speech data that contains data from a variety of environments. U.S. Pat. No. 6,067,517 is hereby incorporated by reference into the specification of the present invention.
U.S. Pat. No. 7,319,959, entitled “Multi-Source Phoneme Classification for Noise-Robust Automatic Speech Recognition,” discloses a system and method for processing an audio signal by segmenting the signal into streams and analyzing each stream to determine phoneme-level classification. U.S. Pat. No. 7,319,959 is hereby incorporated by reference into the specification of the present invention.
While some of the prior art teaches methods of identifying multiple speakers, these methods often employ large, costly software programs that require substantial unique computing resources. Additionally, the known prior art fails to disclose an efficient and cost-effective way to identify multiple voices in parallel and to identify the time periods during which each of those voices is present within the digitized audio that is being processed.