The processing of voice signals for use within a computerized device has traditionally been split into two distinct fields: speaker identification and speech recognition. These two fields have historically required separate, uniquely designed and configured systems, which are often provided by different vendors.
Speech recognition involves recognizing a human-language word spoken by a speaker. In one example, speech recognition is utilized for computerized dictation, where a user speaks into a microphone and her words are recognized and entered into a document. Another example of speech recognition is controlling personal electronics, such as a cellular telephone or car stereo, through the use of verbal commands. Other applications for speech recognition include command recognition, dictation, interactive voice response systems, automotive speech recognition, medical transcription, pronunciation teaching, automatic translation, and hands-free computing. Speech recognition is typically achieved by comparing characteristic qualities of spoken words, phrases, or sentences to one or more templates. A variety of algorithms are known in the art that allow qualification and/or comparison of speech to templates. These algorithms include hidden Markov models, neural network-based systems, dynamic time warping-based systems, frequency estimation, pattern matching algorithms, matrix representation, decision trees, and knowledge-based systems. Some systems employ a combination of these techniques to achieve higher accuracy rates.
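As an illustration of comparing speech to templates, the following is a minimal sketch of one of the listed techniques, dynamic time warping. It assumes each utterance has already been reduced to a short sequence of numeric feature values; the function names `dtw_distance` and `recognize`, and the toy templates, are hypothetical and are not taken from any particular system.

```python
from math import inf


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences.

    Assumption: each element is a single numeric feature per frame.
    """
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i items of a
    # with the first j items of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: skip in a, skip in b, or step both.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]


def recognize(utterance, templates):
    """Return the template word whose sequence is closest to the utterance."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))


# Hypothetical templates: the same word spoken more slowly still aligns.
templates = {"yes": [1, 2, 3], "no": [3, 2, 1]}
best_word = recognize([1, 2, 2, 3], templates)
```

Because the warping path may repeat a frame, a stretched utterance of a word can still align to that word's template at low cost, which is why this kind of matching tolerates differences in speaking rate.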
Speaker identification involves the process of identifying or verifying the identity of a specific person based on unique qualities of human speech. Human speech is often referred to as a biometric identification mechanism similar to fingerprints or retinal scans. Like fingerprints and retinal scans, every individual has a unique voice print that can be analyzed and matched against known voice prints. Like other biometric identification mechanisms, voice prints can be utilized for verification or identification.
Verification using a voice print is commonly referred to as voice authentication. Voice authentication is achieved in a similar manner to speech recognition: characteristic qualities of spoken words or phrases are compared to one or more templates. However, voice authentication is much more difficult to achieve successfully than speech recognition. First, speech recognition requires a less stringent match between the spoken word and a speech template. All that must be determined is what word was said, not who said that word based on a specific accent, pitch, and tone. Second, speaker identification requires matching the speaker to a much larger number of possibilities, because one person must be identified out of many, rather than merely determining what word was spoken. Whereas it may be acceptable to take up to several seconds to perform voice authentication, speech recognition must be done at a relatively fast pace in order for an interface to be reasonably usable.
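The difference between verification (one claimed identity, one comparison) and identification (one speaker out of many enrolled) can be sketched as follows. This is a simplified illustration only: it assumes each voice print is a fixed-length list of numbers, uses cosine similarity as a stand-in comparison, and picks an arbitrary threshold of 0.95. The names `verify`, `identify`, and the enrolled data are hypothetical.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def verify(sample, claimed_print, threshold=0.95):
    """Verification: compare the sample against ONE claimed voice print."""
    return cosine_similarity(sample, claimed_print) >= threshold


def identify(sample, enrolled, threshold=0.95):
    """Identification: search ALL enrolled voice prints for the best match.

    Returns the best-matching name, or None if nobody is close enough.
    """
    if not enrolled:
        return None
    best = max(enrolled, key=lambda name: cosine_similarity(sample, enrolled[name]))
    if cosine_similarity(sample, enrolled[best]) >= threshold:
        return best
    return None


# Hypothetical enrolled voice prints (assumed 3-value feature vectors).
enrolled = {"alice": [1.0, 0.0, 0.2], "bob": [0.0, 1.0, 0.3]}
sample = [0.9, 0.05, 0.2]

claim_ok = verify(sample, enrolled["alice"])   # one comparison
who = identify(sample, enrolled)               # one comparison per enrollee
```

Verification performs a single comparison, while identification performs one per enrolled speaker, so the work in this sketch grows with the number of enrolled voice prints.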
Traditionally, the use of speech for identification purposes versus speech for recognition purposes has been very segmented. While speech authentication requires complex and demanding comparisons, speech recognition requires real-time performance in order to meet user needs. Due to these differing requirements, existing systems (including computer hardware, software, or both) have been limited to performing one of these two functions.
The use of speech to authenticate a user has a variety of advantages over other identification methods. First, like fingerprints or iris scans, every human being has an entirely unique speech pattern that can be quantifiably recognized using existing technology. Second, unlike fingerprints or iris scans, the input to a speaker identification system (the spoken word) may be different every time, even where the speaker is saying the same word. Therefore, unlike other methods of human authentication, speech authentication provides the additional advantage of an ability to prevent multiple uses of the same voice print.
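One way this property could be exploited is sketched below. The text does not describe a specific mechanism, so this is an assumption: because a live utterance is expected to differ slightly on every attempt, a sample that is numerically identical (within a small tolerance) to a previously presented sample is treated as a likely replay. The function name `is_replay`, the tolerance `epsilon`, and the feature-vector representation are all hypothetical.

```python
def is_replay(sample, previous_samples, epsilon=1e-6):
    """Return True if the sample duplicates a previously seen sample.

    Assumption: samples are equal-length lists of numeric features, and
    genuine live speech never reproduces an earlier sample this closely.
    """
    for old in previous_samples:
        if len(old) == len(sample) and all(
            abs(x - y) <= epsilon for x, y in zip(sample, old)
        ):
            return True
    return False


# Hypothetical history of samples already presented for this speaker.
history = [[0.10, 0.20, 0.30]]

replayed = is_replay([0.10, 0.20, 0.30], history)    # exact copy of an old sample
fresh = is_replay([0.11, 0.19, 0.31], history)       # natural variation
```

A check of this kind would run in addition to the voice print match, not in place of it: the sample must be close enough to the enrolled voice print to authenticate, yet not identical to any earlier sample.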
The rise of the computer age has drastically changed the manner in which people interact with each other in both business and personal settings. As technology has come to be used to conduct everyday life, security concerns with the use of computers have risen dramatically due to identity theft. Identity theft typically occurs where personal information (such as bank accounts, social security numbers, passwords, and identification numbers) or corporate information is accessible when transferred over networks such as the internet, or when personal or corporate information is entered into a user interface. A typical internet transaction, such as a consumer purchase or bank account transfer, involves both a business side (back-end) and a customer side (front-end). The customer typically uses a computer or a handheld device, such as a smartphone or Personal Digital Assistant (PDA), to communicate during the transaction. Typically, communications during internet transactions are made very secure by using high-security protocols such as Transport Layer Security (TLS) or Secure Sockets Layer (SSL). However, when a customer enters information at the front-end side of the transaction, before it is transferred, the information is highly vulnerable to theft. In fact, in most cases of identity theft, personal information is stolen from the front-end side of the transaction. Therefore, a need exists to provide an efficient, more secure means of protecting the identity of one who wishes to interact in a secure environment over networks such as the internet. More specifically, a need exists to provide a secure transaction environment in which personal or corporate information is not communicated to the customer front-end in an accessible or repeatable format.