Computer-implemented speech recognition (SR), sometimes referred to as “speech to text” (STT), includes the recognition and translation of spoken words within audio data inputs into text by applications running (executing) in a computer system environment. Common speech recognition applications include voice user interfaces that enable “voice dialing” by recognizing key words (for example, “Call home”) spoken within the audio input and using data associated with the speaker to execute the associated task. Thus, such an application may recognize the speaker (for example, an account holder using a particular cellular phone), look up a telephone number that the user or a contact file indicates is a “home” number of the identified user, and execute a telephone call on the cellular device to the looked-up number.
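The voice-dialing flow described above can be illustrated with a minimal sketch. It assumes the audio has already been transcribed to text and the speaker has already been identified. The contact table, the speaker identifier, and the function name `resolve_voice_dial` are hypothetical and chosen only for illustration; placing the call itself is outside the sketch.

```python
import re
from typing import Dict, Optional

# Hypothetical per-speaker contact data (an assumption for illustration);
# a real system would read this from the user's profile or contact file.
CONTACTS: Dict[str, Dict[str, str]] = {
    "alice": {"home": "555-0100", "office": "555-0101"},
}

# Matches a key phrase such as "call home" in already-transcribed text.
CALL_PATTERN = re.compile(r"\bcall\s+(\w+)\b", re.IGNORECASE)


def resolve_voice_dial(transcript: str, speaker_id: str) -> Optional[str]:
    """Return the number to dial for a "call <label>" command, or None."""
    match = CALL_PATTERN.search(transcript)
    if match is None:
        return None  # no dialing key words were recognized
    label = match.group(1).lower()
    # Look up the spoken label in the identified speaker's contacts.
    return CONTACTS.get(speaker_id, {}).get(label)
```

For example, `resolve_voice_dial("Call home", "alice")` returns the number stored under that speaker's “home” label, while an unrecognized phrase or an unknown speaker yields `None`, so the caller can fall back to asking the user to repeat the command.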
A variety of approaches are used to detect or recognize audio data inputs and translate them into constituent text words or concepts. Some SR systems and applications use “speaker-independent speech recognition,” while others use “training,” in which an individual speaker reads sections of text into an SR system that analyzes the person's specific voice and uses that analysis to fine-tune the recognition of that person's speech, resulting in more accurate transcription.