Researchers and others have used computers to receive speech in a variety of contexts. For example, computers have been programmed to receive a person's speech and transcribe it into an electronic document (i.e., speech-to-text). Speech-to-text programs often require the speaker to read one or more prepared documents aloud to the computer. The computer then aligns the recorded speech with the known text of those documents to develop a model of the speaker's voice. Each new speaker must likewise read the prepared documents before the speech-to-text program will work effectively for that speaker.
Another instance of computerized speech recognition involves a human operator who transcribes a speaker's dialog. The operator then inputs the transcription and a recording of the audio to a computer that then processes the recorded audio in light of the transcription. The speaker may therefore speak spontaneously, but human intervention is required in the form of the transcription.
In general, computerized speech recognition programs use a statistical model of a language, such as English, based on the common words of that language. These programs are often constructed to recognize particular words of the language and to ignore other sounds or portions of speech. In this manner, they accept recognized words and reject other sounds, such as mumbled speech, a cough, or another non-linguistic sound. They also discard disfluencies, such as silent pauses, filled pauses (e.g., “umm” or “ahh”), and false starts, so that the resulting text document contains none of these disfluencies.
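The filtering behavior described above can be illustrated with a minimal sketch. It assumes the recognizer has already produced a list of word hypotheses, and it relies on several hypothetical conventions that are not part of any particular product: a `<sil>` token marks a silent pause, a small fixed set of strings marks filled pauses, a fragment ending in a hyphen (e.g. `th-`) marks a false start, and a plain set of common words stands in for the language model's vocabulary. The names `clean_transcript`, `is_false_start`, `SILENCE_MARKER`, and `FILLED_PAUSES` are illustrative only.

```python
"""Sketch: drop disfluencies and out-of-vocabulary tokens from a
recognizer's token stream. All token conventions here are hypothetical."""

from typing import Iterable, List, Set

# Hypothetical marker a recognizer might emit for a silent pause.
SILENCE_MARKER = "<sil>"

# Common filled pauses (hypothetical, non-exhaustive list).
FILLED_PAUSES: Set[str] = {"um", "umm", "uh", "ah", "ahh", "er"}


def is_false_start(token: str) -> bool:
    """Treat a word fragment ending in '-' (e.g. 'th-') as a false start.
    This trailing-hyphen convention is an assumption of this sketch."""
    return len(token) > 1 and token.endswith("-")


def clean_transcript(tokens: Iterable[str], vocabulary: Set[str]) -> List[str]:
    """Keep only tokens found in the vocabulary, discarding silent pauses,
    filled pauses, false starts, and any other unrecognized sounds."""
    kept: List[str] = []
    for raw in tokens:
        token = raw.lower()
        if token == SILENCE_MARKER:
            continue  # silent pause
        if token in FILLED_PAUSES:
            continue  # filled pause such as "umm" or "ahh"
        if is_false_start(token):
            continue  # abandoned word fragment
        if token not in vocabulary:
            continue  # mumble, cough, or other out-of-vocabulary sound
        kept.append(token)
    return kept


if __name__ == "__main__":
    vocab = {"i", "want", "to", "go", "the", "store"}
    hypothesis = ["I", "umm", "want", "<sil>", "to", "go", "to",
                  "th-", "the", "store", "[cough]"]
    print(clean_transcript(hypothesis, vocab))
    # → ['i', 'want', 'to', 'go', 'to', 'the', 'store']
```

The explicit disfluency checks run before the vocabulary check so that each category of discarded material is visible separately; a filled pause is dropped even if it happens to appear in the vocabulary. A real recognizer would make these decisions probabilistically during decoding rather than on a finished token list, so this is only a simplified picture of the accept/reject behavior described above.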