The invention relates to speech recognition.
A speech recognition system analyzes a person's speech to determine what the person said. Most speech recognition systems are frame-based. In a frame-based system, a processor divides a signal descriptive of the speech to be recognized into a series of digital frames, each of which corresponds to a small time increment of the speech. The processor then compares the digital frames to a set of speech models. Each speech model may represent a word from a vocabulary of words, and may represent how that word is spoken by a variety of speakers. A speech model also may represent a sound, or phoneme, that corresponds to a portion of a word. Collectively, the constituent phonemes for a word in the model represent the phonetic spelling of the word.
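The framing step described above can be illustrated with a minimal sketch. The function name `split_into_frames` and the parameters `frame_len` and `hop` are hypothetical names chosen for illustration; the text does not specify frame sizes, overlap, or any particular signal representation, so the values below are assumptions.

```python
def split_into_frames(samples, frame_len, hop):
    """Return fixed-length frames taken every `hop` samples.

    Assumption: frames may overlap when hop < frame_len; trailing samples
    that do not fill a whole frame are dropped.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames


# Ten placeholder samples, frames of four samples, advancing two at a time.
signal = list(range(10))
frames = split_into_frames(signal, frame_len=4, hop=2)
```

Each resulting frame covers a small, fixed time increment of the signal; a practical system would typically convert each frame into a set of acoustic parameters before comparing it to speech models.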
The processor determines what the speaker said by finding the speech models that best match the digital frames that represent the person's speech. The words or phrases corresponding to the best matching speech models are referred to as recognition candidates. Speech recognition is discussed in U.S. Pat. No. 4,805,218, entitled "METHOD FOR SPEECH ANALYSIS AND SPEECH RECOGNITION," which is incorporated by reference.
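As a rough illustration of finding the best-matching speech models, the following sketch ranks word models by how closely they match a sequence of frames. It is a simplification under stated assumptions: each model is a hypothetical list of reference frames of the same length as the input, and the match score is a sum of squared differences. The text does not describe the scoring method, and practical recognizers generally rely on statistical models and time alignment rather than this direct comparison. All names (`frame_distance`, `score_model`, `recognition_candidates`) and the example data are invented for illustration.

```python
def frame_distance(frame, model_frame):
    """Squared Euclidean distance between two equal-length parameter lists."""
    return sum((x - y) ** 2 for x, y in zip(frame, model_frame))


def score_model(frames, model_frames):
    """Total distance between input frames and a model (lower is better)."""
    if len(frames) != len(model_frames):
        return float("inf")  # assumption: no time alignment in this sketch
    return sum(frame_distance(f, m) for f, m in zip(frames, model_frames))


def recognition_candidates(frames, models, n_best=2):
    """Return the words of the n_best lowest-scoring (best-matching) models."""
    scored = sorted((score_model(frames, m), word) for word, m in models.items())
    return [word for _, word in scored[:n_best]]


# Hypothetical two-frame models for a three-word vocabulary.
models = {
    "cat": [[1.0, 0.0], [0.0, 1.0]],
    "cap": [[1.0, 0.0], [0.2, 0.8]],
    "dog": [[0.0, 1.0], [1.0, 0.0]],
}
spoken = [[1.0, 0.0], [0.05, 0.95]]
candidates = recognition_candidates(spoken, models)
```

Here `candidates` lists the best match first, followed by the next closest alternative, which mirrors the idea of an ordered list of recognition candidates.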
FIG. 1 is a block diagram of a system that may be used for speech recognition. The system includes various input/output (I/O) devices (microphone 101, mouse 103, keyboard 105, display 107) and a general-purpose computer 100 having a central processing unit (CPU) 121, an I/O unit 117 and a sound card 119. A memory 109 stores data and various programs such as an operating system 111, an application program 113 (e.g., a word processing program), and a speech recognition program 115.
The microphone 101 detects utterances from a speaker and conveys the utterances, in the form of an analog signal, to the sound card 119, which in turn passes the signal through an analog-to-digital (A/D) converter to transform the analog signal into a set of digital samples. Under control of the operating system 111, the speech recognition program 115 compares the digital samples to speech models to determine what the speaker said. The results of this determination may be stored for later use or may be used as input to the application program 113.
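The analog-to-digital conversion mentioned above can be sketched in software as sampling followed by quantization. This is only an illustrative model: in the described system the conversion is performed by hardware on the sound card, and the sampling rate, the 16-bit sample width, and the function name `digitize` are assumptions that do not come from the text.

```python
import math


def digitize(analog, num_samples, sample_rate_hz, bits=16):
    """Sample a function of time and quantize each value to a signed integer.

    `analog` maps a time in seconds to an amplitude, nominally in [-1.0, 1.0].
    Values outside that range are clipped (an assumption of this sketch).
    """
    max_code = 2 ** (bits - 1) - 1
    samples = []
    for i in range(num_samples):
        value = analog(i / sample_rate_hz)
        value = max(-1.0, min(1.0, value))
        samples.append(round(value * max_code))
    return samples


# A hypothetical 440 Hz tone sampled at 8 kHz for 80 samples (10 ms).
tone = digitize(lambda t: math.sin(2 * math.pi * 440 * t), 80, 8000)
```

The resulting list of integers plays the role of the set of digital samples that the speech recognition program compares to speech models.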
As shown in FIG. 2, the speech recognition program may run concurrently with an application program--for example, a word processor--to allow the speaker to use the microphone 101 as a text input device either alone or in conjunction with the keyboard 105 and mouse 103. The speaker interacts with the word processor through a graphical user interface (GUI) which includes a window 200 having a text field 202. The speech recognition program also employs a GUI to communicate with the speaker. The GUI shown in FIG. 2 was developed by Dragon Systems, Inc. for the speech recognition program DragonDictate® for Windows®. In FIG. 2, the speech recognition program's GUI is superimposed on the word processor's GUI to provide the speaker with convenient access to both programs.
In the example shown, the speaker has spoken the Preamble of the U.S. Constitution into the microphone. The spoken words are recognized by the speech recognition program and provided as input to the word processor, which then displays the corresponding text in the text field 202. In this example, however, the spoken word "States" was recognized incorrectly as "stakes" 208. Using appropriate voice commands (either alone or in conjunction with input from the keyboard or mouse), the speaker may correct the text, for example by designating the second word choice 210, "States," in the word history window 206 as being the correct word.
A speech recognition system may be a "discrete" system--i.e., one which recognizes discrete words or phrases but which requires the speaker to pause briefly after each discrete word or phrase spoken. Alternatively, a speech recognition system may be "continuous," meaning that the recognition software can recognize spoken words or phrases regardless of whether the speaker pauses between them. Continuous speech recognition systems typically have a higher incidence of recognition errors than discrete recognition systems because of the added complexity of recognizing continuous speech. A more detailed description of continuous speech recognition is provided in U.S. Pat. No. 5,202,952, entitled "LARGE-VOCABULARY CONTINUOUS SPEECH PREFILTERING AND PROCESSING SYSTEM," which is incorporated by reference.
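To illustrate why pauses simplify discrete recognition, the following sketch separates a stream of frames into individual utterances wherever a sufficiently long run of quiet frames occurs. It is one plausible approach under stated assumptions, not a method described in the text: the per-frame energy values, the `threshold`, the `min_pause_frames` parameter, and the function name `split_on_pauses` are all hypothetical.

```python
def split_on_pauses(energies, threshold, min_pause_frames):
    """Return (start, end) frame index pairs, end exclusive, one per utterance.

    A frame is "loud" when its energy is at least `threshold`. An utterance
    ends once `min_pause_frames` consecutive quiet frames follow it; shorter
    quiet runs are treated as part of the same utterance.
    """
    segments = []
    start = None       # first frame of the utterance in progress
    last_loud = None   # most recent loud frame of that utterance
    quiet_run = 0
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            last_loud = i
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_pause_frames:
                segments.append((start, last_loud + 1))
                start = None
                quiet_run = 0
    if start is not None:
        segments.append((start, last_loud + 1))
    return segments


# Hypothetical frame energies: two utterances separated by a three-frame pause.
energies = [0, 5, 6, 0, 0, 0, 7, 8, 0, 9, 0, 0]
utterances = split_on_pauses(energies, threshold=1, min_pause_frames=2)
```

With clear pauses, each segment can be matched against word or phrase models on its own. In continuous speech no such boundaries are available, so the recognizer must also decide where one word ends and the next begins, which is one source of the added complexity noted above.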