Speech recognition systems, particularly computer-based speech recognition systems, are well known. Numerous inventions and voice transcription technologies have been developed to address various problems within speech recognition systems. In one aspect, advanced mathematics and processing algorithms have been developed to address the needs of translating vocal input into computer text through speech parsing, phoneme identification and database matching of the input speech so as to accurately transcribe the speech into text.
General speech recognition databases are also well known. U.S. Pat. No. 6,631,348 (Wymore), for example, discloses a speech recognition system in which vocal training information is provided to create different vocal reference patterns under different ambient noise levels. The Wymore invention creates a database of captured speech from this training input. During operation, a user of the Wymore system may then dictate speech under various ambient noise conditions, and the speech recognition system filters the noise from the user's input speech based on the different stored models to determine the appropriate spoken words, thereby improving the accuracy of the speech transcription.
U.S. Pat. No. 6,662,160 (Chien et al.) also discloses a system involving adaptive speech recognition methods that include noise compensation. Like Wymore, the system of Chien et al. neutralizes noise associated with input speech through the use of preprocessed training input. Chien et al. employs complex statistical mathematical models (e.g. Hidden Markov Models) and applies optimal equalization factors in connection with feature vectors and probability density functions related to various speech models so as to accurately recognize a user's speech.
Other voice transcription systems address the problems of minimizing and correcting misrecognition errors. For example, U.S. Pat. No. 6,195,637 (Ballard et al.) discloses a transcription system that accepts a user's dictation and contemporaneously allows a user to mark misrecognized words during the dictation. At the conclusion of dictation, a computer-based, textual correction tool is invoked with which the user may correct the marked, misrecognized words. Numerous potentially intended words, e.g. words that are close in phonetic distance to the actual speech, are provided by the Ballard et al. system for possible replacement of the misrecognized word. Other examples of misrecognized words include incorrectly spelled words and improperly formatted words (e.g. lack of upper case letters in a name or incorrect punctuation). In one embodiment, Ballard et al. discloses a computer having a windows-based, graphical user interface that displays the list of potentially intended words from which the user selects the appropriate word with a graphical input device, such as a computer mouse.
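By way of illustration only, the notion of offering replacement words that are close in phonetic distance may be sketched as follows. This sketch is not taken from Ballard et al.; it assumes that phonetic distance is approximated by an edit distance over phoneme sequences, and the lexicon, the phoneme symbols and the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]


def replacement_candidates(heard: Sequence[str],
                           lexicon: Dict[str, List[str]],
                           max_distance: int = 1) -> List[str]:
    """Return lexicon words within max_distance of the heard phonemes,
    ordered by closeness and then alphabetically."""
    scored = [(edit_distance(heard, phones), word)
              for word, phones in lexicon.items()]
    return [word for dist, word in sorted(scored) if dist <= max_distance]


# Hypothetical lexicon with assumed phoneme spellings.
LEXICON = {
    "write": ["R", "AY", "T"],
    "right": ["R", "AY", "T"],
    "ride": ["R", "AY", "D"],
    "rate": ["R", "EY", "T"],
    "table": ["T", "EY", "B", "AH", "L"],
}

candidates = replacement_candidates(["R", "AY", "T"], LEXICON)
```

In such a sketch, the resulting list would correspond to the potentially intended words displayed to the user for selection.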
Other existing speech recognition systems deal with problems associated with large speech recognition vocabularies, e.g. the entire English language. These systems typically address the allocation of the computer-based resources required to solve the speech recognition problems associated with such a vocabulary. U.S. Pat. No. 6,490,557 (Jeppesen), for example, discloses a system and method for recognizing and transcribing continuous speech in real time. In one embodiment, the disclosed speech recognition system includes multiple, geographically distributed computer systems connected by high speed links. A portion of the disclosed computer system is responsible for preprocessing continuous speech input, such as filtering any background noise provided during the speech input, and subsequently converting the resultant speech signals into digital format. The digital signals are then transcribed into word lists upon which automatic speech recognition components operate. Jeppesen's speech recognition system is also trainable so as to accommodate more than one type of voice input, including vocal input containing different accents and dialects. Thus, this speech recognition system is capable of recognizing large vocabulary, continuous speech input in a consistent and reliable manner, particularly speech that involves variable input rates and different dialects and accents. Jeppesen further discloses systems having on-site data storage (at the site of the speech input) and off-site data storage which stores the databases of transcribed words. Thus, in one aspect, a primary advantage of Jeppesen is that a database of large scale vocabularies containing speech dictations is distributed across different geographical areas such that users employing dialects and accents within a particular country or portion of the world would be able to use localized databases to accurately transcribe their speech input.
Other large vocabulary speech recognition systems are directed to improving the recognition of dictated input through the use of specialized, hierarchically arranged vocabularies. The computerized speech recognition system of U.S. Pat. No. 6,526,380 (Thelan et al.), for example, employs a plurality of speech recognition models that accept incoming speech in parallel and attempt to match the speech input within specific databases. Since the English language vocabulary, for example, is relatively large, the speech matching success rate using such a large vocabulary for any given particular dictation may be lower than what is acceptable for a particular application. Thelan et al. attempts to solve this problem through the use of specific vocabularies selected by the voice recognition modules after a particular speech vocabulary and associated text database is determined to be more appropriately suited to the dictation at issue. Thus, Thelan et al. begins with an ultra-large vocabulary and narrows the text selection vocabularies depending on the speech input so as to select further refined vocabularies that provide greater transcription accuracy. Model selectors are operative within Thelan et al. to enable the recognition of more specific models if the specific models obtain good recognition results. These specific models may then be used as a replacement for the more generic vocabulary model. As with Jeppesen, Thelan et al. discloses a computer-based speech recognition system having potentially distributed vocabulary databases.
Heretofore, no computerized speech recognition systems have been developed that take advantage of repeated dictation of specific terms into specific form fields or repeated dictation of specific terms by specific persons. In particular, context-specific vocabularies or context-specific modifications of matching probabilities have not been provided with respect to a context-specific vocabulary that is used in conjunction with more general vocabularies. The modern necessity of using specific, computerized, form-based input creates a unique problem in that the general vocabularies used by many of the commercial speech recognition software programs do not provide efficient and accurate recognition and transcription of users' input speech. The limitations of the present systems lie in the fact that any vocabulary large enough to accommodate general as well as specific text will contain general text that is phonetically similar to the specific text, thereby causing an unacceptably high error rate.
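By way of illustration only, a context-specific modification of matching probabilities may be sketched as follows. The sketch assumes that a general recognizer returns a probability for each phonetically similar candidate word and that a form field carries its own prior built from repeated dictation into that field. The blending rule, the weight, the example words and all probabilities are hypothetical and are not drawn from any of the systems discussed above.

```python
from typing import Dict


def rescore(candidates: Dict[str, float],
            field_prior: Dict[str, float],
            weight: float = 0.5) -> Dict[str, float]:
    """Blend general recognizer probabilities with a field-specific prior.

    weight = 0 leaves the general scores unchanged; weight = 1 relies on
    the field-specific prior alone. The result is renormalized to sum to 1.
    """
    blended = {
        word: (1.0 - weight) * p + weight * field_prior.get(word, 0.0)
        for word, p in candidates.items()
    }
    total = sum(blended.values())
    if total == 0.0:
        return dict(candidates)  # nothing to blend; keep the general scores
    return {word: score / total for word, score in blended.items()}


# Two phonetically similar terms; the general vocabulary slightly prefers
# the wrong one for this (hypothetical) form field.
general_scores = {"ileum": 0.45, "ilium": 0.55}
# Hypothetical prior from repeated dictation into the same form field.
field_prior = {"ileum": 0.9, "ilium": 0.1}

rescored = rescore(general_scores, field_prior)
best = max(rescored, key=rescored.get)
```

Under these assumed figures, the general vocabulary alone would select the wrong term, whereas the blended score favors the term habitually dictated into the field.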