1. Field of the Invention
The present invention relates in general to computer speech recognition systems and, in particular, to a system and method for expediting the aural training of an automated speech recognition program.
2. Background Art
Speech recognition programs are well known in the art. While these programs are ultimately useful in automatically converting speech into text, many users are dissuaded from using them because each user must spend a significant amount of time training the system. Usually this training begins by having each user read a series of pre-selected materials for several minutes. Then, as the user continues to use the program, the user is expected to stop whenever a word is improperly transcribed and train the program as to the intended word, thus advancing the ultimate accuracy of the speech files. Unfortunately, most professionals (doctors, dentists, veterinarians, lawyers) and business executives are unwilling to spend the time developing the necessary speech files to truly benefit from the automated transcription.
Accordingly, it is an object of the present invention to provide a system that offers expedited training of speech recognition programs. It is an associated object to provide a simplified means for providing verbatim text files for training the aural parameters (i.e., speech files, acoustic model and/or language model) of a speech recognition portion of the system.
Another object of the present invention is to provide a system that can increase the speed of the speech recognition training by training the speech recognition software with only the segments of transcribed speech that are determined to be erroneous.
It is an associated object of the present invention to provide a system that can recognize segments of text that require correction without the need to run speech recognition software in the background.
These and other objects will be apparent to those of ordinary skill in the art having the present drawings, specification and claims before them.
The present invention relates to a system for improving the accuracy of a speech recognition program. The system includes means for automatically converting a pre-recorded audio file into a written text. The system also includes means for parsing the written text into segments and for correcting each and every segment of the written text. In a preferred embodiment, a human speech trainer is presented with the text and associated audio for each and every segment. The segments that are ultimately modified by the human speech trainer are stored in a retrievable manner in association with the computer. The system further includes means for saving speech files associated with a substantially corrected written text and used by the speech recognition program towards improving accuracy in speech-to-text conversion. The system finally includes means for repetitively establishing an independent instance of the written text from the pre-recorded audio file using the speech recognition program and for replacing those segments that required correction in the independent instance of the written text with the corrected segments associated therewith.
In the preferred embodiment of the invention, the means for parsing the written text into segments includes means for directly accessing the functions of the speech recognition program. The parsing means may include means for determining the character count to the beginning of the segment and means for determining the character count to the end of the segment. Such parsing means may further include the UtteranceBegin function of Dragon Naturally Speaking™ to determine the character count to the beginning of the segment and the UtteranceEnd function of Dragon Naturally Speaking™ to determine the character count to the end of the segment.
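By way of illustration only, the character-count parsing described above might be sketched as follows. The function name `parse_segments` and the list of `(begin, end)` pairs are hypothetical; the pairs merely stand in for whatever character counts the speech recognition program reports for each utterance, and the sketch assumes the end count is exclusive. It does not call or reproduce any function of the speech recognition program itself.

```python
from typing import List, Tuple


def parse_segments(text: str, bounds: List[Tuple[int, int]]) -> List[str]:
    """Slice `text` into segments using (begin, end) character counts.

    Assumption: `begin` is the character count to the start of a segment
    and `end` is the character count to its end (exclusive), as might be
    reported per utterance by the speech recognition program.
    """
    segments = []
    for begin, end in bounds:
        # Reject counts that fall outside the text or are out of order.
        if not 0 <= begin <= end <= len(text):
            raise ValueError(f"invalid segment bounds: ({begin}, {end})")
        segments.append(text[begin:end])
    return segments


if __name__ == "__main__":
    written_text = "the patient is stable no fever noted"
    # Hypothetical character counts for two utterances.
    utterance_bounds = [(0, 21), (22, 36)]
    for segment in parse_segments(written_text, utterance_bounds):
        print(segment)
```

Each resulting segment can then be presented for correction together with the portion of audio associated with it.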
The means for automatically converting a pre-recorded audio file into a written text may further be accomplished by executing functions of Dragon Naturally Speaking™. The means for automatically converting may include the TranscribeFile function of Dragon Naturally Speaking™.
In one embodiment, the correcting means further includes means for highlighting likely errors in the written text. In such an embodiment, where the written text is at least temporarily synchronized to said pre-recorded audio file, the highlighting means further includes means for sequentially comparing a copy of the written text with a second written text resulting in a sequential list of unmatched words culled from the written text and means for incrementally searching for the current unmatched word contemporaneously within a first buffer associated with the speech recognition program containing the written text and a second buffer associated with a sequential list of possible errors. Such element further includes means for correcting the current unmatched word in the second buffer.
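As a rough illustration of the sequential comparison described above, the following sketch compares the written text word by word against a second written text and returns a sequential list of unmatched words culled from the first. The use of Python's `difflib.SequenceMatcher` is an assumption made for the example and is not the comparison technique of the invention; the function name `unmatched_words` is likewise hypothetical.

```python
import difflib
from typing import List, Tuple


def unmatched_words(first_text: str, second_text: str) -> List[Tuple[int, str]]:
    """Return (word index, word) pairs from `first_text` with no match.

    The two texts are split on whitespace and aligned in order. Words of
    the first text that are replaced or absent in the second text are
    reported; words that appear only in the second text are ignored here.
    """
    first = first_text.split()
    second = second_text.split()
    matcher = difflib.SequenceMatcher(a=first, b=second, autojunk=False)
    unmatched = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            unmatched.extend((i, first[i]) for i in range(i1, i2))
    return unmatched


if __name__ == "__main__":
    written = "the patient has a favor of one hundred"
    second = "the patient has a fever of one hundred"
    print(unmatched_words(written, second))
```

In this sketch the word index plays the role of a position marker, so that each unmatched word can be located again in the written text and in the list of possible errors.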
In one embodiment, the correcting means includes means for displaying the current unmatched word in a manner substantially visually isolated from other text in the written text and means for playing a portion of said synchronized voice dictation recording from said first buffer associated with said current unmatched word. The correcting means may further include means for alternatively viewing the current unmatched word in context within the copy of the written text.
The second written text may be established by a second speech recognition program having at least one conversion variable different from said speech recognition program. Alternatively, the second written text may be established by one or more human beings.
The invention further involves a method for improving the accuracy of a speech recognition program operating on a computer comprising: (a) automatically converting a pre-recorded audio file into a written text; (b) parsing the written text into segments; (c) correcting each and every segment of the written text; (d) saving the corrected segments in a retrievable manner; (e) saving speech files associated with a substantially corrected written text and used by the speech recognition program towards improving accuracy in speech-to-text conversion by the speech recognition program; (f) establishing an independent instance of the written text from the pre-recorded audio file using the speech recognition program; (g) replacing erroneous segments in the independent instance of the written text with the individually retrievable saved corrected segment associated therewith; (h) saving speech files associated with the independent instance of the written text used by the speech recognition program towards improving accuracy in speech-to-text conversion by the speech recognition program; and (i) repeating steps (f) through (h) a predetermined number of times.
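The repeated portion of the method, in which an independent instance is established, its erroneous segments are replaced, and the speech files are saved, might be sketched as follows. All names here are hypothetical: `transcribe` and `save_speech_files` are placeholder callables standing in for the speech recognition program, and the sketch assumes that corrected segments are keyed by segment position and that segmentation is the same on every pass.

```python
from typing import Callable, Dict, List


def replace_erroneous(segments: List[str], corrections: Dict[int, str]) -> List[str]:
    """Swap in the saved corrected segment wherever one exists."""
    return [corrections.get(i, seg) for i, seg in enumerate(segments)]


def training_passes(
    transcribe: Callable[[], List[str]],
    corrections: Dict[int, str],
    save_speech_files: Callable[[List[str]], None],
    passes: int,
) -> List[str]:
    """Repeat the establish, replace, and save steps `passes` times."""
    result: List[str] = []
    for _ in range(passes):
        fresh = transcribe()  # establish an independent instance
        result = replace_erroneous(fresh, corrections)  # replace erroneous segments
        save_speech_files(result)  # save speech files for this pass
    return result


if __name__ == "__main__":
    saved: List[List[str]] = []
    final = training_passes(
        transcribe=lambda: ["the patient has a favor", "no cough"],
        corrections={0: "the patient has a fever"},
        save_speech_files=saved.append,
        passes=3,
    )
    print(final, len(saved))
```

Because the corrected segments are saved once and reused on every pass, no further human correction is needed after the initial review in this sketch.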