1. Field of the Invention
The present invention relates in general to computer speech recognition systems and, in particular, to a system and method for expediting the aural training of an automated speech recognition program.
2. Background Art
Speech recognition programs are well known in the art. While these programs are ultimately useful in automatically converting speech into text, many users are dissuaded from using them because they require each user to spend a significant amount of time training the system. Usually this training begins by having each user read a series of pre-selected materials for approximately 20 minutes. Then, as the user continues to use the program, the user is expected to stop whenever a word is improperly transcribed and train the program as to the intended word, thus advancing the ultimate accuracy of the acoustic model. Unfortunately, most professionals (doctors, dentists, veterinarians, lawyers) and business executives are unwilling to spend the time developing the necessary acoustic model to truly benefit from the automated transcription.
Accordingly, it is an object of the present invention to provide a system that offers expedited training of speech recognition programs. It is an associated object to provide a simplified means for providing verbatim text files for training the aural parameters (i.e. speech files, acoustic model and/or language model) of a speech recognition portion of the system. These and other objects will be apparent to those of ordinary skill in the art having the present drawings, specification and claims before them.
The present invention relates to a system for improving the accuracy of a speech recognition program. The system includes means for automatically converting a pre-recorded audio file into a written text. The system also includes means for parsing the written text into segments and for correcting each and every segment of the written text. In a preferred embodiment, a human speech trainer is presented with the text and associated audio for each and every segment. Whether the human speech trainer ultimately modifies a segment or not, each segment (after an opportunity for correction, if necessary) is stored in an individually retrievable manner in association with the computer. The system further includes means for saving speech files associated with a substantially corrected written text and used by the speech recognition program towards improving accuracy in speech-to-text conversion.
The system finally includes means for repetitively establishing an independent instance of the written text from the pre-recorded audio file using the speech recognition program and for replacing each segment in the independent instance of the written text with the individually retrievable saved corrected segment associated therewith.
In one embodiment, the correcting means further includes means for highlighting likely errors in the written text. In such an embodiment, where the written text is at least temporarily synchronized to said pre-recorded audio file, the highlighting means further includes means for sequentially comparing a copy of the written text with a second written text resulting in a sequential list of unmatched words culled from the written text and means for incrementally searching for the current unmatched word contemporaneously within a first buffer associated with the speech recognition program containing the written text and a second buffer associated with a sequential list of possible errors. Such element further includes means for correcting the current unmatched word in the second buffer. In one embodiment, the correcting means includes means for displaying the current unmatched word in a manner substantially visually isolated from other text in the written text and means for playing a portion of said synchronized voice dictation recording from said first buffer associated with said current unmatched word.
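By way of illustration only, the sequential comparison that yields the list of unmatched words might be sketched as follows. The specification does not prescribe a particular comparison algorithm; this sketch substitutes Python's standard `difflib.SequenceMatcher`, and the function name `unmatched_words` and the whitespace tokenization are assumptions made for the example.

```python
from difflib import SequenceMatcher


def unmatched_words(written_text, second_text):
    """Return (position, word) pairs from written_text that have no
    sequential match in second_text.

    Assumption: words are separated by whitespace; the specification
    does not define tokenization.
    """
    first = written_text.split()
    second = second_text.split()
    matcher = SequenceMatcher(a=first, b=second, autojunk=False)

    unmatched = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "replace" and "delete" mark words of the first text that do not
        # line up with the second text. Words present only in the second
        # text ("insert") are skipped, since the list is culled from the
        # written text itself.
        if tag in ("replace", "delete"):
            for idx in range(i1, i2):
                unmatched.append((idx, first[idx]))
    return unmatched
```

For example, comparing "the patient has a fever" against "the patient had a fever" would flag the single word "has" at position 2 as a likely error to present to the human speech trainer.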
The invention further involves a method for improving the accuracy of a speech recognition program operating on a computer comprising: (a) automatically converting a pre-recorded audio file into a written text; (b) parsing the written text into segments; (c) correcting each and every segment of the written text; (d) saving the corrected segment in an individually retrievable manner; (e) saving speech files associated with a substantially corrected written text and used by the speech recognition program towards improving accuracy in speech-to-text conversion by the speech recognition program; (f) establishing an independent instance of the written text from the pre-recorded audio file using the speech recognition program; (g) replacing each segment in the independent instance of the written text with the individually retrievable saved corrected segment associated therewith; (h) saving speech files associated with the independent instance of the written text used by the speech recognition program towards improving accuracy in speech-to-text conversion by the speech recognition program; and (i) repeating steps (f) through (h) a predetermined number of times.
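The control flow of the method may be easier to follow in code. The following is a minimal sketch under stated assumptions, not an implementation of any actual speech recognition program: `ToyRecognizer`, `parse_segments`, `expedite_training`, and the `correct_segment` callback (standing in for the human speech trainer) are all hypothetical names, segments are assumed to be sentences split on periods, and the toy recognizer returns a fixed transcript rather than learning from its saved speech files.

```python
def parse_segments(text):
    # Assumption: one segment per sentence, split on periods.
    return [s.strip() for s in text.split(".") if s.strip()]


class ToyRecognizer:
    """Hypothetical stand-in for a speech recognition program."""

    def __init__(self, raw_output):
        self.raw_output = raw_output      # what the toy "hears" every time
        self.saved_speech_files = []      # one entry per training pass

    def transcribe(self, audio_file):
        # A real program would decode the audio; the toy ignores it.
        return self.raw_output

    def save_speech_files(self, segments):
        self.saved_speech_files.append(list(segments))


def expedite_training(recognizer, audio_file, correct_segment, repetitions):
    # (a), (b): convert the audio to text and parse it into segments.
    segments = parse_segments(recognizer.transcribe(audio_file))
    # (c), (d): correct every segment and store it retrievably by index.
    corrected = {i: correct_segment(i, s) for i, s in enumerate(segments)}
    # (e): save speech files for the corrected text.
    recognizer.save_speech_files(corrected[i] for i in sorted(corrected))

    for _ in range(repetitions):          # (i): repeat a set number of times
        # (f): establish an independent instance of the written text.
        instance = parse_segments(recognizer.transcribe(audio_file))
        # (g): replace each segment with its saved corrected segment.
        instance = [corrected.get(i, s) for i, s in enumerate(instance)]
        # (h): save speech files for this instance.
        recognizer.save_speech_files(instance)
    return corrected
```

In this sketch the human speech trainer corrects each segment only once; every later repetition reuses the stored corrections, which is the source of the time saving the method is directed to.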