The present invention relates generally to a system and method for producing an optimal language model for performing speech recognition.
Today's speech recognition technology enables a computer to transcribe spoken words into computer-recognized text equivalents. Speech recognition is the process of converting an acoustic signal, captured by a transducer such as a microphone or a telephone, into a sequence of words. These words can be used for numerous applications, including data entry and word processing. The development of speech recognition technology is primarily focused on accurate recognition, which is a formidable task due to the wide variety of pronunciations, accents, and speech characteristics of individual speakers. Speech recognition is further complicated by the highly technical and scientific vocabulary used in certain applications, such as in the medical profession.
The key to speech recognition technology is the language model. A language model describes the type of text that the person dictating (the dictator) is expected to speak. For example, speech recognition technology designed for the medical profession will utilize different language models for different specialties in medicine. In this example, a language model is created by collecting text from doctors in each specialty area, such as radiology, oncology, etc. The text collected would include language and words associated with that practice, such as diagnoses and prescriptions.
Today's state-of-the-art speech recognition tools utilize a factory (or out-of-the-box) language model and a separate, customizable site-specific language model. A recognition server determines whether the site language model requires updating by monitoring the dates on which the factory language model and the customized site-specific language model were created, modified, or copied. The site language model is then updated by a process that adds words to it. Making updates to the factory language models requires the recognition server to run through and update all of the language models before they are ready to run recognition tasks.
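By way of illustration only, the date-based check described above might be sketched in Python as follows. The LanguageModel record, its fields, and the function names are hypothetical and are not drawn from any particular recognition server; the sketch assumes that the site language model is considered out of date whenever it is older than the factory language model.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Iterable, Set


@dataclass
class LanguageModel:
    """Hypothetical record; 'modified' stands in for the created/modified/copied date."""
    name: str
    modified: datetime
    words: Set[str] = field(default_factory=set)


def site_model_needs_update(factory: LanguageModel, site: LanguageModel) -> bool:
    # Assumption: the site model is stale when it is older than the factory model.
    return site.modified < factory.modified


def add_words(site: LanguageModel, new_words: Iterable[str], now: datetime) -> LanguageModel:
    # Return a new site model containing the added words, stamped with the update time.
    return LanguageModel(site.name, now, site.words | set(new_words))


# Illustrative values only.
factory = LanguageModel("radiology-factory", datetime(2020, 3, 1), {"fracture", "effusion"})
site = LanguageModel("radiology-site", datetime(2020, 1, 15), {"fracture"})

if site_model_needs_update(factory, site):
    site = add_words(site, factory.words, now=datetime(2020, 3, 2))
```

In this sketch the update runs only when the check is actually performed, which reflects the dependence on the recognition server monitoring the dates.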
Separate independent processes are used to (1) update the site language model, and (2) perform the batch speech recognition task using the updated site-specific language model. If the recognition server has not updated the site language model, the previous, out-of-date site language model is used in step (2). Once the speech recognizer creates a transcribed output report using the language model, the transcribed report is run through post-processing, which applies formatting to the report.
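The consequence of keeping these two processes independent can be illustrated with a deliberately simplified Python sketch. Everything here is an assumption made for illustration: the site language model is reduced to a plain vocabulary set, recognition is reduced to a vocabulary lookup that emits "<unk>" for unknown words, and post-processing merely capitalizes the first letter and appends a period. None of these functions represents an actual recognition engine.

```python
from typing import Iterable, List, Set


def update_site_vocabulary(site_vocab: Set[str], new_words: Iterable[str]) -> Set[str]:
    # Step (1): add words to the site-specific vocabulary (hypothetical stand-in).
    return site_vocab | {w.lower() for w in new_words}


def recognize(spoken_words: List[str], site_vocab: Set[str]) -> List[str]:
    # Step (2): toy "recognition" that only transcribes words the model knows.
    return [w if w.lower() in site_vocab else "<unk>" for w in spoken_words]


def post_process(words: List[str]) -> str:
    # Formatting pass applied to the transcribed report.
    text = " ".join(words)
    return text[:1].upper() + text[1:] + "."


dictation = ["patient", "has", "pneumothorax"]
stale_vocab = {"patient", "has"}

# Step (1) skipped: recognition runs against the out-of-date model.
stale_report = post_process(recognize(dictation, stale_vocab))

# Step (1) performed first: the new word is available to step (2).
fresh_vocab = update_site_vocabulary(stale_vocab, ["pneumothorax"])
fresh_report = post_process(recognize(dictation, fresh_vocab))
```

Because nothing in step (2) verifies that step (1) has run, the stale report is produced without any error being raised.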
What is needed is speech recognition technology that automatically updates the site language model and ensures that the most up-to-date site language model is used for recognition.