Speech recognition systems have become important tools in certain work environments. In particular, environments that involve considerable amounts of dictation and transcription, such as the medical and legal professions, benefit from the speed and lower cost of speech recognition software. However, speech recognition output is often unformatted and unusable without some form of post-processing. This is due in part to certain text normalizations performed during the speech recognition process, which produce output that lacks the appearance of regular text as one expects to find in a document.
Speech recognition output may also be problematic because of the way in which people speak. Dictated speech is not always organized in the final desired order, and the manner in which people express themselves in speech does not necessarily correspond well to the manner in which they prefer to express themselves in print. Also, dictated speech may not be formatted as expected for text documents in a specific working environment, such as a hospital. In the medical field, medical reports are often generated through dictation, and speech recognition output from the dictation must be post-processed, then edited by a transcriptionist. The post-processing and editing are required to convert the speech recognition output into a properly formatted medical record for a specific site.
Previous tools attempted to post-process speech recognition output using systems that were exclusively or primarily grammar-based. These grammar-based systems are powerful tools capable of representing very complex constructions in an elegant framework.
However, grammar-based systems have weaknesses. In particular, they may be unwieldy and difficult to modify and maintain, and unexpected rule interactions may produce incorrect output. Furthermore, customizing a grammar-based system is a complex process that requires significant expertise and may therefore be both time- and cost-intensive.