The present invention relates generally to a system and method for filtering sections of documents for automatic speech recognition processes.
In the process of converting dictated speech files to finished reports, transcriptionists typically transform the original dictation substantially to generate the final report. Transcriptionists may revise the original text stylistically to conform to personal and site standards, re-order the dictated material, construct lists, invoke text macros to generate tables and other large text sections, and insert document headings. Transcriptionists may also act on specific instructions to correct speech disfluencies such as repeated or fragmentary words.
Additionally, many electronic documentation systems automatically insert document formatting and metadata such as headers, footers, page turns, macros, and default demographic information. Such automatic formatting varies widely from one automatic documentation system to another. Moreover, when using such documentation systems, transcriptionists may actually remove redundant information from the text file, which may have been added by an electronic documentation system.
It has been found that automatically formatted sections are easily and frequently overwritten. Document sections may be improperly filled in or modified; for example, dictated and transcribed segments may be inserted into macros, and tags may be deleted. Consequently, final transcribed text reports often differ substantially from the original recorded dictation.
In many automatic speech recognition systems, however, it is desirable to maintain a close alignment between the dictated audio saved in wave files or other formats and the corresponding transcribed text reports. Such alignment of the wave files and corresponding text is critical for many tasks associated with automatic speech recognition systems. In particular, language model identification (LMID), language model adaptation (LMA), acoustic model adaptation (AMA), automatic error correction, speaker evaluation, report evaluation, and post-processing techniques have been found to benefit from improvements in the alignment of the dictated wave files and the corresponding transcribed text. These processes rely on matches between the originally dictated acoustics and a text representing what was intended, known as “truth.” In most automated environments, “truth” is unavailable, so finished reports are used instead to produce “pseudo-truth.”
Mismatches or misalignments between the original recorded dictation and the final text report have been found to degrade the automatic speech recognition processes described above. It has been found, for example, that unfiltered non-dictated sections in finished documents negatively and significantly affect LMID, most of the LM and AM augmentation processes, and speaker classification. Bad filtering or a lack of filtering is also a serious problem for automatic rewrite techniques. In this connection, it has been found that LMID is often highly inaccurate when headers and footers are not removed from the documents. For example, radiology reports have been misidentified as belonging to the general medicine or mental health domains. In the same example, with headers and footers filtered out, these reports have been recognized properly as radiology domain reports. Overall, it has been found that in some cases there is about 5% absolute accuracy degradation due to the erroneous behavior of LMID.
Unfortunately, the current state of technology does not provide a suitable solution for these issues. For example, current solutions include manual rewrites of the documents using post-processing. This, of course, increases the processing time for the finished reports as well as the costs associated therewith.
Another sensitive process is LMA, especially for narrow domains like radiology. LMA is a process that includes adjusting word N-gram counts of the existing LM. The goal of LMA is to make the existing language model better reflect the specific speaking style of the particular user or group of users. Traditionally, LMA is performed on the text of finished reports, which is considered to be the best available approximation of the way users dictate. It has been found that leaving the most likely non-dictated sections of reports in the text submitted for LMA has the opposite effect: the LM counts are skewed and end up further from the targeted individual or group-specific dictation style.
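The skewing effect described above can be illustrated with a minimal sketch. It assumes a toy bigram model stored as raw counts; the function names `bigram_counts` and `adapt_counts` and the sample texts are hypothetical and chosen only for illustration. An actual LMA process would operate on a full language model rather than on a simple counter.

```python
from collections import Counter

def bigram_counts(text):
    """Count adjacent word pairs (bigrams) in whitespace-tokenized text."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))

def adapt_counts(base_counts, adaptation_text, weight=1):
    """Return a copy of base_counts with the bigram counts of
    adaptation_text added in (a simplified stand-in for LMA)."""
    adapted = Counter(base_counts)
    for bigram, n in bigram_counts(adaptation_text).items():
        adapted[bigram] += weight * n
    return adapted

# Hypothetical existing model and a hypothetical finished report.
base = bigram_counts("the lungs are clear the heart is normal")
dictated = "the lungs are clear"
header = "patient name date of birth"  # assumed non-dictated section

filtered = adapt_counts(base, dictated)
unfiltered = adapt_counts(base, header + " " + dictated)

# Bigrams that appear only in the header enter the model when the
# header is left in the adaptation text.
extra = sorted(set(unfiltered) - set(filtered))
```

In this sketch, `extra` holds the bigrams that entered the adapted counts solely because the header was not filtered out, none of which the user actually dictated.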
As a result, finished reports usually differ substantially from the original dictation and techniques to bring them into closer alignment with the original dictation are needed.
Therefore, there exists a need for an automatic document section filtering technique. It is desirable that such a technique is both accurate and automatic. It is also desirable to have such a technique that does not require intervention by transcriptionists or other staff since this is not only time-consuming and expensive, but frequently performed inaccurately and inconsistently.
There also exists a need for a simple and reliable system and method of automatic document section filtering to identify the most likely non-dictated sections of the medical reports in order to filter them out.
There further exists a need to determine reliable heuristics in order to identify non-dictated sections based on alignment of finished reports against recognition output.
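One possible form of such an alignment-based heuristic is sketched below. It assumes that a finished report has already been split into sections and that the recognition output is available as plain text; the names `section_coverage` and `flag_non_dictated`, the 0.5 threshold, and the sample data are illustrative assumptions rather than part of any particular system. The sketch uses the Python standard library class `difflib.SequenceMatcher` as a stand-in for the aligner.

```python
from difflib import SequenceMatcher

def section_coverage(sections, recognized_text):
    """For each report section, return the fraction of its words that
    align with the recognition output."""
    rec_tokens = recognized_text.lower().split()
    report_tokens, spans = [], []
    for section in sections:
        toks = section.lower().split()
        spans.append((len(report_tokens), len(report_tokens) + len(toks)))
        report_tokens.extend(toks)

    matcher = SequenceMatcher(None, report_tokens, rec_tokens, autojunk=False)
    matched = [False] * len(report_tokens)
    for block in matcher.get_matching_blocks():
        for i in range(block.a, block.a + block.size):
            matched[i] = True

    return [sum(matched[s:e]) / (e - s) if e > s else 0.0 for s, e in spans]

def flag_non_dictated(sections, recognized_text, threshold=0.5):
    """Flag sections whose aligned fraction falls below the threshold
    (an assumed cutoff) as likely non-dictated."""
    return [c < threshold for c in section_coverage(sections, recognized_text)]

# Hypothetical report sections and recognition output.
sections = [
    "Patient Name: John Doe MRN 12345",
    "the lungs are clear no pleural effusion",
    "Page 1 of 2",
]
recognized = "the lungs are clear uh no pleural effusion"
flags = flag_non_dictated(sections, recognized)
```

Sections with little or no counterpart in the recognition output, such as the header and page marker in this sample, fall below the threshold and are flagged, while the dictated body is retained despite the disfluency in the recognized text.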
There also exists a need for a system and method of automatic document section filtering to filter a sufficient amount of data to train classifiers that are independent of recognition output and capable of identifying non-dictated sections such as headers, footers, page turns, and macros based solely on text.
There also exists a need for a system and method of automatic document section filtering that uses trained models to classify document sections for documents that are available only in the text form where recognition output is unavailable.
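A text-only classifier of the kind contemplated above might, for example, be trained on sections that were previously labeled by an alignment-based filter. The following minimal sketch assumes a multinomial naive Bayes model with add-one smoothing; this model choice, the names `train` and `classify`, the labels, and the training data are all hypothetical and serve only to illustrate the idea.

```python
import math
from collections import Counter, defaultdict

def train(labeled_sections):
    """Collect per-label word counts from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labeled_sections:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab

def classify(model, text):
    """Return the label with the highest smoothed log-probability."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.lower().split():
            # Add-one smoothing so unseen words do not zero out a label.
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data, e.g. produced by an alignment-based filter.
model = train([
    ("patient name date of birth", "non_dictated"),
    ("page 1 of 2", "non_dictated"),
    ("the lungs are clear", "dictated"),
    ("no pleural effusion is seen", "dictated"),
])
label = classify(model, "page 2 of 2")
```

Once trained, such a model requires only the text of a section, so it could be applied to documents for which no recognition output exists.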