The transfer of verbal dictation to a concise written format is an integral part of business in many parts of society. For instance, owing to the growing volume of audio medical records, medical transcription is currently estimated to be a multibillion-dollar industry. With the steady increase in the size and complexity of healthcare and the desire to minimize the costs of routine practices, there is a strong push to automate tasks such as the transcription of dictation by means of automatic speech recognition (ASR).
The final documents generated by transcription services differ greatly from the initial ASR output, for several reasons. In addition to problems with the doctor's speech and common ASR errors (e.g., disfluencies, omitted function words, and wrong word guesses), the final document follows conventions that are generally not dictated (e.g., section headings, preamble, enumerated lists, medical terminology, and various pieces of additional structure). Traditional ASR has not focused on some of these issues, which are extremely important in fields such as medical transcription, where documents follow a specific format and are highly specialized.