Healthcare costs in the United States account for a significant share of the GNP. The affordability of healthcare is of great concern to many Americans. Technological innovations offer important leverage to reduce healthcare costs.
Many healthcare institutions require doctors to keep accurate and detailed records concerning the diagnosis and treatment of patients. Motivations for keeping such records include government regulations (such as Medicare and Medicaid regulations), the desire for the best outcome for the patient, and mitigation of liability. The records include patient notes that reflect information that a doctor or other person adds to a patient record after a given diagnosis, patient interaction, lab test, or the like.
Record keeping can be a time-consuming task, and the physician's time is valuable. The time required for a physician to hand-write or type patient notes can represent a significant expense. Verbal dictation of patient notes offers significant time savings to physicians and is becoming increasingly prevalent in modern healthcare organizations.
Over time, a significant industry has evolved around the transcription of medical dictation. Several companies produce special-purpose voice mailbox systems for storing medical dictation. These centralized systems hold voice mailboxes for a large number of physicians, each of whom can access a voice mailbox by dialing a phone number and entering his or her identification code. These dictation voice mailbox systems are typically purchased or shared by healthcare institutions. Prices can be over $100,000 per voice mailbox system. Even at these prices, these centralized systems save healthcare institutions vast sums of money over the cost of maintaining records in a more distributed fashion.
Using today's voice mailbox medical dictation systems, when a doctor completes an interaction with a patient, the doctor calls a dictation voice mailbox, and dictates the records of the interaction with the patient. The voice mailbox is later accessed by a medical transcriptionist who listens to the audio and transcribes the audio into a text record. The playback of the audio data from the voice mailbox may be controlled by the transcriptionist through a set of foot pedals that mimic the action of the “forward”, “play”, and “rewind” buttons on a tape player. Should a transcriptionist hear an unfamiliar word, the standard practice is to stop the audio playback and look up the word in a printed dictionary.
Some medical transcriptionists may specialize in one area of medicine, or may deal primarily with a specific group of doctors. Familiarity with the doctors' voices and with the subject matter can increase the transcriptionist's accuracy and efficiency over time.
The medical transcriptionist's time is less costly for the hospital than the doctor's time, and the medical transcriptionist is typically much more familiar with the computerized record-keeping systems than the doctor is, so this system offers a significant overall cost saving to the hospital.
To reduce costs further, healthcare organizations have deployed speech recognition technology, such as the AutoScript™ product (made by eScription™ of Needham, Mass.), to automatically transcribe medical dictations. Automatically transcribed medical record documents usually require editing by the transcriptionist.
In an application of background (as opposed to real-time) speech recognition to medical transcription, the automatic speech recognition process is run offline, i.e., without real-time clinician interaction. The speaker dictates and the speech recognition process is run in batch mode at another time. Draft transcriptions produced by the automatic speech recognition process may then be edited by the clinician or by a Medical Transcriptionist (MT) before being added to the medical record. An example of such an application is the EditScript™ product from eScription™.
Real-time and background speech recognition systems enroll and qualify clinicians using a prescribed quantity of training data or, in the case of most real-time automatic speech recognition (ASR) systems, by having the speaker dictate a specific text during an enrollment session. Once a speaker is enrolled, all subsequent dictations for that speaker or speaker-worktype are processed through the speech recognition system and transcriptions are created. For background ASR systems, these transcriptions are considered drafts, which MTs edit manually before the final documents are uploaded into the electronic medical record. In real-time speech recognition applications, the words are transcribed as they are spoken, appearing on the computer screen for nearly immediate verification and editing by the clinician.
The yield of a background speech recognition application may be defined as the percentage of the dictations entering the system that can be processed into draft transcriptions. Generally, increasing the yield of an application reduces costs, so long as the drafts produced are of sufficient quality that editing them takes less time than typing the transcription from scratch. Since draft transcriptions cannot be produced for a particular speaker until automatic speech recognition models exist for that speaker, there is an advantage, in terms of system yield, to creating models with as little training data as possible. However, ASR models based on very little training data generally produce poorer-quality draft transcriptions than would be possible by waiting for more training samples. Even with sufficient training data, some draft transcriptions require significant correction. For example, some drafts can create more work for the MT than typing the transcription from scratch with no ASR draft at all. It may also be desirable to allow editing only of transcriptions that are substantially correct.
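The yield definition and the time-saving condition above can be made concrete with a short calculation. The following Python sketch is illustrative only: the `Dictation` record, its field names, and the sample timings are hypothetical and are not taken from any particular product. It computes the yield as the percentage of incoming dictations for which a draft was produced, and separately checks whether editing a given draft took less time than typing the document from scratch would have.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Dictation:
    """One dictation entering the system (hypothetical record layout)."""
    draft_produced: bool                    # True if ASR produced a draft
    edit_minutes: Optional[float] = None    # MT time spent correcting the draft
    typing_minutes: Optional[float] = None  # estimated time to type from scratch


def application_yield(dictations: Iterable[Dictation]) -> float:
    """Percentage of incoming dictations that were turned into drafts."""
    items = list(dictations)
    if not items:
        return 0.0  # no dictations entered the system
    drafted = sum(1 for d in items if d.draft_produced)
    return 100.0 * drafted / len(items)


def draft_saves_time(d: Dictation) -> bool:
    """True only if a draft exists and editing it beat typing from scratch."""
    if not d.draft_produced or d.edit_minutes is None or d.typing_minutes is None:
        return False
    return d.edit_minutes < d.typing_minutes


# Made-up sample batch: three drafts produced, one dictation with no draft.
batch = [
    Dictation(True, edit_minutes=4.0, typing_minutes=10.0),
    Dictation(True, edit_minutes=12.0, typing_minutes=10.0),  # draft cost more than typing
    Dictation(False),
    Dictation(True, edit_minutes=3.0, typing_minutes=9.0),
]

print(application_yield(batch))                  # 75.0
print(sum(draft_saves_time(d) for d in batch))   # 2
```

In this made-up batch the yield is 75 percent, yet only two of the three drafts actually saved editing time, which illustrates why a higher yield reduces cost only when draft quality is sufficient.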