Healthcare costs in the United States account for a significant share of the GNP. The affordability of healthcare is of great concern to many Americans. Technological innovations offer important leverage for reducing healthcare costs.
Many healthcare institutions require doctors to keep accurate and detailed records concerning diagnosis and treatment of patients. Motivations for keeping such records include government regulations (such as Medicare and Medicaid regulations), desire for the best outcome for the patient, and mitigation of liability. The records include patient notes that reflect information that a doctor or other person adds to a patient record after a given diagnosis, patient interaction, lab test or the like.
Record keeping can be a time-consuming task, and the physician's time is valuable. The time required for a physician to hand-write or type patient notes can represent a significant expense. Verbal dictation of patient notes offers significant time savings to physicians and is becoming increasingly prevalent in modern healthcare organizations.
Over time, a significant industry has evolved around the transcription of medical dictation. Several companies produce special-purpose voice mailbox systems for storing medical dictation. These centralized systems hold voice mailboxes for a large number of physicians, each of whom can access a voice mailbox by dialing a phone number and putting in his or her identification code. These dictation voice mailbox systems are typically purchased or shared by healthcare institutions. Prices can be over $100,000 per voice mailbox system. Even at these prices, these centralized systems save healthcare institutions vast sums of money over the cost of maintaining records in a more distributed fashion.
Using today's voice mailbox medical dictation systems, when a doctor completes an interaction with a patient, the doctor calls a dictation voice mailbox, and dictates the records of the interaction with the patient. The voice mailbox is later accessed by a medical transcriptionist who listens to the audio and transcribes the audio into a text record.
The medical transcriptionist's time is less costly for the hospital than the doctor's time, and the medical transcriptionist is typically much more familiar with the computerized record-keeping systems than the doctor is, so this system offers a significant overall cost saving to the hospital.
To reduce costs further, health care organizations have deployed speech recognition technology. Some efforts have been made to utilize speech recognition technology for the purpose of producing written documents. Such efforts have met with limited success, however, since producing a literal transcription of a dictation has not resulted in a document sufficiently close to the desired final document.
Until recently, most deployed automatic speech recognition systems were front-end or real-time systems. In these applications, the speaker interacts directly with the speech recognition device, which hypothesizes the spoken words and outputs them to the computer terminal with a short delay. The speaker may then be required to correct the output, either using voice commands or by typing.
In an application of background speech recognition to medical transcription, the automatic speech recognition (“ASR”) process is run “off line”, without real-time clinician interaction. The speaker dictates normally, and the speech recognition process is run in batch mode at a later time. Draft transcriptions produced by the ASR process may then be edited by the clinician or by a Medical Transcriptionist (“MT”) before being added to the medical record. An example of this type of ASR application is the EditScript product from eScription.
In background speech recognition, the speaker does not have access to the text as s/he dictates. As such, the speaker cannot interact with the speech recognition device, for example by issuing verbal formatting directives, in order to improve the appearance of the document. Moreover, the use of such verbal directives is counter-productive to the efficiency of the dictation process. Health care clinicians are used to simply dictating the medical information in the way that they feel comfortable and assuming that the final document will be formatted according to generally accepted standards.
A hybrid of the front-end and background speech recognition process is also possible. In these “near real-time” applications, the speaker dictates for some period of time, before indicating to the speech-recognition device that the dictation has been completed. At this point, the speech-recognition device completes its processing on all of the audio received and outputs the entire transcription to the computer terminal for editing, either with voice or typing, by the user. In general, front-end speech recognition software is resident on the computer at which the speaker is speaking, whereas background speech-recognition runs on a high-end server, which is often remote from the dictation device. Near-real-time speech recognition may be run in either of these modes, or in a combination scenario, where some of the speech-recognition processing is done on the speaker's computer, and some is done on a remote high-end server.
Often, health care clinicians perform procedures and examinations which are similar to those they have performed previously. For example, a urologist may perform several routine vasectomies each day, or a radiologist may examine dozens of normal chest x-rays during a shift. In cases such as these, the medical record for the instance of service is nearly, if not completely, identical to the document for all other such services. Accordingly, clinicians often dictate words to the effect that a certain “standard” document should be inserted as the transcription for the dictation. Sometimes, this standard document is the entire desired transcription. For example, a urologist may say: “Please use my normal vasectomy template,” indicating that the entire standard vasectomy description should be inserted as the transcription. In other circumstances, the standard text may comprise a subset of the desired final transcription, in which case the clinician will continue with the remainder of the dictation in the usual fashion. Clinicians may dictate several such standard sub-sections within the course of a dictation and may also include standard dictation. The MT analyzes the dictation to determine whether standard text (at least for that speaker) can be inserted, and obtains and inserts the standard text as appropriate.
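The lookup that the MT performs by hand can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the speaker identifiers, the placeholder standard texts, the single trigger wording “please use my ... template”, and the function name `expand_standard_text` do not come from any existing system, and real dictations phrase such requests in many other ways.

```python
import re

# Hypothetical per-speaker store of standard texts. The keys and the
# placeholder bodies are illustrative only.
STANDARD_TEXTS = {
    ("dr_smith", "normal vasectomy"): "[standard vasectomy description for dr_smith]",
    ("dr_jones", "normal chest x-ray"): "[standard chest x-ray description for dr_jones]",
}

# Assumed trigger wording; only this one phrasing is recognized here.
TRIGGER = re.compile(r"please use my (?P<name>.+?) template", re.IGNORECASE)


def expand_standard_text(speaker: str, dictation: str) -> str:
    """Replace each recognized trigger phrase with that speaker's standard text.

    A trigger that names a text the speaker does not have is left unchanged,
    so that a human can resolve it.
    """

    def substitute(match: re.Match) -> str:
        key = (speaker, match.group("name").lower())
        return STANDARD_TEXTS.get(key, match.group(0))

    return TRIGGER.sub(substitute, dictation)


if __name__ == "__main__":
    print(expand_standard_text(
        "dr_smith",
        "Please use my normal vasectomy template, then follow up in two weeks.",
    ))
```

Keying the store on both speaker and template name reflects the point that a standard text is standard only for a particular speaker; the same phrase dictated by a different clinician resolves to a different text or to none.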
In these circumstances, the medical transcriptionist may have access to the text that is indicated by the dictation. In general, the MT will use the transcription device to access a list or menu of files, each file representing standard text. The appropriate file is then selected, and the standard text inserted into the transcription document. Selection and insertion of standard texts into transcription documents requires experience and, depending on how large the list of potential files is, can be very time-consuming. In addition, managing standard documents is challenging for health-care institutions, particularly when MTs are dispersed geographically and their access to the transcription system is not synchronous with changes to the documents. If the MT does not have access to the most recent version of a standard document, the transcription may need to be reviewed and edited by a transcription supervisor. This workflow is especially costly.
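The synchronization problem can be made concrete with a small sketch. It assumes, purely for illustration, that each standard document carries an integer version number and that an MT's workstation holds a cached copy of some documents; the function name `stale_documents` and the version scheme are hypothetical and are not taken from any described system.

```python
def stale_documents(cached_versions: dict[str, int],
                    central_versions: dict[str, int]) -> list[str]:
    """Return the names of standard documents whose cached copy is out of date.

    A document missing from the cache is treated as version 0, so it is
    reported as stale as well.
    """
    return sorted(
        name
        for name, version in central_versions.items()
        if cached_versions.get(name, 0) < version
    )


if __name__ == "__main__":
    # Hypothetical state: the institution revised one template after the
    # MT's workstation last synchronized.
    cached = {"normal vasectomy": 2, "normal chest x-ray": 5}
    central = {"normal vasectomy": 3, "normal chest x-ray": 5}
    print(stale_documents(cached, central))
```

Any transcription that used a document reported by such a check would be a candidate for the supervisor review described above.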