The traditional method for transcribing voice dictation does not utilize speech recognition processing to facilitate the transcription process. When traditional transcription methods are used without a template, the transcriptionist opens a blank document and starts listening to the spoken input, typing the spoken words and punctuation and adding any missing punctuation as the transcriptionist proceeds. Either from memory or by reference to a sample document, the transcriptionist manually applies formatting wherever needed and reorders the typed text, adding and/or styling the desired section headings, to produce a finished document. Things that are typically done as part of this process are (1) typing spoken words and punctuation, (2) adding missing punctuation, (3) applying formatting, (4) adding and styling section headings, and (5) ensuring proper ordering of sections.
With the use of document templates, the traditional method for transcription becomes one in which the transcriptionist loads a template into a word processor and listens to the spoken input, typing the spoken words and punctuation and adding any missing punctuation as the transcriptionist plays back the recorded speech information. As the speaker moves from section to section of the document, the transcriptionist moves within the template, ensuring that the sections of the document appear in the desired order even if the speaker dictates the sections in a different order. The template can contain default formatting for each part of the document such that when the cursor is placed in a given location, the desired formatting for that part of the document is automatically applied. This process utilizes a speaker's spoken input to generate a finished document. The main task performed during this process is the typing of the words as spoken and the addition of punctuation, which is almost always omitted or partially omitted by the speaker. In addition to the typing and punctuation tasks, the process includes the addition of formatting and text by the transcriptionist through the use of a basis document or template. Lastly, the process includes the reordering of the document's sections into a desired order. Thus, things that are typically done as part of the traditional transcription process with a template are (1) typing spoken words and punctuation, (2) adding missing punctuation, and (3) ensuring proper ordering of sections.
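By way of illustration only, the section-ordering role of a template described above can be sketched in Python. The section names, the TEMPLATE_SECTIONS list, and the fill_template function are hypothetical and are not taken from any particular word processor or transcription product; the sketch assumes that each dictated passage has already been associated with a section name.

```python
# Hypothetical template: section headings in the order the finished
# document requires (these names are illustrative assumptions).
TEMPLATE_SECTIONS = ["History", "Examination", "Assessment", "Plan"]


def fill_template(dictated):
    """Arrange dictated passages into template order.

    `dictated` is a list of (section_name, text) tuples in the order
    spoken. Passages dictated for the same section are joined in the
    order they were spoken.
    """
    filled = {name: [] for name in TEMPLATE_SECTIONS}
    for section, text in dictated:
        if section not in filled:
            raise KeyError(f"section not in template: {section}")
        filled[section].append(text)

    lines = []
    for name in TEMPLATE_SECTIONS:
        lines.append(name.upper())             # heading styling supplied by the template
        lines.append(" ".join(filled[name]))   # empty string if nothing was dictated
    return "\n".join(lines)


# The speaker dictates the plan before the history; the template
# restores the desired order.
document = fill_template([
    ("Plan", "Follow up in two weeks."),
    ("History", "Patient reports cough."),
])
print(document)
```

In this sketch the template, rather than the transcriptionist's memory, determines the order and heading style of the sections, which is the efficiency the template-based method is described as providing.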
More recent approaches to transcription have taken advantage of speech recognition. In recent years, speech recognition software has progressed to the extent that it can be loaded on a desktop computer system and used to directly input dictated text into an electronically displayed document. As such, speech recognition can be used in a variety of approaches to improve the efficiency of business practices. One approach is for the speaker to use speech recognition software such that the speaker's speech is converted into text while the speaker is talking. This converted speech is displayed to the speaker in electronic form so that the speaker can correct and/or format the resulting text in real-time.
An alternative approach to this direct use of speech recognition and real-time correction by the speaker is for the speech information to be recorded for deferred transcription by a transcriptionist. Such deferred transcription services free the speaker or his/her staff from the task of converting the speech information into a formatted and corrected final document, and these services can utilize transcriptionists located in remote transcription centers around the world. For example, deferred transcription services headquartered within the United States have utilized transcription centers located in remote geographic locations, such as India, where labor is reasonably skilled yet less costly than labor within the United States. Current approaches to the use of speech recognition to facilitate deferred transcription services, however, have involved the delivery of the entire text-only results of the speech recognition process, such that a transcriptionist sees the entire text-only result file at one time.
In operation, when text-only speech recognition results are used without a template, the transcriptionist opens a document containing the text and starts listening to the spoken input, following along in the text with his/her eyes. When the transcriptionist identifies a recognition error, the transcriptionist stops the playback and corrects the recognition results. The transcriptionist stops the playback periodically to add missing punctuation to the previously played sentence or sentences. Either from memory or by reference to a sample document, the transcriptionist manually applies formatting wherever needed and reorders the recognition results, adding and/or styling the desired section headings, to produce a finished document. Things that are typically done as part of this process are (1) correcting recognition errors, (2) adding missing punctuation, (3) applying formatting, (4) adding and styling section headings, and (5) ensuring proper ordering of sections.
When text results from speech recognition are used with a template, the transcriptionist either opens two documents, one containing the text results and another containing the template, or opens one document containing both the speech recognition results and the template such that the template follows the results or vice versa. The transcriptionist can then start listening to the spoken input, following along in the text results with his/her eyes. When the transcriptionist identifies a recognition error, he/she can stop the playback and correct the recognition results. In addition, the transcriptionist can stop the playback periodically to add punctuation to the previously played sentence or sentences. Either from memory or by reference to a sample document, the transcriptionist can also manually apply formatting wherever needed. Either before, concurrent with, or after the rest of this process, the transcriptionist must also arrange the recognition results into the correct parts of the template. Things that are typically done as part of this process are (1) correcting recognition errors, (2) adding missing punctuation, (3) applying formatting, and (4) ensuring proper ordering of sections.
One significant problem with the above method of applying speech recognition results to facilitate deferred transcription services by delivering the entire text-only results at once is that if the transcriptionist's attention wanders even for a moment, the transcriptionist can lose his/her place in the recognition results, requiring the transcriptionist to rewind the audio and find his/her place in the document. One common approach to solving this problem is to highlight each word within the entire text of the text-only results file as the corresponding part of the audio is played. This highlighting approach, however, still suffers from inefficiencies and can be particularly difficult to utilize in a document template implementation. These difficulties are particularly evident where document templates are utilized because the transcriptionist must take the recognition results that are delivered into a document and move them into appropriate template fields.
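For illustration only, the word-highlighting approach mentioned above can be sketched in Python. The sketch assumes that the speech recognition process supplies a start time and an end time for each recognized word; that timing data, the WORDS list, and the functions word_index_at and render are hypothetical and do not describe any particular product. The bracket notation stands in for on-screen highlighting.

```python
import bisect

# Hypothetical recognition output: (word, start_seconds, end_seconds).
WORDS = [
    ("the", 0.0, 0.2),
    ("patient", 0.2, 0.7),
    ("reports", 0.7, 1.2),
    ("cough", 1.3, 1.8),
]


def word_index_at(words, t):
    """Return the index of the word being played at time t, or None.

    None is returned before the first word, during a pause between
    words, and after the last word.
    """
    starts = [start for _, start, _ in words]
    i = bisect.bisect_right(starts, t) - 1   # last word starting at or before t
    if i < 0 or t >= words[i][2]:
        return None
    return i


def render(words, t):
    """Return the full text with the currently played word in brackets."""
    current = word_index_at(words, t)
    return " ".join(
        f"[{word}]" if i == current else word
        for i, (word, _, _) in enumerate(words)
    )


print(render(WORDS, 0.5))   # the [patient] reports cough
print(render(WORDS, 1.25))  # the patient reports cough (pause between words)
```

The sketch also makes the limitation noted above concrete: the highlight tracks the transcriptionist's place within the delivered text, but it does nothing to move that text into the appropriate template fields.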