None.
The present invention relates to systems for transcribing voice communications into text and specifically to a system facilitating real-time editing of a transcribed text stream by a human call assistant for higher accuracy.
A system for real-time transcription of remotely spoken voice signals is described in U.S. Pat. No. 5,909,482 assigned to the same assignee as the present invention and hereby incorporated by reference. This system may find use implementing both a "captel" (caption telephone) in which a user receives both voice and transcribed text through a "relay" from a remote second party to a conversation, and a "personal interpreter" in which a user receives, through the relay, a text transcription of words originating from the location of the user.
In either case, a human "call assistant" at the relay listens to the voice signal and "revoices" the words to a speech recognition computer program tuned to that call assistant's voice. Revoicing is an operation in which the call assistant repeats, in slightly delayed fashion, the words she or he hears. The text output by the speech recognition system is then transmitted to the captel or personal interpreter. Revoicing by the call assistant overcomes a current limitation of computer speech recognition programs: they need to be trained to a particular speaker and thus cannot handle direct translation of speech from a variety of users.
Even with revoicing and a trained call assistant, some transcription errors may occur, and therefore, the above-referenced patent also discloses an editing system in which the transcribed text is displayed on a computer screen for review by the call assistant.
The present invention provides a number of improvements in the editing system described in the above-referenced patent to speed and simplify the editing process and thus generally improve the speed and accuracy of the transcription. Most generally, the invention allows the call assistant to select words for editing based on their screen location, most simply by touching the word on the screen. Lines of text are preserved intact as they scroll off the screen to assist in tracking individual words, and words on the screen change color to indicate their status for editing and transmission. The delay before transmission of transcribed text may be adjusted, for example, dynamically based on error rates, perceptual rules, or call assistant or user preference.
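By way of illustration only, the dynamic adjustment of the transmission delay based on error rate might be sketched as follows. The function name, the parameter names, the linear mapping, and all numeric defaults are assumptions made for this sketch and are not taken from the referenced patent or from the present description.

```python
def transmission_delay(recent_corrections: int,
                       recent_words: int,
                       min_delay_s: float = 1.0,
                       max_delay_s: float = 5.0,
                       error_rate_at_max: float = 0.2) -> float:
    """Hypothetical rule: hold text longer when more words are being corrected.

    The delay grows linearly from min_delay_s (no corrections) to
    max_delay_s (error rate at or above error_rate_at_max).
    All thresholds are illustrative assumptions.
    """
    if recent_words <= 0:
        # No history yet, so fall back to the shortest delay.
        return min_delay_s
    error_rate = recent_corrections / recent_words
    # Clamp so that very high error rates do not exceed the maximum delay.
    scale = min(error_rate / error_rate_at_max, 1.0)
    return min_delay_s + scale * (max_delay_s - min_delay_s)
```

A perceptual rule or a call assistant or user preference could replace or bound the value returned here; the sketch covers only the error-rate case.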
The invention may be used with voice carryover in a caption telephone application or for a personal interpreter or for a variety of transcription purposes. As described in the parent application, the transcribed voice signal may be buffered to allow the call assistant to accommodate varying transcription rates. The present invention, however, also provides more sophisticated control of this buffering by the call assistant, for example, by adding a foot control pedal, a graphic buffer gauge, and automatic buffering with invocation of the editing process. Further, the buffered voice signal may be processed for "silence compression," removing periods of silence. How aggressively silence is removed may be made a function of the amount of signal buffered.
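By way of illustration only, silence compression whose aggressiveness depends on the amount of buffered signal might be sketched as follows. The frame-energy representation, the function names, the linear mapping from buffer fill to permitted pause length, and all numeric defaults are assumptions made for this sketch.

```python
from typing import List


def max_silence_frames(buffered_seconds: float,
                       longest_pause_frames: int = 50,
                       shortest_pause_frames: int = 5,
                       full_buffer_seconds: float = 10.0) -> int:
    """Hypothetical rule: the fuller the buffer, the shorter the pause kept."""
    # Fraction of the buffer in use, clamped to the range 0..1.
    fill = min(max(buffered_seconds / full_buffer_seconds, 0.0), 1.0)
    span = longest_pause_frames - shortest_pause_frames
    return longest_pause_frames - round(fill * span)


def compress_silence(frame_energies: List[float],
                     silence_threshold: float,
                     keep_frames: int) -> List[int]:
    """Return indices of frames to play back.

    Each run of silent frames (energy below silence_threshold) is
    truncated to at most keep_frames frames; voiced frames are all kept.
    """
    kept = []
    silent_run = 0
    for index, energy in enumerate(frame_energies):
        if energy < silence_threshold:
            silent_run += 1
            if silent_run > keep_frames:
                continue  # drop the excess silence
        else:
            silent_run = 0
        kept.append(index)
    return kept
```

In use, the value returned by `max_silence_frames` for the current buffer level would be passed as `keep_frames`, so that pauses are shortened more severely as the call assistant falls further behind the incoming voice signal.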
The invention further contemplates the use of keyboard or screen entry of certain standard text in conjunction with revoicing, particularly for the initial words of a sentence, which tend to repeat.
The above aspects of the invention are not intended to define the scope of the invention, for which purpose claims are provided. Not all embodiments of the invention will include all of these features.
In the following description, reference is made to the accompanying drawings, which form a part hereof, and in which there is shown by way of illustration, a preferred embodiment of the invention. Such embodiment also does not define the scope of the invention and reference must be made therefore to the claims for this purpose.