The invention relates to a system and method for reviewing a text recognized by a speech recognizer.
U.S. Pat. No. 5,031,113 describes a speech recognition system used for dictation. The system enables a user to review the recognized text in a so-called synchronous reproduction mode. After the user has finished dictating, the user can enter the synchronous reproduction mode. In this mode, the speech of the user is played back while, at the same time, the word which was recognized for the segment of speech being played back is highlighted on the display. To this end, the speech of the user has been stored in a memory. Moreover, during the recognition, word boundaries are detected in the speech. For each word a begin mark, indicating the beginning of the word, and an end mark, indicating the end of the word, are stored. This enables an accurate synchronous reproduction of the speech and the highlighting on the display. If the user detects that a word has been recognized wrongly (or the user wants to change/add/delete a word for another reason), the user can stop the synchronous reproduction and enter the editing and/or dictation mode. The user may enter the synchronous reproduction mode at any point in the text.
In itself, the synchronous reproduction has been found to be beneficial to the user for correcting recognition mistakes. However, the modal behavior of the system reduces its effectiveness, since correcting a word requires too many user actions to change the mode of the system.
It is an object of the invention to overcome the above-mentioned drawback.
To meet the object of the invention, a speech recognition system includes:
an input for receiving a speech representative signal;
a first memory for storing a representation of the received signal suitable for audible reproduction;
a speech recognizer operative to represent the received signal as a sequence of recognized words;
a second memory for storing the sequence of recognized words, where each recognized word is stored in association with a marker indicating a correspondence between the word and a segment of the received signal in which the word was recognized;
a controller operative to enable a user to review at least part of the sequence of recognized words by causing a synchronous reproduction of an audible and visible representation of the part of the sequence of recognized words, the synchronous reproduction including audibly reproducing a corresponding part of the received signal stored in the first memory and for each segment of the corresponding part of the received signal, at the moment when the segment is being audibly reproduced, indicating on a display a textual representation of a recognized word which corresponds to the segment; the correspondence being given by the markers stored in the second memory; to detect whether the user has provided an editing instruction, while the synchronous reproduction is active; and to pause the synchronous reproduction in response to having detected an editing instruction during the synchronous reproduction, and cause the editing instruction to be performed.
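As an illustration only, the association between recognized words and segments of the received signal could be sketched as follows in Python. The names RecognizedWord, begin_ms, end_ms and word_at are hypothetical and do not appear in the description; the sketch assumes that each marker is a pair of millisecond offsets into the stored signal and that the words are stored in order of their begin offsets.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RecognizedWord:
    """One recognized word with its (assumed) begin/end markers in milliseconds."""
    text: str
    begin_ms: int  # start of the signal segment in which the word was recognized
    end_ms: int    # end of that segment (exclusive)


def word_at(words: List[RecognizedWord], position_ms: int) -> Optional[int]:
    """Return the index of the word whose segment contains position_ms.

    Returns None when the playback position falls in a gap between words
    or outside the recorded words. Assumes words are sorted by begin_ms.
    """
    begins = [w.begin_ms for w in words]
    i = bisect_right(begins, position_ms) - 1
    if i >= 0 and position_ms < words[i].end_ms:
        return i
    return None
```

During audible reproduction, a controller of this kind would call word_at with the current playback position and highlight the returned word on the display.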
Once the user has completed a dictation, it is sufficient to enter the synchronous reproduction mode once. While effectively staying in the synchronous reproduction mode, the user can edit the recognized text. The editing instructions of the user may be received via any suitable form of input, including the keyboard (e.g. to insert/delete/replace a word or character(s) of a word), the mouse (e.g. to change formatting of a part of the text, like changing font, style or size, or to change an edit position), or via voice (e.g. to dictate one or more words/characters to insert/delete/replace a word or character(s) of a word, or in the form of a voice command, e.g. to change formatting of a part of the text or to change an edit position). It is no longer required that the user issues a dedicated instruction to leave the synchronous reproduction mode to be able to edit the text.
Restart of the synchronous reproduction may be automatic, i.e., the user no longer needs to issue an explicit dedicated instruction to re-start the synchronous reproduction.
The synchronous reproduction may be automatically restarted once the user has apparently finished editing, which may be detected when the user has not provided editing input for a certain period of time. In a preferred embodiment, the time-out is user-configurable, giving the user a choice between the system quickly restarting the synchronous reproduction (with the risk that the user was still considering further editing operations) and the system restarting the synchronous reproduction more slowly (allowing the user more time to edit, at the expense of an overall slower response). Preferably, the user can still overrule the automatic behavior via an explicit instruction, either to stop a reproduction which restarted too quickly or to restart a reproduction which is restarting too slowly. The default time-out may be on the order of a few hundred milliseconds to a few seconds.
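The pause-on-edit and time-out behavior can be sketched, under stated assumptions, as a small state holder in Python. The class ReviewSession, its methods and the one-second default are hypothetical choices made for illustration; timestamps are passed in explicitly as seconds, so the sketch does not depend on any particular clock or audio library.

```python
from typing import Optional


class ReviewSession:
    """Minimal sketch of automatic restart after an editing time-out."""

    def __init__(self, restart_timeout_s: float = 1.0) -> None:
        # User-configurable time-out (assumed default: one second).
        self.restart_timeout_s = restart_timeout_s
        self.playing = True
        self.last_edit_at: Optional[float] = None

    def on_edit(self, now_s: float) -> None:
        """An editing instruction pauses reproduction and resets the timer."""
        self.playing = False
        self.last_edit_at = now_s

    def tick(self, now_s: float) -> bool:
        """Restart reproduction once no edit has arrived for the time-out."""
        if (not self.playing
                and self.last_edit_at is not None
                and now_s - self.last_edit_at >= self.restart_timeout_s):
            self.playing = True
        return self.playing
```

A shorter restart_timeout_s makes the reproduction resume sooner, at the risk of resuming while the user is still considering further edits; a longer value gives the opposite trade-off.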
The reproduction may be restarted where it was paused. This will, in normal situations, allow for a smooth continuation of the reviewing.
If the user has edited one or more words, the reproduction may be restarted at the last edited word. In most situations, this position reflects the area of interest of the user, making it desired to restart the reproduction from that position.
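One possible way to combine the two restart positions described above is sketched below in Python. The function name restart_index and the rule of preferring the last edited word whenever an edit took place are assumptions made for illustration; the description presents the two positions as options and does not prescribe how to choose between them.

```python
from typing import Optional


def restart_index(paused_index: int, last_edited_index: Optional[int]) -> int:
    """Pick the word index at which reproduction resumes.

    Assumed policy: resume at the last edited word if the user edited
    anything during the pause, otherwise resume where playback was paused.
    """
    if last_edited_index is not None:
        return last_edited_index
    return paused_index
```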
In one embodiment, the user can simply change where in the sequence of recognized words the reproduction is active by indicating the desired position, e.g., by clicking the mouse at the desired position or via voice commands.
The system is capable of dealing with those situations where the user wants to restart the dictation at the position currently reached in the reproduction mode. According to the invention, if the user simply starts dictating by speaking for a longer time (e.g., several seconds), the system no longer regards the voice input as being intended to edit (e.g., insert) a few words into the existing dictation, but instead exits the reproduction mode and goes into the dictation mode.
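The distinction between a short spoken edit and a resumed dictation could be sketched as a simple duration test in Python. The function name classify_voice_input and the three-second threshold are hypothetical; the description only gives "several seconds" as an example and does not fix a value.

```python
def classify_voice_input(duration_s: float,
                         dictation_threshold_s: float = 3.0) -> str:
    """Classify continuous voice input received during reproduction.

    Input shorter than the (assumed) threshold is treated as an edit within
    the reproduction mode; longer input switches to the dictation mode.
    """
    if duration_s >= dictation_threshold_s:
        return "dictation"
    return "edit"
```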
To meet the object of the invention, a method of enabling reviewing a sequence of words recognized by a speech recognizer in a speech representative input signal includes the steps of:
storing a representation of the received signal suitable for audible reproduction;
using a speech recognizer to represent the received signal as a sequence of recognized words;
storing the sequence of recognized words, where each recognized word is stored in association with a marker indicating a correspondence between the word and a segment of the received signal in which the word was recognized;
enabling a user to review at least part of the sequence of recognized words by causing a synchronous reproduction of an audible and visible representation of the part of the sequence of recognized words, the synchronous reproduction including audibly reproducing a corresponding part of the stored representation of the received signal and, for each segment of the corresponding part of the received signal, at the moment when the segment is being audibly reproduced, indicating on a display a textual representation of a recognized word which corresponds to the segment; the correspondence being given by the stored markers;
detecting whether the user has provided an editing instruction, while the synchronous reproduction is active; and
pausing the synchronous reproduction in response to having detected an editing instruction during the synchronous reproduction, and causing the editing instruction to be performed.