The invention relates to a correction device for correcting a text recognized by a speech recognition device from a spoken text, wherein the recognized text contains words that are correctly recognized for spoken words of the spoken text and words that are not correctly recognized therefor.
The invention further relates to a method of correction for correcting a text recognized by a speech recognition device from a spoken text, wherein the recognized text contains words that are correctly recognized for spoken words of the spoken text and words that are not correctly recognized therefor.
A correction device of this kind and a method of correction of this kind are known from U.S. Pat. No. 5,031,113, in which a dictating device is disclosed. The known dictating device is formed by a computer, which runs speech recognition software and text processing software. A user of the known dictating device can speak a spoken text into a microphone connected to the computer. The speech recognition software, which forms a speech recognition device, performs a speech recognition process and in so doing allocates a recognized word to each spoken word of the spoken text, as a result of which a recognized text is obtained for the spoken text. Also, in the course of the speech recognition process, link information is determined that flags the word of the recognized text that was recognized for each spoken word of the spoken text.
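By way of illustration only, the link information described above can be pictured as a table that ties each span of the recorded spoken text to the word recognized for it. The following is a minimal sketch under stated assumptions and not the data format of the known dictating device: the names LinkEntry, build_link_info and word_at, and the use of millisecond offsets, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class LinkEntry:
    """One item of link information (hypothetical layout)."""
    start_ms: int    # where the spoken word starts in the recording
    end_ms: int      # where the spoken word ends in the recording
    word_index: int  # position of the recognized word in the recognized text


def build_link_info(
    words_with_times: List[Tuple[str, int, int]],
) -> Tuple[List[str], List[LinkEntry]]:
    """Split (word, start_ms, end_ms) triples into recognized text and link information."""
    recognized_text = [word for word, _, _ in words_with_times]
    links = [
        LinkEntry(start, end, index)
        for index, (_, start, end) in enumerate(words_with_times)
    ]
    return recognized_text, links


def word_at(links: List[LinkEntry], position_ms: int) -> Optional[int]:
    """Return the index of the word to highlight at a given playback position."""
    for entry in links:
        if entry.start_ms <= position_ms < entry.end_ms:
            return entry.word_index
    return None  # the position falls between words, e.g. in a pause


recognized, links = build_link_info([
    ("the", 0, 180),
    ("patient", 180, 700),
    ("is", 900, 1050),
])
highlighted = word_at(links, 400)  # index of the word spoken at 400 ms
```

In a synchronous reproduction mode, a lookup of this kind would be made repeatedly as playback advances, so that the highlighted word follows the audio.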
The known dictating device also forms a correction device with which incorrectly recognized words can be replaced with correction words. For this purpose a user of the correction device can activate a synchronous reproduction mode of the correction device, in which the spoken text is reproduced acoustically and, synchronously with this, the words of the recognized text flagged by the link information are highlighted (i.e. marked) visually. The synchronous reproduction mode has proved in practice to be particularly advantageous for the correction of text recognized by the speech recognition device. It has further been found that many users do not check the entire recognized text with the help of the synchronous reproduction mode but only certain parts thereof. These parts may be, for example, parts of the text that are particularly critical and that must be absolutely free from errors, or they may be parts of the text that are particularly difficult for the speech recognition software to recognize and that are therefore likely to contain a large number of incorrectly recognized words.
It was found to be a disadvantage in the known correction device that, after correcting the recognized text with the correction device, a user has no way of determining which parts of the recognized text have been corrected with the aid of the synchronous reproduction mode and which parts have still to be corrected therewith.
It is an object of the invention to provide a correction device of the kind defined in the first paragraph above and a method of correction of the kind defined in the second paragraph above, in which the disadvantage described above is avoided.
To achieve the object indicated above, features according to the invention are proposed for a correction device of this kind, such that the correction device can be characterized in the manner detailed below.
A correction device for correcting a text recognized by a speech recognition device from a spoken text, wherein an item of link information for each part of the spoken text flags the associated recognized text,
having memory means for storing at least the spoken text and the recognized text, and
having reproducing means for acoustically reproducing the spoken text and visually marking, synchronously, the associated recognized text flagged by the link information when a synchronous reproduction mode is activated in the correction device, and
having marking means to store marking information in the memory means, which marking information flags those parts of the recognized text and/or of the spoken text that were reproduced at least once by the reproducing means when the synchronous reproduction mode was activated.
To achieve the object indicated above, features according to the invention are proposed for a method of correction of this kind, such that the method of correction can be characterized in the manner detailed below.
A method of correction for correcting a text recognized by a speech recognition device from a spoken text, wherein an item of link information for each part of the spoken text flags the associated recognized text and wherein the following steps are performed:
storage of at least the spoken text and the recognized text;
when the synchronous reproduction mode is activated, acoustic reproduction of the spoken text and synchronous visual marking of the associated recognized text flagged by the link information;
storage of marking information, which marking information flags those parts of the recognized text and/or of the spoken text that were reproduced at least once before when the synchronous reproduction mode was activated.
The features according to the invention have the effect that those parts of the recognized text and/or those parts of the spoken text that were reproduced acoustically and visually marked at least once when the synchronous reproduction mode was activated are flagged by marking information. In this way the correction device is able, advantageously, either to mark visually the part of the recognized text that has already been corrected once with the help of the synchronous reproduction mode, or to mark acoustically the associated part of the spoken text. This enables a user of the correction device according to the invention to correct the recognized text considerably more efficiently.
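The storing of marking information can be sketched as follows. This is only an illustrative sketch and not the claimed implementation: the class MarkingInfo, the function reproduce_synchronously and the choice of flagging individual word indices are assumptions made for the example.

```python
from typing import List, Set, Tuple


class MarkingInfo:
    """Records which words were reproduced at least once (hypothetical structure)."""

    def __init__(self) -> None:
        self._reproduced: Set[int] = set()

    def mark(self, word_index: int) -> None:
        self._reproduced.add(word_index)

    def is_marked(self, word_index: int) -> bool:
        return word_index in self._reproduced

    def unmarked_ranges(self, word_count: int) -> List[Tuple[int, int]]:
        """Return inclusive (first, last) index ranges that were never reproduced."""
        ranges: List[Tuple[int, int]] = []
        start = None
        for index in range(word_count):
            if index not in self._reproduced:
                if start is None:
                    start = index  # a new unchecked range begins here
            elif start is not None:
                ranges.append((start, index - 1))
                start = None
        if start is not None:
            ranges.append((start, word_count - 1))
        return ranges


def reproduce_synchronously(marking: MarkingInfo, first: int, last: int) -> None:
    """Stand-in for synchronous reproduction of words first..last (inclusive)."""
    for index in range(first, last + 1):
        # A real device would play the audio and highlight the word here.
        marking.mark(index)


marking = MarkingInfo()
reproduce_synchronously(marking, 2, 4)
reproduce_synchronously(marking, 3, 5)  # an overlapping pass adds only word 5
remaining = marking.unmarked_ranges(8)  # words 0-1 and 6-7 are still unchecked
```

With information of this kind, a user interface could shade the marked words differently, or a reviewer could jump directly to the ranges that remain unchecked.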
The provisions of claim 2 and claim 8 offer the advantage that unwanted parts of the spoken text, which are flagged by suppression information by the speech recognition device or by the correction device, are not reproduced acoustically during the synchronous reproduction mode. During the synchronous reproduction mode the user is thus able to concentrate better on the essential parts of the spoken text and the associated parts of the recognized text. Also, the acoustic reproduction can be speeded up, so that advantageously a recognized text can be corrected more quickly.
The provisions of claim 3 and claim 9 offer the advantage that certain parts of the spoken text, though unwanted, are still reproduced, namely when the user listens to such parts of the spoken text for a second or further time. This is particularly advantageous because unwanted parts of the spoken text of this kind often cause words to be recognized incorrectly when the speech recognition process is performed, and by listening to these unwanted parts of the text the user is more easily able to draw conclusions as to the word that ought really to have been recognized.
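The interplay of suppression information and marking information described above can be sketched as follows. This is a sketch under stated assumptions: the Segment structure and the function segments_to_play are hypothetical, and the rule used to recognize a repeated pass (all wanted segments of the requested span are already marked) is an assumption of the example, not a rule taken from the claims.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    label: str                # word or description of the audio segment
    suppressed: bool = False  # suppression information: segment is unwanted
    reproduced: bool = False  # marking information: reproduced at least once


def segments_to_play(segments: List[Segment], first: int, last: int) -> List[str]:
    """Choose what to reproduce for segments first..last (inclusive).

    On a first pass, suppressed segments are skipped. When the span is
    listened to again, suppressed segments are reproduced as well.
    """
    span = segments[first:last + 1]
    wanted = [seg for seg in span if not seg.suppressed]
    # Assumed rule: the pass is a repeat if every wanted segment was already marked.
    repeat_pass = bool(wanted) and all(seg.reproduced for seg in wanted)

    played: List[str] = []
    for seg in span:
        if seg.suppressed and not repeat_pass:
            continue  # unwanted audio is skipped the first time
        played.append(seg.label)
        seg.reproduced = True  # store marking information
    return played


dictation = [
    Segment("the"),
    Segment("<aah>", suppressed=True),
    Segment("patient"),
    Segment("<pause>", suppressed=True),
    Segment("is"),
]
first_pass = segments_to_play(dictation, 0, 4)
second_pass = segments_to_play(dictation, 0, 4)
```

In this example the first pass reproduces only the three wanted words, while the second pass also reproduces the hesitating sound and the pause.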
The provisions of claim 4 provide a list of those parts of the spoken text that it is particularly advantageous to have marked as unwanted by suppression information. Such unwanted parts of the text are thus parts of the spoken text where the user paused (i.e. was silent) while dictating, or where he repeated a word or made a so-called hesitating sound (e.g. "aah", "mm") as he thought about the next sentence.
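How the three kinds of unwanted parts might be found in a time-stamped word sequence is sketched below. The sketch is purely illustrative: the function find_unwanted_parts, the list HESITATION_SOUNDS and the threshold MIN_PAUSE_MS are hypothetical, and a real speech recognition device would be expected to detect such parts by its own means.

```python
from typing import List, Tuple

HESITATION_SOUNDS = {"aah", "mm", "uh", "um"}  # assumed list, not from the source
MIN_PAUSE_MS = 500                             # assumed silence threshold


def find_unwanted_parts(
    words: List[Tuple[str, int, int]],
) -> List[Tuple[str, int, int]]:
    """Return (kind, start_ms, end_ms) for pauses, hesitations and repetitions.

    `words` holds (word, start_ms, end_ms) triples in spoken order.
    """
    unwanted: List[Tuple[str, int, int]] = []
    previous_word = None  # last wanted word, used to detect repetitions
    previous_end = None   # end time of the last audio segment
    for word, start, end in words:
        if previous_end is not None and start - previous_end >= MIN_PAUSE_MS:
            unwanted.append(("pause", previous_end, start))
        lowered = word.lower()
        if lowered in HESITATION_SOUNDS:
            unwanted.append(("hesitation", start, end))
        elif lowered == previous_word:
            # Note: an intended repetition such as "that that" is flagged too.
            unwanted.append(("repetition", start, end))
        else:
            previous_word = lowered
        previous_end = end
    return unwanted


suppression_info = find_unwanted_parts([
    ("the", 0, 200),
    ("the", 250, 450),
    ("aah", 500, 900),
    ("patient", 1600, 2100),
])
```

The resulting entries could serve as suppression information for the corresponding spans of the spoken text.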
The provisions of claim 5 offer the advantage that the correction device visually marks that part of the recognized text that has already been reproduced at least once, and thus corrected, in the synchronous reproduction mode, for the benefit of a user or of a person who has to check the work of users of the correction device. As a result, professional transcription services can provide effective quality control.
The provisions of claim 6 offer the advantage that, depending on whether or not the recognized text and the associated spoken text have already been reproduced once in the synchronous reproduction mode, positioning means belonging to the correction device position a text cursor N words or M words upstream of the word that is marked at the moment the synchronous reproduction mode is interrupted. The numbers defined could be, for example, M=3 and N=1, as a result of which allowance would be made for the longer response time of the corrector when an incorrectly recognized word is found in the recognized text for the first time. These provisions are particularly advantageous because the text cursor is usually already positioned on the incorrectly recognized word to be corrected once the synchronous reproduction mode has been interrupted, and the time taken to position the text cursor manually can thus be saved.
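The positioning of the text cursor can be illustrated with a short sketch. It assumes, in line with the example values M=3 and N=1, that M applies to parts not yet reproduced and N to parts already reproduced once; the function cursor_position and its parameter names are hypothetical.

```python
def cursor_position(
    marked_word_index: int,
    already_reproduced: bool,
    n_words: int = 1,
    m_words: int = 3,
) -> int:
    """Return the word index at which to place the text cursor on interruption.

    Assumption: the cursor moves back N words for parts already reproduced
    once and M words for parts heard for the first time, where the corrector
    is assumed to react more slowly.
    """
    offset = n_words if already_reproduced else m_words
    # Never place the cursor before the first word of the recognized text.
    return max(0, marked_word_index - offset)


first_time = cursor_position(10, already_reproduced=False)  # 10 - 3
repeated = cursor_position(10, already_reproduced=True)     # 10 - 1
near_start = cursor_position(2, already_reproduced=False)   # clamped to 0
```

Here the marking information decides which offset is used, so the same interruption places the cursor further back the first time a passage is heard.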