1. Field of the Invention
The present invention generally relates to continuous speech recognition systems and, more particularly, to methodologies for giving the user control over error correction.
2. Background Description
In a continuous speech recognition system designed to decode speech without pauses between words, decoding errors can be expected to occur and will have to be corrected by the user. The effort needed to correct them depends both on the visual presentation of the information supporting correction and on the actions the interface allows the user to perform on that presentation.
Speech recognition systems of the kind typified by the recognition engines developed at the International Business Machines (IBM) Thomas J. Watson Research Center are able to decode continuous utterances comprising many connected words. Unfortunately, even human listeners occasionally mistake one sequence of words for another, so a machine recognizer must be expected to make such errors as well. It is therefore a feature of such recognition engines that alternative strings of words for any given acoustic sequence are stored against the possibility that the choice presented by the recognizer is incorrect. Storing the alternative strings allows the user to be presented with alternative choices. Alternatively, the acoustic data can be stored in a suitable form so that the portion corresponding to the incorrect words can be decoded again and a number of alternatives displayed for the user to choose from.
The presentation of choices from recognized alternatives is common practice in the human interfaces of discrete word recognition systems. In such systems the user can indicate the incorrect word, and the system user interface can, in response, present a list of alternative words. Such systems typically order this list by the probability of each word as estimated from the acoustic evidence and the language model scores.
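The ordering step can be illustrated with a short sketch. It assumes the common log-linear combination of scores, score = log P(acoustics | word) + w · log P(word | context), where the weight `w` (here the hypothetical parameter `lm_weight`) balances the two sources of evidence. The `Candidate` type, the function name, and all numeric scores below are illustrative assumptions, not values or interfaces from any particular recognizer.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One alternative word for a position the user marked as incorrect (hypothetical type)."""
    word: str
    acoustic_logprob: float  # assumed log P(acoustics | word)
    lm_logprob: float        # assumed log P(word | context) from the language model


def rank_alternatives(candidates, lm_weight=1.0):
    """Sort candidates best-first by the combined log score.

    Assumes score = acoustic_logprob + lm_weight * lm_logprob;
    lm_weight is a hypothetical tuning parameter.
    """
    return sorted(
        candidates,
        key=lambda c: c.acoustic_logprob + lm_weight * c.lm_logprob,
        reverse=True,  # higher (less negative) log score first
    )


# Illustrative, made-up scores for three homophones.
cands = [
    Candidate("their", -12.0, -3.0),
    Candidate("there", -11.5, -4.5),
    Candidate("they're", -13.5, -2.0),
]
print([c.word for c in rank_alternatives(cands)])
```

With these made-up scores the list prints as `['their', "they're", 'there']`; setting `lm_weight=0.0` ranks by acoustics alone and moves "there" to the top, which shows how strongly the language model can reorder the choices presented to the user.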
In continuous speech recognition systems, word boundaries are not reliably determined by the system, since the user does not indicate them by pausing. For example, the sentences "Do you know how to recognize speech?" and "Do you know how to wreck a nice beach?" have nearly indistinguishable acoustics but very different content. Correcting this form of error requires replacing two words with four whose boundaries fall in different places, and conventional discrete correction methods, which substitute one word at a time, give the user no means of addressing such a sequence of words as a unit. Thus, the technique used in discrete error correction does not apply effectively to this problem.
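The mismatch can be made concrete by aligning the two transcriptions word by word. The sketch below uses Python's standard `difflib.SequenceMatcher` as an illustrative alignment method (an assumption for this example, not a technique taken from the text). The function name `differing_spans` is hypothetical. It reports each region where the hypotheses disagree as a pair of word sequences.

```python
from difflib import SequenceMatcher


def differing_spans(hyp, alt):
    """Return (hyp_words, alt_words) pairs for every region where two
    word-level transcriptions disagree (hypothetical helper)."""
    # autojunk=False so no frequent word is silently treated as junk.
    matcher = SequenceMatcher(a=hyp, b=alt, autojunk=False)
    spans = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # 'replace', 'delete' or 'insert'
            spans.append((hyp[i1:i2], alt[j1:j2]))
    return spans


hyp = "do you know how to recognize speech".split()
alt = "do you know how to wreck a nice beach".split()
for old, new in differing_spans(hyp, alt):
    print(old, "->", new)
```

The single disagreement found is `['recognize', 'speech'] -> ['wreck', 'a', 'nice', 'beach']`: a two-word span corresponds to a four-word span, so no list of one-for-one word substitutions can express the correction.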