Much attention has been paid in recent years to the problem that many people cannot or will not type. Enormous efforts have been expended to attempt to remedy this problem, efforts relating to the user interface (e.g. the widespread adoption of rodent-oriented graphical user interfaces) and relating to data entry (e.g. speech recognition and handwriting recognition). Speech recognition has developed to the point that off-the-shelf audio hardware can be used with off-the-shelf speech recognition engines to supply recognized text to other applications such as word processor programs.
As anyone who has used a speech recognition engine will report, the engine doesn't do as well as the human ears, auditory cortex, and brain in recognition of speech. Humans have little difficulty processing distinct words even if the words are spoken in continuous fashion (as is usually the case), whilst the commercially available recognition engines have a very difficult time unless the user is trained to speak with pauses between words. Humans have little difficulty understanding the speech of a multitude of different speakers across a wide vocabulary; in contrast, most speech recognition engines do well only if the vocabulary is greatly constrained or if the range of permitted speakers is greatly constrained.
In the face of all this, it is clear that the application using the recognized text must necessarily have some mechanism by which the user can correct mis-recognized words. A spelling checker is of no help, of course, because the words from the engine generally are correctly spelled words. In most systems in which a speech recognition engine and word processor are used, the method of correction is simple: the user reads the text in the word processor, performs a mental review of the words, and corrects words that were incorrectly recognized. The mental steps employed by the user include checking each word for consistency of grammar and meaning with respect to its neighboring words. In addition, if the person doing the editing is the same person who dictated the original audio, then that person may be able to draw on a recollection of what was said to assist in correcting mis-recognized words. Similarly, if the person doing the editing was present during the original dictation or has other independent knowledge of what was said, then that person may be able to draw on that recollection or knowledge to assist in correcting mis-recognized words.
Depending on the particular speech recognition engine used, when a dictated word is recognized, the engine's result contains specific information about the recognized word, including its context, its spelling, the audio associated with the word, confidence levels or scores, and any additional information about the word that the engine may generate. Regrettably, very little of this information is put to use by the time the user is editing the text in a word processor or other application. There is thus a great need for improved ways of editing and correcting the text that results from speech recognition.
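By way of illustration only, the per-word information described above might be carried alongside the text in a record such as the following. This is a minimal sketch and does not represent the output format of any particular engine: the names `RecognizedWord` and `words_needing_review`, the field layout, the use of millisecond offsets to locate the audio, the 0.0 to 1.0 confidence scale, and the 0.6 threshold are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class RecognizedWord:
    """Hypothetical record of what an engine might report for one word."""
    spelling: str                 # the recognized word as text
    confidence: float             # assumed scale: 0.0 (no confidence) to 1.0
    audio_start_ms: int           # assumed: offset of the word in the audio
    audio_end_ms: int
    extra: Dict[str, Any] = field(default_factory=dict)  # engine-specific data


def words_needing_review(
    words: List[RecognizedWord], threshold: float = 0.6
) -> List[Tuple[int, RecognizedWord]]:
    """Return (position, word) pairs whose confidence is below the threshold.

    The position lets an editor show the neighboring words as context.
    """
    return [(i, w) for i, w in enumerate(words) if w.confidence < threshold]


if __name__ == "__main__":
    dictated = [
        RecognizedWord("recognize", 0.93, 0, 610),
        RecognizedWord("beach", 0.41, 610, 980),  # perhaps "speech" was said
        RecognizedWord("today", 0.88, 980, 1400),
    ]
    for position, word in words_needing_review(dictated):
        print(position, word.spelling, word.audio_start_ms, word.audio_end_ms)
```

With such a record retained, an editing application could, for example, draw the user's attention to low-confidence words and replay the audio span associated with each one, rather than relying solely on the user's recollection of what was said.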