Automatic speech recognition is currently used for a broad variety of tasks. Such tasks include: entering text into a computer (e.g., desktop dictation); performing a transaction or accessing a database over the telephone or another speech-enabled communication medium (e.g., interactive voice response (IVR) systems); transcribing spoken data interactions for archival and search purposes (e.g., broadcast news, lectures or meetings); and transcribing human-to-human speech interactions as a communication aid (e.g., for the hearing impaired).
Conventional speech recognition technology cannot perform these tasks without error. The number of recognition errors tends to increase, for instance, when the acoustic environment of the speaker or the communication channel is noisy, or when the speech is fast, hesitant or poorly enunciated. Transcribing some types of information is also more error-prone, for example, spelled names, addresses, or long strings of digits.
The efficiency and success of speech-enabled applications do not depend only on reducing the number of errors. At least as important is how these errors are handled and how easily the user can correct them. Error handling has a large impact on the efficiency of the system, the quality of the user experience and the general acceptance of such systems.
In view of the foregoing, a need has been recognized in connection with improving upon the shortcomings and disadvantages presented by conventional arrangements.