This disclosure relates generally to machine learning. More particularly, it relates to teaching a machine learning system to detect transcription errors in speech recognition tasks.
Speech recognition is a computer technology that allows a user to perform a variety of interactive computer tasks as an alternative to communicating through traditional input devices such as a mouse and keyboard. These tasks include issuing commands for the computer to execute a selected function, and having speech transcribed into written text for a computer application such as a spreadsheet or word processing application. Unfortunately, the speech recognition process is not error free, and an important problem is correcting transcription errors, or “mistranscriptions”. A mistranscription occurs when the speech recognition component of a computer incorrectly transcribes the acoustic signal of a spoken utterance. In an automated speech recognition task, when selected words are mistranscribed, the command may not be performed properly or the speech may not be transcribed properly. A mistranscription can be due to one or more factors. For example, the user may be a non-native speaker, the user's speech may be sloppy, or there may be background noise on the channel to the speech recognition system.
One type of mistranscription is a substitution error, where the speech recognition system replaces the uttered word with an incorrect word. Another type is an insertion error, where the system recognizes a “garbage” utterance (e.g., breathing, background noise, or “um”) as a word, interprets one word as two words, and so forth. Yet another type is a deletion error, where one of the uttered words does not appear in the transcription. In some cases, a deletion occurs because the speech recognition system rejects the recognized phonemes as a non-existent word according to its dictionary. Alternatively, a deletion may be due to an incorrect merge of two words. For example, the user may have said “nine trees” and the system recognized the utterance as “ninety”.
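The three error types above correspond to the edit operations used when a recognized transcription is aligned against a correct reference transcript. As a minimal sketch, assuming such a reference is available and words are compared as exact strings, a standard minimum-edit-distance (Levenshtein) alignment can count each type of error. The function name `count_errors` is hypothetical and is not part of this disclosure.

```python
from typing import List, Tuple


def count_errors(reference: List[str], hypothesis: List[str]) -> Tuple[int, int, int]:
    """Return (substitutions, insertions, deletions) from a minimum-edit alignment.

    Hypothetical helper: aligns the recognized words (hypothesis) against the
    words actually spoken (reference) using Levenshtein distance over words.
    """
    n, m = len(reference), len(hypothesis)
    # cost[i][j] = fewest edits turning reference[:i] into hypothesis[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i  # only deletions possible
    for j in range(1, m + 1):
        cost[0][j] = j  # only insertions possible
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diff = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            cost[i][j] = min(
                cost[i - 1][j - 1] + diff,  # match or substitution
                cost[i - 1][j] + 1,         # deletion of a spoken word
                cost[i][j - 1] + 1,         # insertion of an extra word
            )

    # Backtrace from the bottom-right cell to classify each edit.
    subs = ins = dels = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            diff = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            if cost[i][j] == cost[i - 1][j - 1] + diff:
                subs += diff
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return subs, ins, dels


print(count_errors(["nine", "trees"], ["ninety"]))
```

For the “nine trees” example, the call returns `(1, 0, 1)`: the merge into “ninety” is counted as one substitution plus one deletion. An inserted filler such as “um” in an otherwise correct transcript would be counted as a single insertion.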
Conventional approaches for resolving mistranscriptions include manually examining the transcript for errors and correcting them through an input device such as a keyboard, or having the system identify candidate mistranscriptions and enter a dialog with the user intended to correct them. For example, the system could ask the user via a speaker, “Did you say ‘chicken’?”, and if the user says “no”, the system logs the candidate mistranscription as an error. The number of transcription errors can also be reduced by improving the speech model for a particular user. As the system receives more speech samples from that user, either by having the user read from a known transcript or through the user's continued use of the system, the default acoustic model of the speech recognition system can be better adapted to the user.
Further improvements in computer-aided speech recognition are needed.