Automated speech recognition is commonly used in call centers to convert callers' speech into text. Generally, a call or voice recording is received at the call center and the speech is obtained. The speech is input into an automated speech recognition system, which parses the speech into short segments and assigns phonemes to the segments. The phonemes are then analyzed and compared to a grammar of known words, phrases, and sentences to provide text values for the speech.
Once converted, the text can be used to store a record of a call, to identify characteristics of the call, or as a confirmation of the call. Speech recognition is also widely used in other fields, including the legal field for court reporting and dictation, and the medical field. The benefits of automated speech recognition include a reduction in the cost of employees required to manually transcribe voice messages and an increase in transcription speed. However, a lack of transcription accuracy is a barrier to widespread use of automated speech recognition.
A conventional approach that combines automated speech recognition with manual transcription has been implemented in an attempt to improve transcription accuracy. Generally, a voice message is first transcribed via automated speech recognition. An accuracy threshold is then applied to the transcribed voice message. If the accuracy of the transcribed message is above the threshold, the transcribed voice message is provided to a user or stored. If the accuracy is below the threshold, the entire voice message is transmitted to a human transcriber for manual transcription. During manual transcription, each voice utterance in the voice message is processed separately, which can be expensive and time-consuming.
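The threshold step of the conventional approach can be sketched as follows. This is a minimal illustration only: the `Transcript` type, the `confidence` score, the `route_message` function, and the default threshold of 0.8 are hypothetical names and values chosen for the example. The text above does not say how a message exactly at the threshold is handled, so the sketch assumes it is sent for manual transcription.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    """Result of automated speech recognition for one voice message (hypothetical type)."""
    text: str
    confidence: float  # assumed accuracy estimate in the range [0.0, 1.0]


def route_message(transcript: Transcript, threshold: float = 0.8) -> str:
    """Decide what happens to an automatically transcribed voice message.

    Returns "deliver" when the estimated accuracy is above the threshold
    (the transcript is provided to a user or stored), and "manual" otherwise
    (the entire voice message goes to a human transcriber).
    Assumption: a score exactly equal to the threshold is routed to "manual".
    """
    if transcript.confidence > threshold:
        return "deliver"
    return "manual"
```

Note that the decision is made per voice message, so a single poorly recognized utterance can send the whole message to a human transcriber.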
In large call centers, hundreds or thousands of calls can be received within a relatively short time period. During this time period, common utterances are spoken by different callers. Under the conventional approach described above, if the transcription of a voice message fails to meet the accuracy threshold, the entire voice message is manually transcribed, which can be costly and time-consuming. Thus, the conventional approach misses the opportunity to reduce error by identifying similar utterances received during a specified time period, manually transcribing at least one of the utterances, and then assigning the transcribed value to the remaining similar utterances.
Therefore, there is a need for an efficient and cost-effective approach to reducing transcription error through a hybrid of automatic and manual transcription. Preferably, the approach would reduce the amount of manual transcription required by identifying similar utterances, manually transcribing at least one of the similar utterances, and assigning the manually transcribed value to the remaining similar utterances.
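One way to picture the preferred approach is the sketch below. It is an illustration under stated assumptions, not the method itself: the text does not say how similar utterances are identified, so the example assumes that similarity is measured on the draft text produced by automated speech recognition, using Python's standard `difflib.SequenceMatcher`. The function names, the `min_ratio` cutoff of 0.85, and the `manual_transcribe` callback are all hypothetical.

```python
from difflib import SequenceMatcher
from typing import Callable, List


def group_similar(drafts: List[str], min_ratio: float = 0.85) -> List[List[int]]:
    """Greedily group utterances whose draft text resembles a group's first member.

    Assumption: similarity is judged on automated draft transcripts; a real
    system might compare the audio or phoneme sequences instead.
    Returns a list of groups, each a list of indices into `drafts`.
    """
    groups: List[List[int]] = []
    for i, draft in enumerate(drafts):
        for group in groups:
            representative = drafts[group[0]]
            if SequenceMatcher(None, draft, representative).ratio() >= min_ratio:
                group.append(i)
                break
        else:
            # No existing group was similar enough, so start a new one.
            groups.append([i])
    return groups


def transcribe_with_propagation(
    drafts: List[str],
    manual_transcribe: Callable[[int], str],
    min_ratio: float = 0.85,
) -> List[str]:
    """Manually transcribe one utterance per group and copy the value to the rest.

    `manual_transcribe` is a hypothetical callback that takes the index of an
    utterance and returns the text supplied by a human transcriber.
    """
    results = [""] * len(drafts)
    for group in group_similar(drafts, min_ratio):
        value = manual_transcribe(group[0])  # one manual transcription per group
        for i in group:
            results[i] = value
    return results
```

With three utterances of which two are near-identical, the human transcriber in this sketch is consulted twice rather than three times; the saving grows with the number of callers who say the same thing within the time period.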