While listening to spoken words, a human listener often indicates the quality of his understanding through visual and other cues. If the speaker is human and is in the presence of the listener, the speaker often recognizes these cues and adjusts one or more aspects of her speech patterns accordingly, such as by speaking louder or softer, using better diction, speaking more slowly, emphasizing certain words, or the like. By making such adjustments, the speaker hopes to improve the listener's recognition of her speech, thus improving his overall understanding of what she is saying.
In recent years, however, email, text messaging and other technologies have become more pervasive, often replacing oral conversations. Such technologies do not provide the recipient with the sorts of cues described above. Traditionally, though, the sender of an email, text message or the like has been able to review the written (typed) message before transmitting it, and has therefore been able to control its content as desired.
More recently, Automatic Speech Recognition ("ASR") systems have found application in text messaging, which until recently required the sender to enter a message by pressing the letter and/or number keys of his or her mobile phone. As recognized, for example, in the aforementioned, commonly-assigned U.S. patent application Ser. No. 11/697,074, it can be advantageous to make text messaging far easier for an end user by allowing the user to dictate his or her message rather than requiring the user to type it into his or her phone. In certain circumstances, such as when a user is driving a vehicle, typing a text message may be impossible or inconvenient, and may even be unsafe. At the same time, text messages can be more advantageous to a recipient than voicemail, because the recipient sees the message content in written form rather than having to rely on an audio signal.
Unfortunately, in at least some ASR systems, inaccurate transcriptions are commonplace. Thus, in systems in which text messages are sent directly to the recipient without review by the sender, inaccurate transcriptions can cause serious miscommunication, often without the sender being aware of the problem. Moreover, even senders who are able to review their messages before sending them may choose not to do so.
Although methods have been developed to provide feedback to the user (i.e., the sender), current technologies base that feedback only on audio signal quality; that is, the cues presented to the user are derived from measurements of the audio signal captured by the audio capture device. Furthermore, where existing methods do provide feedback in the form of confidence levels determined by the system, that feedback is reported only after the fact, i.e., after a complete utterance has been converted to text. The user therefore cannot use the information to adjust the quality of his speech until he begins a subsequent utterance.
Thus, a need exists for a system that uses an ASR engine to transcribe a user's utterance for subsequent transmission to one or more recipients as a text message or the like, but that also provides the user, in a timely fashion, with cues that mimic one or more of those a listener typically provides as feedback when in the presence of a speaker, thereby permitting the user to adjust the quality of his speech within the same utterance.