Some computing devices (e.g., mobile phones) are configured with speech-to-text functionality for converting spoken language into written form. For example, a computing device can be configured with speech-to-text functionality that receives audio input (e.g., a user's voice) and determines text content (e.g., an SMS message or an email) based on the audio input.
Typically, a computing device receives audio input from words spoken into a microphone of the computing device. While a user is speaking, the computing device can identify spoken words in the audio input and then output the identified words for display (e.g., on a screen of the computing device). Current technology may allow a computing device to perform speech-to-text conversion in near real time and provide near-instant feedback to the user.
However, users may be distracted by seeing a near real-time transcription of their speech displayed on a screen before they have finished speaking, particularly when the computing device has made transcription errors. For instance, if the computing device does not properly recognize a word and displays an incorrect word instead, the user may stop speaking in the middle of a sentence in an attempt to backtrack and/or otherwise correct the error. Such pauses may cause further transcription errors, because the computing device may rely on natural speech patterns to perform the speech-to-text conversion and on word context to automatically correct transcription errors.