Communications that include a spoken audio component, such as TV broadcasts, internet TV multicasts, or voice or video calls (e.g. VoIP calls), often include a text transcription of the speech occurring in the audio. This could be for the benefit of a receiving user who is hard of hearing, or for cases where the communication is being consumed at the receive side in an environment where it is not appropriate to have the audio turned on or turned up to a clearly audible level (e.g. a quiet public place where other people are present). Alternatively or additionally, the transcription could be provided because the sending user simply prefers dictation to typing as a means of sending textual messages.
Different techniques are known for converting speech to text as part of a one-way or two-way communication session, including techniques for doing so quickly and even in real-time. Real-time here means dynamically, as-and-when the audio is being sent. That is, a part of the audio stream is still being transcribed at the transmit side while a preceding part of the same stream is being played out at the receive side. This could be because the real-time stream is live, so that it would be impossible to transcribe it in advance (future events to be transcribed in the stream have not yet occurred while a current part of the audio stream is being transcribed), or simply because there has not been enough time, or it is not time efficient, to transcribe in advance (e.g. doing so would require the transcription to be prepared, stored, retrieved and then synchronized with the playout).
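By way of illustration only, the overlap between transcription at the transmit side and playout at the receive side may be sketched as follows. This is a minimal simulation under stated assumptions, not an implementation of any particular known system: the names `fake_transcribe` and `real_time_session` are hypothetical, the audio chunks are represented by short strings, and the "transcription" is a trivial stand-in for a human stenographer or a speech recognizer.

```python
from typing import Iterator, List, Optional, Tuple


def fake_transcribe(chunk: str) -> str:
    # Hypothetical stand-in for a stenographer or speech recognizer:
    # here the "transcript" is simply the chunk label in upper case.
    return chunk.upper()


def real_time_session(audio_chunks: List[str]) -> Iterator[Tuple[str, str]]:
    """Yield (chunk being transcribed, chunk being played out) per time step.

    At each step the transmit side works on the current chunk while the
    receive side plays out the preceding chunk together with its transcript.
    """
    previous: Optional[str] = None  # chunk plus transcript already sent
    for chunk in audio_chunks:
        playing = previous if previous is not None else "(nothing yet)"
        yield (chunk, playing)
        # Transcription of this chunk completes; it is sent for playout.
        previous = f"{chunk} [{fake_transcribe(chunk)}]"
    if previous is not None:
        # The final chunk is played out after the stream has ended.
        yield ("(stream ended)", previous)


if __name__ == "__main__":
    for transcribing, playing in real_time_session(["hello", "world"]):
        print(f"transmit side: {transcribing:<15} receive side: {playing}")
```

In this sketch each time step pairs a chunk that is still being transcribed with the preceding chunk that is being played out, which is the sense of real-time described above; no part of the stream needs to be transcribed, stored and retrieved in advance.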
For instance, in the case of one-way TV broadcasts, the transcription may be performed in real-time by a skilled human stenographer using a dedicated stenographer's keyboard (stenotype machine). With only a small delay the transcribed text may then be included in the broadcast to accompany the corresponding audio from which it was transcribed.
In the case of VoIP calls, it is known to include a voice recognition algorithm at the VoIP server. When the sending user speaks so as to send an audio speech signal to the receive side via the server, the algorithm automatically transcribes the speech and includes the transcription in the message sent to the receive side. As another example, a user could use voice recognition software to dictate a written note and then attach the note to a non-audio communication such as an email or IM (instant messaging) chat message.