Multiple solutions exist in the marketplace for turning audio into text. The audio may come from a variety of sources, such as a recorded audio file or streaming audio, for example a real-time conversation between a service representative and a serviced party.
These solutions include packaged desktop software, service or cloud solutions, transcription services, and personal assistant applications. However, they typically rely on substantial training data, significant processing resources, or analytical assumptions to accurately transcribe utterances in the audio into text. For example, a service such as a personal assistant may attempt to perform short phrase recognition, which allows a quick assessment of a general concept referenced in the utterance. In this instance, punctuation is less important to the transcription than identification of the conceptual intent.
For long-form or free-form dictation implementations, the dictation of punctuation depends partly on training the system and partly on training the speaker to present audio in a way that the system can understand. Such services use the training information to make a best guess at punctuation based on sentence structure, word order, and similar cues.
There exists a gap between the short phrase systems and the long form systems. For example, when recognized text is sent through a natural language processing engine, punctuation can affect how a sentence is understood. A single misplaced comma can change the meaning of the entire sentence. As an example of a sentence whose meaning changes with punctuation, consider the utterance: “For sales, press one for billing, press two for an operator, press three.” Natural language processing on this sentence, as punctuated, makes it appear that one is for billing and two is for an operator. The properly punctuated sentence should be: “For sales press one, for billing press two, for an operator press three.” As this example shows, the same words yield a totally different meaning depending on the punctuation.
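The effect of the misplaced commas can be illustrated with a minimal sketch. It assumes a deliberately naive downstream parser that treats each comma-delimited clause as one unit; the function name, the two clause patterns, and the digit table are hypothetical and are not part of any particular natural language processing engine.

```python
import re

# Hypothetical lookup for the spoken digits used in the example utterance.
DIGITS = {"one": 1, "two": 2, "three": 3}

# Two clause shapes this naive parser accepts (both are assumptions):
#   "for <target> press <digit>"  and  "press <digit> for <target>"
TARGET_FIRST = re.compile(r"^for (?:an? )?(\w+) press (\w+)$")
DIGIT_FIRST = re.compile(r"^press (\w+) for (?:an? )?(\w+)$")


def menu_from_transcript(text):
    """Map each menu target to a key, reading one comma-delimited clause at a time."""
    menu = {}
    for clause in text.lower().rstrip(".").split(","):
        clause = clause.strip()
        if m := TARGET_FIRST.match(clause):
            menu[m.group(1)] = DIGITS[m.group(2)]
        elif m := DIGIT_FIRST.match(clause):
            menu[m.group(2)] = DIGITS[m.group(1)]
        # Clauses matching neither shape (e.g. "for sales") are dropped.
    return menu


mispunctuated = "For sales, press one for billing, press two for an operator, press three."
punctuated = "For sales press one, for billing press two, for an operator press three."

print(menu_from_transcript(mispunctuated))  # {'billing': 1, 'operator': 2}
print(menu_from_transcript(punctuated))     # {'sales': 1, 'billing': 2, 'operator': 3}
```

With the commas in the wrong places, the sketch assigns one to billing and two to an operator and loses sales entirely, which mirrors the misreading described above. The words are identical in both inputs; only the clause boundaries differ.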