Unless otherwise indicated herein, the materials described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
A goal of automatic speech recognition (ASR) technology is to map a particular utterance, or speech sample, to an accurate textual representation, or other symbolic representation, of that utterance. For instance, ASR performed on the utterance “my dog has fleas” would ideally be mapped to the text string “my dog has fleas,” rather than the nonsensical text string “my dog has freeze,” or the reasonably sensible but inaccurate text string “my bog has trees.”
A goal of speech synthesis technology is to convert written language into speech that can be output in an audio format, for example directly or stored as an audio file suitable for audio output. The written language could take the form of text, or symbolic linguistic representations. The speech may be generated as a waveform by a speech synthesizer, which produces artificial human speech. Natural sounding human speech may also be a goal of a speech synthesis system.
Various technologies, including computers, network servers, telephones, and personal digital assistants (PDAs), can be employed to implement an ASR system and/or a speech synthesis system, or one or more components of such systems. Communication networks may in turn provide communication paths and links between some or all of such devices, supporting speech synthesis system capabilities and services that may utilize ASR and/or speech synthesis system capabilities.