As the technology associated with speech synthesis advances, the problems and issues that arise to further advance the art of speech synthesis change with each generation of new technology. For example, early speech synthesis techniques were wrought with a broad range of problems and produced speech having a very poor quality. However, as the overall quality of speech improved, various specific issues became apparent. For instance, while the overall clarity of synthesized speech improved, it was universally noted that such synthesized speech still sounded very “mechanical” in nature. That is, it was recognized that the prosody of the synthesized speech remained poor.
As various techniques were developed to address the prosody issue, and the sophistication of speech synthesis techniques progressed as a whole, mechanically produced voices began to sound less and less mechanical. Unfortunately, the very sophistication that gave rise to non-mechanical sounding artificial voices also gave rise to occasional performance “glitches” that were both unpredictable and unacceptable to a human listener. For example, if an operator desires to synthesize a number of canned messages using a modem speech synthesis device, an average listener may note that, while each resultant synthesized message sounds natural overall, one or two words in each message might be badly formed and sound unnatural or incomprehensible. Accordingly, methods and systems that can selectively fix or “sculpt” the occasional mis-produced word in a stream of synthesized speech are desirable.