1. Technical Field
The present disclosure relates to low-latency text-to-speech and, more specifically, to web-based low-latency text-to-speech without plug-ins.
2. Introduction
Current approaches for incorporating text-to-speech (TTS) functionality into web browsing or other web-based applications suffer from several limitations. To be responsive, or in other words to have low latency, current approaches feed the text to be synthesized to the synthesizer in small chunks, and use Adobe® Flash® Player or some other external program or web browser plug-in to render the audio. These other programs may not always be available, especially on mobile or other low-resource devices. Moreover, because the text arrives in small chunks, the TTS system is often deprived of potentially valuable information that would be present in complete sentences or paragraphs. This information, if it were available, could be used to render the audio with appropriate prosody or other features. Such approaches can therefore provide either good latency or good prosody, but each only at the expense of the other. In other words, current approaches for web-based TTS are unable to provide both good latency and good prosody at the same time, and they rely on browser plug-ins.