This specification relates generally to text-to-speech synthesis and, more specifically, to text-to-speech synthesis using neural networks.
Neural networks can be used to perform text-to-speech synthesis. Typically, text-to-speech synthesis attempts to generate a synthesized utterance of a text that approximates the sound of human speech.