During a text-to-speech conversion process, a linguistic representation of text is typically converted into a series of acoustic parameter vectors. These acoustic parameters are then converted into parameters used by a vocoder, which generates the final speech signal.
Neural networks have been used to compute each vector of acoustic parameters, which requires many computations for each second of speech. These computations can account for a significant portion of the computational time in neural-network-based text-to-speech conversion.
Accordingly, there is a need for a neural network system that reduces the computational requirements for converting a linguistic representation into an acoustic representation.