As shown in FIG. 1, numeral 100, to generate synthetic speech (118), a pre-processor (110) typically converts linguistic information (106) into normalized linguistic information (114) that is suitable for input to a neural network. The neural network module (102) converts the normalized linguistic information (114), which can include parameters describing phoneme identifier, segment duration, stress, syllable boundaries, word class, and prosodic information, into neural network output parameters (116). The neural network output parameters (116) are scaled by a post-processor (112) in order to generate a parametric representation of speech (108) which characterizes the speech waveform. The parametric representation of speech (108) is converted to synthetic speech (118) by a waveform synthesizer (104). The neural network system learns the conversion from linguistic information to a parametric representation of speech by attempting to extract salient features from a database. The database typically contains parametric representations of recorded speech and the corresponding linguistic information labels. It is desirable that the neural network be able to extract sufficient information from the database to allow the conversion of novel phonetic representations into satisfactory speech parameters.
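The signal flow of FIG. 1 can be illustrated with a minimal sketch. The code below is only an illustration under stated assumptions and is not the implementation of any particular synthesizer: the phoneme inventory, the 500 ms duration ceiling, the single sigmoid layer, the all-zero placeholder weights, the two output parameters (pitch and gain), and the sine-wave synthesizer are all hypothetical choices made to keep the example short. Only the ordering of the four stages and the reference numerals come from the description above.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical phoneme inventory; the description does not specify one.
PHONEMES = ["sil", "a", "t", "s"]


def pre_process(linguistic_info: Dict) -> List[float]:
    """Pre-processor (110): linguistic information (106) -> normalized vector (114)."""
    one_hot = [1.0 if p == linguistic_info["phoneme"] else 0.0 for p in PHONEMES]
    # Assumed normalization: duration clipped to [0, 1] using a 500 ms ceiling.
    duration = min(linguistic_info["duration_ms"] / 500.0, 1.0)
    stress = 1.0 if linguistic_info["stressed"] else 0.0
    return one_hot + [duration, stress]


def neural_network(inputs: List[float],
                   weights: List[List[float]],
                   biases: List[float]) -> List[float]:
    """Neural network module (102): one sigmoid layer standing in for the real network."""
    outputs = []
    for row, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(1.0 / (1.0 + math.exp(-z)))
    return outputs


def post_process(nn_outputs: List[float],
                 ranges: List[Tuple[float, float]]) -> List[float]:
    """Post-processor (112): scale outputs in (0, 1) to parameter ranges (108)."""
    return [lo + y * (hi - lo) for y, (lo, hi) in zip(nn_outputs, ranges)]


def synthesize(params: List[float], duration_ms: float,
               sample_rate: int = 8000) -> List[float]:
    """Waveform synthesizer (104): toy sine generator driven by [pitch_hz, gain]."""
    pitch_hz, gain = params
    n_samples = int(sample_rate * duration_ms / 1000)
    return [gain * math.sin(2.0 * math.pi * pitch_hz * n / sample_rate)
            for n in range(n_samples)]


# Example run with untrained (all-zero) placeholder weights.
info = {"phoneme": "a", "duration_ms": 100, "stressed": True}
x = pre_process(info)
weights = [[0.0] * len(x) for _ in range(2)]
biases = [0.0, 0.0]
params = post_process(neural_network(x, weights, biases),
                      [(50.0, 400.0), (0.0, 1.0)])
speech = synthesize(params, info["duration_ms"])
```

With zero weights every sigmoid output is 0.5, so the post-processor simply returns the midpoint of each assumed range. In a trained system the weights would instead be fitted to the database of recorded speech parameters and their linguistic labels.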
One problem with neural network approaches is that the neural network must be fairly large in order to perform a satisfactory conversion from linguistic information to parametric representations of speech. As a result, the computation and memory requirements of the neural network may exceed the available resources. When the computation and memory requirements of the neural network based speech synthesizer must be reduced, the standard approach is to reduce the size of the neural network by reducing at least one of: A) the number of neurons and B) the number of connections in the neural network. Unfortunately, this approach often causes a substantial degradation in the quality of the synthetic speech. Thus, the neural network based speech synthesis system performs poorly when the neural network is scaled down to meet typical computation and memory requirements.
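The relationship between network size and resource requirements can be made concrete with a rough estimate. The sketch below assumes a fully connected feed-forward network, one multiply-accumulate operation per connection per output frame, and 4 bytes of storage per parameter. The description above does not fix the network topology, and the layer sizes used here are hypothetical values chosen only for illustration.

```python
from typing import List, Tuple


def mlp_cost(layer_sizes: List[int]) -> Tuple[int, int, int]:
    """Return (connections, biases, multiply-accumulates per frame).

    Assumes every neuron in one layer connects to every neuron in the next.
    """
    connections = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    biases = sum(layer_sizes[1:])
    # One multiply-accumulate per connection for each frame of output.
    return connections, biases, connections


def memory_bytes(layer_sizes: List[int], bytes_per_param: int = 4) -> int:
    """Storage for all weights and biases at the assumed parameter width."""
    connections, biases, _ = mlp_cost(layer_sizes)
    return (connections + biases) * bytes_per_param


# Hypothetical layer sizes: input, two hidden layers, output.
large = [100, 200, 200, 30]
small = [100, 50, 50, 30]

large_cost = mlp_cost(large)
small_cost = mlp_cost(small)
large_mem = memory_bytes(large)
small_mem = memory_bytes(small)
```

Under these assumptions, cutting each hidden layer from 200 to 50 neurons reduces the connection count from 66,000 to 9,000 and the parameter storage from 265,720 to 36,520 bytes. The estimate says nothing about speech quality, which is the property that the standard size-reduction approach tends to sacrifice.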
Hence, there is a need for a method, device, and system for reducing the computation and memory requirements of a neural network based speech synthesis system without substantial degradation in the quality of the synthetic speech.