The present invention relates to speech synthesis and in particular to a phoneme-based speech synthesizer that is particularly adapted for implementation in a single encapsulated integrated circuit.
Known phoneme-based speech synthesizers have principally contained vocal tracts comprising a plurality of resonant filters. It has heretofore generally been considered impractical to produce vocal tracts of this type in integrated circuit form for two significant reasons. First, tunable resonant filters of the type commonly used in vocal tracts require resistors and capacitors having relatively large values to produce resonant frequencies in the relatively low frequency range of the human voice. Large-value components substantially increase the size of an integrated circuit. Second, vocal tract resonant filters are high-precision filters which are difficult to produce in integrated circuit form within the required tolerance limits.
The present invention utilizes a novel capacitive switching technique to implement the vocal tract, as well as additional parameter-controlled functions, which eliminates the above-noted problems and thus makes the speech synthesizer according to the present invention particularly adapted for implementation as a single integrated circuit silicon "chip". The capacitive switching technique employed not only eliminates the requirement of large-valued components in the vocal tract, but also eliminates the requirement that the absolute values, and hence the size, of the tuning components in the vocal tract be accurately controlled. Rather, as will subsequently be seen, with the capacitive switching technique of the present invention, it is only important that the ratio of the tuning component values be accurately controlled, thus making it substantially easier to maintain the high accuracy levels required during production.
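The dependence on capacitor ratio rather than absolute value can be illustrated with a minimal sketch. It assumes the textbook switched-capacitor relation, in which a capacitor switched at a clock frequency behaves like a resistor of value 1/(f_clk * C), so that a first-order section has a corner frequency of f_clk * C_switched / (2 * pi * C_integrating). The function name and the component values are hypothetical and are not taken from this disclosure.

```python
import math


def corner_frequency(f_clk: float, c_switched: float, c_integrating: float) -> float:
    """Corner frequency (Hz) of an idealized first-order switched-capacitor section.

    Assumes the switched capacitor emulates a resistor R = 1 / (f_clk * c_switched),
    giving a time constant R * c_integrating.
    """
    return f_clk * c_switched / (2.0 * math.pi * c_integrating)


# Hypothetical nominal values: 20 kHz switching clock, 1 pF and 10 pF capacitors.
nominal = corner_frequency(20e3, 1e-12, 10e-12)

# Both capacitors come out 20% large in production, but their ratio is preserved.
shifted = corner_frequency(20e3, 1.2e-12, 12e-12)

print(f"nominal: {nominal:.2f} Hz, after 20% process shift: {shifted:.2f} Hz")
```

In this idealized model the two results agree, because only the ratio of the two capacitances and the clock frequency enter the expression.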
In addition, the present speech synthesizer includes a unique digital transition circuit which gradually transitions the values of certain control signal parameters between the different steady-state values assigned for different phonemes. In this manner, adjacent phonetic sounds are properly integrated to produce natural-sounding speech.
The speech synthesizer of the present invention also includes a novel glottal source circuit which digitally generates the glottal pulse signal in a manner which readily permits the waveform of the glottal pulse signal to be spectrally shaped in any manner desired.
In general, the present speech synthesizer system as disclosed herein comprises a single encapsulated silicon chip which phonetically synthesizes continuous speech of unlimited vocabulary from low data input rates. The system includes a parameter storage ROM containing parameter values defining 64 different phonemes which are accessed by a 6-bit command word. Two additional input bits are provided for varying the pitch or inflection of voiced phonemes. The control parameters are generated by the storage ROM in a multiplexed fashion on an 8-bit parallel output bus. The control parameters which are used to control the vocal tract are initially provided to a novel digital transition circuit which serves to gradually transition the variations in the steady-state values of the parameters which occur from phoneme to phoneme. As will subsequently be seen, the digital transition circuit performs this function in a unique manner by continuously adding one-eighth of the difference between the target parameter value and the current parameter value to the current parameter value, and using the result as the new current parameter value. In the preferred embodiment, the transition circuit is clocked at a rate which results in a parameter attaining approximately 70% of its target value within a span of 33 milliseconds.
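The one-eighth update rule described above can be sketched in a few lines. This is an idealized model only: it uses floating-point arithmetic rather than the fixed-width digital arithmetic of the actual circuit, and the function names are hypothetical.

```python
def transition_step(current: float, target: float) -> float:
    """Add one-eighth of the remaining difference to the current value."""
    return current + (target - current) / 8.0


def fraction_reached(steps: int) -> float:
    """Fraction of a 0-to-1 transition completed after the given number of updates."""
    value = 0.0
    for _ in range(steps):
        value = transition_step(value, 1.0)
    return value


for n in (1, 4, 9, 16):
    print(f"after {n:2d} updates: {fraction_reached(n):.3f} of the way to target")
```

Each update leaves seven-eighths of the remaining gap, so after n updates the fraction covered is 1 - (7/8)^n. Under this model roughly nine updates cover about 70% of the gap, which is the figure quoted for the 33 millisecond span; the actual clock rate of the transition circuit is not stated in this passage.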
The transitioned control signal parameters from the digital transition circuit are provided to the vocal tract to control the resonant frequencies of the F1, F2 and F3 resonant filters, and to control the injection of vocal and fricative excitation energy into the vocal tract. In addition, the "Q" or bandwidth of the F2 resonant filter is separately controllable for producing nasal phonemes as is conventional. The various parameter controlled functions are implemented by utilizing the 4-bit parallel digital parameter signals to selectively control the capacitance ratio of capacitor networks in the controlled circuits. The capacitor networks are then switched on and off at a predetermined frequency so that the controlled capacitor networks effectively simulate a controlled variable resistance element.
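How a 4-bit parameter might set an effective resistance can be sketched as follows. The sketch assumes a binary-weighted capacitor array built from a unit capacitor, and the standard relation R = 1/(f * C) for a capacitor switched at frequency f. The weighting scheme, unit capacitance, switching frequency and function names are all assumptions made for illustration; the passage above does not specify them.

```python
import math


def selected_capacitance(code: int, unit_cap: float) -> float:
    """Total capacitance selected by a 4-bit code in an assumed binary-weighted array."""
    if not 0 <= code <= 0b1111:
        raise ValueError("code must fit in 4 bits")
    # Bit k, when set, switches in a capacitor of 2**k unit capacitances.
    return sum(unit_cap * (1 << bit) for bit in range(4) if code & (1 << bit))


def equivalent_resistance(f_switch: float, capacitance: float) -> float:
    """Resistance (ohms) emulated by a capacitor switched at f_switch."""
    if capacitance == 0.0:
        return math.inf  # nothing switched in: no charge transferred
    return 1.0 / (f_switch * capacitance)


# Hypothetical values: 1 pF unit capacitor, 20 kHz switching frequency.
for code in (0b0001, 0b0100, 0b1010, 0b1111):
    cap = selected_capacitance(code, 1e-12)
    res = equivalent_resistance(20e3, cap)
    print(f"code {code:04b}: {cap * 1e12:4.1f} pF -> {res / 1e6:6.2f} Mohm")
```

The example shows why small on-chip capacitors suffice: at audio-range switching frequencies, picofarad capacitors emulate resistances in the megohm range.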
The glottal source generator circuit produces a glottal pulse signal having a fundamental frequency that varies in accordance with the setting of the two inflection control bits. In addition, a degree of automatic inflection control is provided by also varying the fundamental frequency of the glottal signal inversely with respect to movement in the resonant frequency of the F1 resonant filter in the vocal tract. The spectral shape of the glottal pulse is controlled by selectively presetting the analog d.c. signal levels applied to the parallel inputs of a multiplexer. The selector inputs of the multiplexer are connected to the output of a counter which is clocked at a predetermined rate. The waveform of the glottal signal produced at the serial output of the multiplexer therefore comprises a segmented approximation of an analog glottal pulse signal with the levels of the various segments determined by the preset d.c. levels.
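The counter-and-multiplexer arrangement can be modeled in software as a lookup into a table of preset levels. The sketch below is illustrative only: the level values, the number of segments and the clock rate are hypothetical, and it assumes one glottal pulse per full cycle of the counter.

```python
def glottal_samples(levels: list[float], n_clocks: int) -> list[float]:
    """Multiplexer output over n_clocks counter ticks.

    The counter value selects which preset d.c. level is passed to the output,
    wrapping around after the last level.
    """
    n = len(levels)
    return [levels[count % n] for count in range(n_clocks)]


def fundamental_frequency(clock_rate_hz: float, n_segments: int) -> float:
    """Pulse repetition rate, assuming one pulse per full counter cycle."""
    return clock_rate_hz / n_segments


# Hypothetical 8-segment pulse shape: a slow rise followed by a sharp closure.
preset_levels = [0.0, 0.2, 0.5, 0.8, 1.0, 0.6, 0.1, 0.0]

print(glottal_samples(preset_levels, 12))
print(fundamental_frequency(800.0, len(preset_levels)), "Hz")
```

Changing the entries of the level table reshapes the pulse, and hence its spectrum, without altering the counter or clock, while changing the clock rate shifts the fundamental frequency without altering the shape.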