The present invention relates to voice synthesizers and in particular to a highly simplified voice synthesizer that is capable of producing quality speech.
In general, the present invention comprises a synthesizer of the type disclosed in copending U.S. application Ser. No. 714,495, filed Aug. 16, 1976, entitled "Voice Synthesizer," and assigned to the assignee of the present application. While the synthesizer disclosed in the cited copending application comprises a highly sophisticated synthesizer capable of producing remarkably realistic sounding speech, the present invention is intended to provide a speech synthesizer that is simpler in design, smaller in size, and less expensive, yet nonetheless capable of producing quality speech.
The present speech synthesizer is adapted to be driven by an 8-bit digital input command word. Six of the bits are used for phoneme selection, thus providing 2^6 or 64 possible phonemes, and the remaining two bits are dedicated to inflection control. The system is adapted to generate twelve control parameters for each phoneme. In the first embodiment disclosed herein, one of the control signal parameters, referred to as the fricative control, is utilized to control the injection of both high and low frequency fricative energy into the vocal tract. More particularly, the system utilizes the fricative control signal and the inverse of the fricative control signal to control the parallel injection of fricative energy into the second (F2) and fifth (F5) resonant filters in the vocal tract. Thus, as will subsequently be explained in greater detail, for a given phoneme having an unvoiced component, fricative energy is injected directly into the F2 and F5 resonant filters, with the amount of fricative energy that is injected into the F2 resonant filter being inversely related to the amount injected into the F5 resonant filter. Also included in the first embodiment is a second fricative excitation control network that is adapted to control the injection of fricative energy in parallel into the second (F2) and third (F3) resonant filters in the vocal tract under the control of the vocal amplitude control signal. Consequently, the combination of the glottal waveform, which is injected into the F1 resonant filter, and the vocal amplitude controlled fricative injection into the F2 and F3 resonant filters provides asynchronous excitation of the serial vocal tract. The use of white noise as the primary source of excitation of the F2 and F3 resonant filters gives the synthesizer a more "breathy" sounding voice.
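By way of illustration only, the command word format and the complementary fricative injection described above may be sketched in software as follows. The sketch is not part of the disclosed circuitry and rests on assumptions that the text does not settle: the phoneme code is placed in the low six bits and the inflection code in the high two bits, the fricative control is normalized to the range 0 to 1, its "inverse" is modeled as one minus the control value, and the direct signal is arbitrarily assigned to F2. The names decode_command and fricative_split are hypothetical.

```python
from typing import NamedTuple, Tuple


class Command(NamedTuple):
    phoneme: int     # 0..63, one of the 2**6 selectable phonemes
    inflection: int  # 0..3, the two inflection-control bits


def decode_command(word: int) -> Command:
    """Split an 8-bit command word into its phoneme and inflection fields.

    ASSUMPTION: phoneme code in bits 0-5, inflection code in bits 6-7.
    The text gives the field widths but not their positions.
    """
    if not 0 <= word <= 0xFF:
        raise ValueError("command word must fit in 8 bits")
    return Command(phoneme=word & 0x3F, inflection=(word >> 6) & 0x03)


def fricative_split(fricative_control: float) -> Tuple[float, float]:
    """Return (F2 gain, F5 gain) for the parallel fricative injection.

    ASSUMPTION: the control is normalized to 0..1 and its inverse is
    modeled as 1 - control, so the two gains move in opposite directions.
    Which filter receives the direct signal is also an assumption.
    """
    if not 0.0 <= fricative_control <= 1.0:
        raise ValueError("fricative control must lie in 0..1")
    return fricative_control, 1.0 - fricative_control
```

For example, under these assumptions decode_command(0b10000101) yields phoneme 5 with inflection 2, and fricative_split(0.25) directs one quarter of the fricative energy to F2 and three quarters to F5.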
A second embodiment disclosed herein is adapted to operate from a 12 volt power source and thus is particularly suited for use with a portable power supply. The system is also driven by 8-bit digital command words and is adapted to generate twelve electronic control signal parameters per phoneme. One of the control parameters, however, is utilized to produce two separate control signals, thus providing an additional control signal without substantial additional circuitry.
A unique pause control circuit is included in the second embodiment. The circuit is adapted to detect the existence of a pause phoneme and then maintain the values of certain critical parameters beyond the termination of the phoneme preceding the pause. This prevents the characteristics of the vocal tract from changing, due to transitional changes in the control signal parameters, before the audio output has completely faded out. Briefly, the pause control circuit functions by producing an output signal whenever the circuit detects a lack of both the vocal amplitude and fricative amplitude control signals. The output signal produced is then utilized to sample and hold the outputs of a tri-state latch which maintains the current values of the affected parameters. The same output signal is also used to simultaneously disable a pair of analog gates to prevent transitional changes of two additional control signal parameters. The output signal is automatically terminated after a predetermined period of time that is shorter than the entire duration of the pause phoneme.
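The behavior of the pause control circuit may be approximated, purely as a behavioral sketch, by the following software model. It is not the disclosed hardware: the class name PauseHold, the tick-based hold period, and the amplitude threshold are hypothetical, and the latch and the analog gates are merged into a single dictionary of held parameter values.

```python
from typing import Dict, Optional


class PauseHold:
    """Behavioral model of the pause hold; all names are hypothetical."""

    def __init__(self, hold_ticks: int, threshold: float = 0.0) -> None:
        self.hold_ticks = hold_ticks  # assumed hold period, in update ticks
        self.threshold = threshold    # assumed "signal absent" level
        self._remaining = 0
        self._in_pause = False
        self._held: Dict[str, float] = {}
        self._last: Optional[Dict[str, float]] = None

    def update(self, vocal_amp: float, fric_amp: float,
               params: Dict[str, float]) -> Dict[str, float]:
        """Return the parameter values that reach the vocal tract this tick."""
        # A pause is detected when BOTH amplitude controls are absent.
        silent = vocal_amp <= self.threshold and fric_amp <= self.threshold
        if silent and not self._in_pause:
            # Pause begins: freeze the values of the preceding phoneme.
            source = self._last if self._last is not None else params
            self._held = dict(source)
            self._remaining = self.hold_ticks
        self._in_pause = silent
        if silent and self._remaining > 0:
            self._remaining -= 1
            return dict(self._held)
        # No hold in effect: pass the incoming values through.
        self._last = dict(params)
        return dict(params)
```

With hold_ticks set to 2, for instance, the first two ticks of a pause return the parameter values of the preceding phoneme, and the third tick releases the hold even though the pause phoneme is still in progress.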
The serialized vocal tract in the second embodiment is also asynchronously driven, as in the first embodiment; however, vocal energy is used for the second excitation signal instead of white noise. More particularly, the glottal waveform that is injected into the first resonant filter is also injected in parallel into the second resonant filter. Thus, due to the inherent delay introduced by the F1 resonant filter, the F2 and F3 resonant filters are effectively driven twice: first by the direct parallel injection of vocal energy into the second resonant filter, and second by the delayed excitation from the residual vocal energy at the output of the first resonant filter. The result is an improved sounding voice due to the closer simulation of the true action of the human glottis, which actually excites the vocal cords twice during each open and close cycle.
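The signal routing just described, with the glottal waveform feeding both the first resonant filter and, in parallel, the second, may be sketched digitally as follows. This is an illustrative model only and substitutes a standard two-pole digital resonator for each analog filter. The formant frequencies, bandwidths, sample rate, and the parallel_gain parameter are hypothetical values chosen for the example, and the sketch shows the routing rather than the timing behavior of the analog circuit.

```python
import math
from typing import List


class Resonator:
    """Two-pole digital resonator standing in for one analog formant filter."""

    def __init__(self, freq_hz: float, bw_hz: float, fs_hz: float) -> None:
        t = 1.0 / fs_hz
        self.c = -math.exp(-2.0 * math.pi * bw_hz * t)
        self.b = (2.0 * math.exp(-math.pi * bw_hz * t)
                  * math.cos(2.0 * math.pi * freq_hz * t))
        self.a = 1.0 - self.b - self.c  # normalizes the gain at DC to unity
        self.y1 = 0.0
        self.y2 = 0.0

    def step(self, x: float) -> float:
        y = self.a * x + self.b * self.y1 + self.c * self.y2
        self.y2, self.y1 = self.y1, y
        return y


def vocal_tract(glottal: List[float], parallel_gain: float = 1.0,
                fs_hz: float = 10000.0) -> List[float]:
    """Run a glottal waveform through a serial F1 -> F2 -> F3 chain.

    ASSUMPTION: all frequencies and bandwidths below are illustrative.
    """
    f1 = Resonator(500.0, 60.0, fs_hz)
    f2 = Resonator(1500.0, 90.0, fs_hz)
    f3 = Resonator(2500.0, 150.0, fs_hz)
    out = []
    for g in glottal:
        y1 = f1.step(g)
        # F2 receives the F1 output plus the directly injected glottal signal.
        y2 = f2.step(y1 + parallel_gain * g)
        out.append(f3.step(y2))
    return out
```

Setting parallel_gain to zero removes the direct injection into the second filter and leaves an ordinary serial chain, which allows the two arrangements to be compared on the same input.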
Additional objects and advantages will become apparent from a reading of the detailed description of the preferred embodiments which makes reference to the following set of drawings in which: