Since at least 1779, attempts have been made to duplicate speech by artificial means. The early machines utilized flexible resonators, usually shaped like the human vocal tract, and reeds to simulate the vocal cords. At the 1939 World's Fair in New York, the Bell Telephone VODER (Voice Operated Demonstrator) was exhibited. This speaking machine had extremely complicated controls that could only be operated by a person with a high degree of skill who had been trained over a long period of time. The machine utilized a pitch-defining current that was sent to a vocal buzz generator above a certain level. Below that level, a hiss was substituted. Currents were provided to a bank of ten parallel audio filters, each used to define the strength of the signal inside the bandpass range of that particular filter. At times, these filters had to be turned both on and off within an extremely short period of time, such as 1/20th of a second, and rippled in arpeggios that would be difficult for even a skilled pianist to duplicate. One version of the VODER is disclosed in U.S. Pat. No. 2,121,142.
Current efforts at speech synthesis are almost exclusively directed toward electronic formation of intelligible speech from a continuous flow of digital impulses delivered by a computer, or from a stored digital representation of a person's voice. In the latter case, inverse filter techniques are used to divide the speech waveforms into signals to drive the synthesizer and reconstruct the voice waveform. However, these approaches have not been used to configure a speech-producing machine that can be continuously controlled. In many applications, human speech is synthesized by the generation and combination of a plurality of sounds to represent basic speech parts, referred to as phonemes. The phonemes are then strung together to simulate words or phrases. By analyzing the phonemes required for intelligible speech, two major kinds of sounds were identified, namely voiced sounds, which are primarily the result of vibration of the vocal cords resonating in the cavities that are formed along the vocal tract, and unvoiced sounds, which are typically the sibilants and which tend to be basically derived from a random sound source such as white noise. A plurality of sine-wave generators of differing frequencies are used to provide a selected number of basic waveforms representative of the basic formants of sound. The waveforms are then combined to produce a resultant, complex waveform. One such synthesizer is disclosed in U.S. Pat. No. 4,092,495. A related approach is disclosed in U.S. Pat. No. 4,163,120, whereby stored speech waveforms representing basic functions are combined with other waveforms instantaneously produced by means of either time compression or time expansion of the stored basic functions.
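By way of illustration only, the voiced/unvoiced distinction and the combination of sine waves described above can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from any of the cited patents: the function names, the 8000 Hz sample rate, and the formant frequencies and amplitudes are all hypothetical values chosen for the example.

```python
import math
import random

SAMPLE_RATE = 8000  # Hz; an assumed value for illustration


def voiced(formants, duration, sample_rate=SAMPLE_RATE):
    """Voiced segment: sum one sine wave per (frequency_hz, amplitude) pair."""
    n = round(duration * sample_rate)
    return [
        sum(a * math.sin(2 * math.pi * f * t / sample_rate) for f, a in formants)
        for t in range(n)
    ]


def unvoiced(duration, amplitude=0.3, sample_rate=SAMPLE_RATE, seed=0):
    """Unvoiced segment: uncorrelated random samples standing in for white noise."""
    rng = random.Random(seed)
    n = round(duration * sample_rate)
    return [amplitude * rng.uniform(-1.0, 1.0) for _ in range(n)]


def string_phonemes(segments):
    """String phoneme-like segments together into one waveform."""
    out = []
    for seg in segments:
        out.extend(seg)
    return out


# Hypothetical formant set for a vowel-like sound (not measured data).
VOWEL = [(700, 1.0), (1200, 0.5), (2600, 0.25)]

# A 50 ms noise burst followed by a 100 ms vowel-like sound.
word = string_phonemes([unvoiced(0.05), voiced(VOWEL, 0.10)])
```

The resulting list `word` holds the samples of the combined waveform. Note that every segment has to be computed before it can be strung together, which reflects the limitation discussed in this section: the output is assembled from predetermined pieces rather than continuously controlled.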
A number of prior art devices utilize stored representations of operator-selected words, phrases, phonemes and morphemes. An input device is usually provided which utilizes a keyboard having a plurality of individual touch-sensitive locations, much in the manner of a typewriter. One such device is disclosed in U.S. Pat. No. 4,215,240.
Currently, digital speech synthesizer integrated circuits are commercially available from Texas Instruments Inc., General Instrument, National Semiconductor, A.M.I. and others. The Texas Instruments approach utilizes reflection coefficient-type data to control the characteristics of a digital filter. These devices are disclosed in a number of U.S. patents including U.S. Pat. Nos. 4,209,836, 4,304,965 and 4,328,395.
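As a rough illustration of how reflection coefficient-type data can control a digital filter, the following is a minimal sketch of a generic all-pole lattice synthesis filter. It is not the circuit of the cited patents or of any commercial device. The function name `lattice_synthesize`, the sign convention, and the coefficient values are assumptions made for the example.

```python
def lattice_synthesize(excitation, k):
    """Filter an excitation through an all-pole lattice.

    k[i-1] is the reflection coefficient of stage i. Each stage computes
        f_{i-1}[n] = f_i[n] - k_i * b_{i-1}[n-1]
        b_i[n]     = k_i * f_{i-1}[n] + b_{i-1}[n-1]
    with f_p[n] the excitation and f_0[n] = b_0[n] the output sample.
    The filter is stable when every |k_i| < 1.
    """
    p = len(k)
    b = [0.0] * (p + 1)  # b[i] holds the backward signal b_i from the previous sample
    out = []
    for e in excitation:
        f = e
        for i in range(p, 0, -1):
            f = f - k[i - 1] * b[i - 1]       # forward path, uses b_{i-1}[n-1]
            b[i] = k[i - 1] * f + b[i - 1]    # backward path, becomes b_i[n]
        b[0] = f
        out.append(f)
    return out


# Impulse response for two hypothetical coefficient sets.
impulse = [1.0, 0.0, 0.0, 0.0]
first_order = lattice_synthesize(impulse, [0.5])
second_order = lattice_synthesize(impulse, [0.5, 0.25])
```

In such an arrangement, the excitation would be a periodic pulse train for voiced sounds or noise for unvoiced sounds, and a new set of coefficients would be loaded for each short frame of speech. The coefficients shape the spectrum of the output in place of the bank of analog filters used in the earlier machines.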
However, the recent synthesizers require that the phrase to be spoken either be stored in a memory or loaded into a register, thereby causing difficulty in real-time conversation. Furthermore, these modern devices do not permit any individualistic input into the speech to permit inflections, feeling, and emphasis. For example, without using any fricative, plosive, or nasal consonants, a person can say "Where are you?"; but cannot say "Where are you?" or "Where are you?". Thus, although the prior art devices do permit some form of communication, they are not readily applicable in conversational communications with individualized characteristics.