1. Field of the Invention
The present invention relates to apparatus for synthesizing human speech by the generation and combination of representations of speech components.
2. Description of the Prior Art
It has previously been proposed to synthesize human speech by generating sounds and combining a plurality of such generated sounds to represent basic speech parts. Some thought has also been given, in the prior art, to stringing together a number of such basic parts to simulate words or phrases. The basic sound parts have been referred to as phonemes, and it has been found possible to analyse the phonemes required for intelligible speech and to specify each phoneme in terms of the sound characteristics required for its reproduction.
Thus, for example, two major kinds of sound have been identified: voiced sounds, which result primarily from vibration of the vocal cords resonating in the cavities formed, for example, by the tongue acting in the mouth; and unvoiced sounds, which are typically the sibilants and tend to be derived from a random sound source such as white noise. In the case of voiced sounds, it has also been found that although analysis of the waveform reveals several components of different frequencies, a combination of only three waves of different respective frequencies is sufficient to produce a recognisable sound. Thus, in typical apparatus as previously proposed, three sine-wave generators of differing frequencies have been used to provide the three basic waveforms, which have been referred to as the three formants of the sound. The formant waveforms are damped and combined to produce a resultant waveform, the relative amplitudes of the individual formant waveforms being varied to modify the resultant sound or to give it a recognisable character.
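The three-formant arrangement described above can be illustrated with a short numerical sketch. It assumes, since the prior art as summarised here does not say how the damping is applied, that each formant is an exponentially decaying sine wave restarted at every pitch period, with a bandwidth parameter setting the decay rate. The function name `voiced_frame`, the sample rate, the pitch and the formant values used below are all hypothetical and chosen only for illustration.

```python
import math

def voiced_frame(formants, sample_rate=8000, pitch_hz=100.0, n_samples=800):
    """Sum damped sine waves (one per formant), restarted at each pitch period.

    formants: list of (frequency_hz, bandwidth_hz, relative_amplitude) tuples.
    All parameter values are hypothetical illustrations, not prior-art figures.
    """
    period = int(round(sample_rate / pitch_hz))  # samples per excitation pulse
    out = []
    for n in range(n_samples):
        t = (n % period) / sample_rate  # seconds since the most recent pulse
        s = 0.0
        for freq, bw, amp in formants:
            # Assumed damping law: exponential decay whose rate grows with bandwidth.
            s += amp * math.exp(-math.pi * bw * t) * math.sin(2 * math.pi * freq * t)
        out.append(s)
    return out

# Illustrative (hypothetical) formant set: frequency, bandwidth, relative amplitude.
FORMANTS = [(700, 60, 1.0), (1220, 70, 0.5), (2600, 110, 0.25)]
samples = voiced_frame(FORMANTS)
```

Varying the third element of each tuple corresponds to the variation of relative formant amplitudes mentioned above; setting an amplitude to zero removes that formant entirely.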
In such prior apparatus, unvoiced sound has been derived from a white noise generator, the output of which has been filtered and added to the combination of the basic formants. Finally, the combination has been filtered and attenuated according to specifiable laws to produce the final signal for application to a sound-reproducing transducer such as a loudspeaker. It will be seen, therefore, that in such prior proposals the sound components are essentially generated continuously, and the controls imposed on the resultant sound elements are, in principle, all related to proportioning the amplitudes of the required components (such proportioning also involving, where appropriate, the inhibition of one or more elements) and to applying some form of attenuation or damping after the combination has been effected.
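The unvoiced path and final shaping stage described above can be sketched as follows, assuming simple first-order digital filters in place of whatever analogue filters the prior apparatus used. The high-pass filter on the noise, the low-pass filter on the combination and the exponential attenuation law are all assumptions, as are the names `high_pass`, `low_pass` and `mix_and_shape`; the voiced signal is taken as an input list so that the sketch stands alone.

```python
import math
import random

def high_pass(x, alpha):
    """First-order high-pass filter: y[n] = alpha * (y[n-1] + x[n] - x[n-1])."""
    y, prev_x, prev_y = [], 0.0, 0.0
    for v in x:
        prev_y = alpha * (prev_y + v - prev_x)
        prev_x = v
        y.append(prev_y)
    return y

def low_pass(x, alpha):
    """One-pole low-pass filter: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    y, prev = [], 0.0
    for v in x:
        prev += alpha * (v - prev)
        y.append(prev)
    return y

def mix_and_shape(voiced, noise_level, decay_per_sample, seed=0):
    """Add filtered white noise to a voiced signal, then filter and attenuate.

    All filter choices and coefficients are hypothetical illustrations.
    """
    rng = random.Random(seed)
    noise = [rng.uniform(-1.0, 1.0) for _ in voiced]  # white noise source
    hiss = high_pass(noise, alpha=0.9)  # assumed: keep mainly high-frequency content
    combined = [v + noise_level * h for v, h in zip(voiced, hiss)]
    smoothed = low_pass(combined, alpha=0.5)  # assumed final filter on the combination
    # Assumed attenuation law: exponential decay applied after combination.
    return [s * math.exp(-decay_per_sample * n) for n, s in enumerate(smoothed)]
```

Note that every control in this sketch acts either on component amplitudes (`noise_level`) or on the combined signal after mixing (`decay_per_sample`), which mirrors the observation above that the prior controls amount to proportioning and post-combination attenuation.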
Because of the essentially continuous and analogue nature of these previously known methods of speech synthesis, it will be appreciated that it is difficult to multiplex synthesized speech over a plurality of channels, each requiring different expressions. Thus, for example, in a typical arrangement one channel would be required to wait for the completion of a "spoken" phrase on another channel before it could acquire the use of the synthesizer for its own phrase.