Artificial speech is conventionally produced by generating a digital bit stream representing a sequence of amplitude samples which define sound waves in the time domain. This digital bit stream is conventionally converted into an analog signal by a digital-to-analog converter (DAC). When this analog signal is applied to a loudspeaker, the spoken words corresponding to the encoded text are heard.
A problem arises when such a bit stream is generated on a computer which does not have a DAC. Such computers are typically used for low-cost personal computer (PC) applications in which only single-frequency tones or game noises need to be produced. For tones such as the "bell" tone commonly used on personal computers, the central processing unit (CPU) of the computer produces a pulse train which alternately applies logic level "1" and logic level "0" voltages to the speaker at the desired tone frequency. For game sounds, a random waveform centered about zero is digitally generated and infinitely clipped (i.e., if the sign of a sample is positive, the applied logic level is "1", and if it is negative, the applied logic level is "0").
If infinite clipping is performed on a waveform representing a spoken word, the sound produced by the speaker is marginally recognizable speech, but the large number of spurious frequency components generated by the clipping makes this process useless for applications in which speech quality is a factor.
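The infinite clipping described above can be illustrated with a minimal sketch. The function name `infinite_clip` and the test waveform are hypothetical, and the treatment of a sample that is exactly zero (mapped here to logic level "0") is an assumption, since the text only specifies positive and negative samples.

```python
import math

def infinite_clip(samples):
    """Reduce each signed sample to a logic level: 1 if positive, else 0.

    Assumption: a sample of exactly zero is mapped to 0.
    """
    return [1 if s > 0 else 0 for s in samples]

# Hypothetical test waveform: one cycle of a sine wave, 8 samples,
# offset by half a sample so that no sample falls exactly on zero.
wave = [math.sin(2 * math.pi * (n + 0.5) / 8) for n in range(8)]
levels = infinite_clip(wave)
print(levels)
```

All amplitude information is discarded and only the sign of each sample survives, which is consistent with the heavy distortion noted above when the input is a speech waveform.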
U.S. Pat. No. 4,805,220 provides a solution to this problem. The teaching of this patent makes it possible to produce clear speech with DAC-less computers directly from a digital bit stream by switching the computer's speaker between logic levels "1" and "0" at an ultrasonic carrier rate and varying the "1"/"0" duty cycle at audio frequencies according to the speech signal to be produced. Carrier clicks or transients are prevented by generating the ultrasonic carrier continuously through silent periods as well as during phonemes.
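The duty-cycle idea can be sketched as follows. This is an illustrative approximation under stated assumptions, not the method of the cited patent: the function name `pwm_encode`, the parameter `slots_per_period`, the normalization of samples to the range -1.0 to 1.0, and the mapping of a silent (zero) sample to a 50% duty cycle are all hypothetical choices made for the example.

```python
def pwm_encode(samples, slots_per_period=8):
    """Encode audio samples as a stream of logic levels.

    Each sample occupies one carrier period of `slots_per_period` time
    slots. The number of leading "1" slots (the duty cycle) is
    proportional to the sample amplitude.

    Assumptions: samples are normalized to [-1.0, 1.0], and a zero
    sample (silence) yields a 50% duty cycle, so the carrier keeps
    running through silent periods.
    """
    bits = []
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clamp out-of-range samples
        high = round((s + 1.0) / 2.0 * slots_per_period)
        bits.extend([1] * high + [0] * (slots_per_period - high))
    return bits

silence = pwm_encode([0.0, 0.0])  # carrier continues at 50% duty
loud = pwm_encode([0.5])          # 6 of 8 slots at logic "1"
```

In this sketch the carrier frequency is the audio sample rate, so an ultrasonic carrier would require that each period be emitted at a rate above the audible range. When the speaker cannot follow the carrier, it responds mainly to the average level of each period, which tracks the audio amplitude.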
In the invention of U.S. Pat. No. 4,805,220, speech generation in real time is made possible by interleaving the speech-generating CPU operations with the logic "1" and logic "0" operations of the CPU. This method is broadly applicable, but it is cumbersome and cannot take advantage of the enhanced capabilities of current computer technology.