Text-to-speech conversion systems are commercially available in which text encoded as computer-readable characters is converted, by a software program usually sold on a floppy disk, into a digital bit stream representing an audio signal. This digital bit stream is conventionally converted into an analog signal by a digital-to-analog converter (DAC). When this analog signal is applied to a loudspeaker, the spoken words corresponding to the encoded text are heard.
A problem arises when such an all-software program is to be used on a computer that does not have a DAC. Such computers are typically low-cost personal computers (PCs) intended for applications in which only single-frequency tones or game noises need to be produced. For tones such as the "bell" tone commonly used on personal computers, the central processing unit (CPU) of the computer produces a pulse train which turns the speaker on and off at the desired tone frequency. For game sounds, a random waveform centered about zero is digitally generated and infinitely clipped (i.e., if the sign of a sample is positive, the speaker is turned on, and if it is negative, the speaker is turned off).
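The two speaker-driving schemes described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from any actual PC firmware: the function names `bell_pulse_train` and `infinite_clip`, the sample-based timing model, and the choice to treat a sample of exactly zero as "off" are all hypothetical.

```python
def bell_pulse_train(tone_hz, sample_rate_hz, num_samples):
    """Return speaker states (1 = on, 0 = off) that toggle at tone_hz.

    Hypothetical helper: timing is modeled as discrete sample ticks.
    """
    states = []
    for n in range(num_samples):
        # Count elapsed half-periods with integer math; the speaker is on
        # during even half-periods and off during odd ones.
        half_period_index = (2 * tone_hz * n) // sample_rate_hz
        states.append(1 if half_period_index % 2 == 0 else 0)
    return states


def infinite_clip(samples):
    """Keep only the sign of each sample: positive -> on (1), else off (0)."""
    # Assumption: a sample of exactly zero is treated as "off"; the text
    # only specifies the behavior for positive and negative samples.
    return [1 if s > 0 else 0 for s in samples]
```

For example, `bell_pulse_train(1000, 8000, 8)` yields `[1, 1, 1, 1, 0, 0, 0, 0]`, one full period of a 1 kHz tone at an assumed 8 kHz tick rate, and `infinite_clip([0.5, -0.2, 3.0, -1.0])` yields `[1, 0, 1, 0]`.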
If infinite clipping is performed on a waveform representing a spoken word, the sound produced by the speaker is marginally recognizable speech, but the large number of spurious frequency components generated by the clipping makes this process useless for applications in which speech quality is a factor.
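The origin of these spurious frequencies can be seen in a small numerical experiment. The sketch below is illustrative only: it uses a single sine tone in place of speech, a 64-sample window, and a clipped signal represented as +1/-1 rather than speaker on/off (the two differ only by a constant offset and scale). All of these choices, and the helper `dft_magnitude`, are assumptions made for the example.

```python
import cmath
import math

N = 64            # samples in the analysis window (illustrative choice)
FUNDAMENTAL = 4   # the tone completes 4 cycles per window, i.e. DFT bin 4


def dft_magnitude(samples, k):
    """Magnitude of DFT bin k, computed directly from the definition."""
    total = sum(
        x * cmath.exp(-2j * math.pi * k * n / len(samples))
        for n, x in enumerate(samples)
    )
    return abs(total)


# A pure tone; the half-sample offset keeps every sample away from zero.
tone = [math.sin(2 * math.pi * FUNDAMENTAL * (n + 0.5) / N) for n in range(N)]

# Infinite clipping: only the sign of each sample survives.
clipped = [1.0 if x > 0 else -1.0 for x in tone]

# Compare the energy at the third harmonic before and after clipping.
third = 3 * FUNDAMENTAL
tone_third = dft_magnitude(tone, third)
clipped_third = dft_magnitude(clipped, third)
```

In this sketch `tone_third` is zero to within rounding error, while `clipped_third` is a substantial fraction of the fundamental's magnitude: clipping has turned the sine into a square wave, which carries odd harmonics that were absent from the original. A speech waveform contains many frequency components at once, so the distortion products are correspondingly more numerous.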
In either case (bell tone or game sound), the CPU is tied up for the duration of the sound output and cannot perform other functions.