This invention relates generally to a speech processor and is more particularly concerned with a computer peripheral which allows the computer to digitally store audio signals and to reproduce those signals at a later time.
Information exchange between human beings often takes place in the form of audio communication, i.e., listening and talking. This form of communication is convenient and provides a rapid means of information transfer. Audio communication can also take place between humans and computers. Computer speech output can act as a low-cost indicating device, replacing gauges, warning lights, and printers in many applications. Computer speech recognition can act as a low-cost input device, replacing keyboards. Computer speech input/output has an advantage over other forms of man-to-machine communication in that it occupies minimal physical volume. Hence, speech can be used where large keyboards and displays are unacceptable. Computer speech communication is also useful for "hands-off" communication of data, such as is required in airline baggage sorting and in wheelchair controls for the handicapped. A low-cost speech system can be used in games, toys, automobiles, consumer appliances, and many other cost-sensitive applications.
Present techniques of speech processing fall into two categories. The first, called "Linear Predictive Coding" (LPC), essentially uses an electronic model of the human vocal tract to synthesize speech. Although recent developments in the LPC area promise to reduce the cost of speech production, speech recognition using this technique is presently quite costly and is expected to remain so. The LPC technique would have to be reduced in cost by several orders of magnitude before it could be useful as a speech input/output device in consumer products.
The second category of speech processor uses the "Time-Domain" technique. In this method, a speech waveform is generated by a human, and this waveform is then sampled and stored as a series of numbers. The speech is reconstructed when these stored numbers are fed through an appropriate digital-to-audio conversion system. At present, a popular technique for accomplishing this is called "Continuously Variable Slope Delta" (CVSD) modulation. CVSD changes an audio signal into a serial binary data stream, but the value of each bit in the data stream (0 or 1) depends upon the values of the bits surrounding it in the data stream. Hence the CVSD data is in a highly encoded form and cannot be used directly by a computer for automatic word recognition. The invention of this application is a Time-Domain technique which differs significantly from CVSD.
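The context dependence of CVSD data described above can be illustrated with a minimal sketch. It is not taken from this application or from any particular CVSD device; the function names, the step-size limits, the run length of three bits, and the growth and decay factors are all assumptions chosen only for illustration. Each bit records whether the input lies above or below a running estimate, and the step size grows whenever the most recent bits are all alike, so the amplitude change represented by a given bit depends on the bits that precede it.

```python
# Illustrative CVSD-style coder. All parameter values are assumed for
# demonstration and do not describe any specific commercial device.

def _adapt(step, history, run, min_step, max_step, growth, decay):
    """Grow the step after `run` identical bits, otherwise let it decay."""
    if len(history) == run and len(set(history)) == 1:
        return min(step * growth, max_step)
    return max(step * decay, min_step)


def cvsd_encode(samples, min_step=0.01, max_step=0.5,
                run=3, growth=1.5, decay=0.9):
    """Convert a sequence of samples into a one-bit-per-sample stream."""
    estimate, step, history, bits = 0.0, min_step, [], []
    for x in samples:
        bit = 1 if x >= estimate else 0          # compare input to estimate
        bits.append(bit)
        history = (history + [bit])[-run:]       # keep the last `run` bits
        step = _adapt(step, history, run, min_step, max_step, growth, decay)
        estimate += step if bit else -step       # track the input
    return bits


def cvsd_decode(bits, min_step=0.01, max_step=0.5,
                run=3, growth=1.5, decay=0.9):
    """Rebuild the estimate from the bit stream using the same adaptation."""
    estimate, step, history, out = 0.0, min_step, [], []
    for bit in bits:
        history = (history + [bit])[-run:]
        step = _adapt(step, history, run, min_step, max_step, growth, decay)
        estimate += step if bit else -step
        out.append(estimate)
    return out


if __name__ == "__main__":
    rising = cvsd_decode([1, 1, 1, 1])
    # The fourth "1" moves the output further than the first "1" did,
    # because its meaning depends on the bits before it.
    print(rising[0], rising[3] - rising[2])
```

In this sketch a single bit carries no fixed amplitude on its own, which is why such a stream is awkward to use directly for word recognition: the decoder must replay the stream from a known state to recover the waveform.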