Field of the Invention
This invention relates generally to the storage and reproduction of human speech by electronic means, and more particularly to the storage and reproduction of human speech by means of a digital computer.
Means for electronically storing and reproducing human speech are well known. Such means generally process the speech entirely in analog form, and while the storage and reproduction of analog speech signals performs its intended function satisfactorily, there are many desirable operations which cannot be performed on analog speech signals. Among these desired operations are synthesis of spoken words, computer recognition of spoken words, visual display of phonetic elements of spoken words, and the like.
It has long been desired to use digital computers to perform the above operations as well as others which cannot be performed, or which cannot be performed satisfactorily, solely by means of analog storage and reproduction of speech. Many techniques of storing and reproducing speech in digital form have been tried. One such technique comprises the storage in digital form of a highly accurate representation of a relatively small number of words. This technique enables a computer to reproduce the stored words in audible form with a very natural sound. However, only a small number of words can be stored in this manner, and the process of generating the necessary data for storage is relatively difficult and expensive. Accordingly, this technique finds a primary application in reproducing, under digital control, one of a selected number of words. Examples of devices embodying this technique include a machine, such as an automobile, which automatically warns its operator in audible form of a dangerous condition, and a talking toy.
Another technique for generation of speech by digital means comprises the use of a phoneme generator. Such a device can generate many words and requires much less data than does the previous technique, but a phoneme generator provides words which are of low quality and, although understandable, tend to have a flat, mechanical sound.
The above techniques, in addition to the drawbacks already discussed, can only be used as devices to provide an audible output of speech data which has been previously stored in a computer; an entirely different technique must be used to enter data indicative of speech into a computer.
A primary method of entering speech data into a computer comprises a microphone coupled to an analog-to-digital converter. The analog-to-digital converter periodically samples an analog waveform provided by the microphone and provides the samples to a computer for storage and further processing. If the samples produced by this technique are later applied by the computer to a digital-to-analog converter, an audible output approximating the original speech input can be obtained. This technique can provide a very high quality of reproduced speech which has characteristics similar to the characteristics of the voice of the person who originally spoke the words which were stored. However, storing any substantial amount of speech in this manner requires enormous amounts of computer memory. Moreover, although this method can store and reproduce speech accurately, the stored data is highly speaker dependent. The speaker dependence of this technique severely limits its value insofar as any analysis of the speech or actual recognition of the spoken words by the computer is concerned.
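By way of illustration only, the sampling technique described above and its storage cost may be sketched as follows. The sketch is not part of any described apparatus. The sample rate of 8000 samples per second, the resolution of 8 bits per sample, the 440 Hz test tone, and all function names are assumptions chosen for the example and are not taken from the foregoing description.

```python
import math

def sample_waveform(signal, duration_s, sample_rate_hz):
    """Periodically sample a continuous signal, as an A/D converter would."""
    n = round(duration_s * sample_rate_hz)
    return [signal(i / sample_rate_hz) for i in range(n)]

def quantize(samples, bits):
    """Map samples in [-1.0, 1.0] to unsigned integer codes of the given width."""
    levels = (1 << bits) - 1
    codes = []
    for s in samples:
        clipped = max(-1.0, min(1.0, s))
        codes.append(round((clipped + 1.0) / 2.0 * levels))
    return codes

def reconstruct(codes, bits):
    """Map integer codes back to amplitudes, as a D/A converter would."""
    levels = (1 << bits) - 1
    return [c / levels * 2.0 - 1.0 for c in codes]

def storage_bytes(duration_s, sample_rate_hz, bits):
    """Memory needed to hold the raw samples for the given duration."""
    return math.ceil(duration_s * sample_rate_hz * bits / 8)

# Assumed test input: a 440 Hz tone standing in for the microphone signal.
def tone(t):
    return math.sin(2.0 * math.pi * 440.0 * t)

samples = sample_waveform(tone, 0.01, 8000)   # 10 ms at an assumed 8 kHz
codes = quantize(samples, 8)                  # assumed 8-bit converter
restored = reconstruct(codes, 8)
```

Under these assumed parameters, storage_bytes(60, 8000, 8) gives 480,000 bytes for a single minute of speech, which illustrates why the storage requirement grows quickly with the amount of speech stored. The stored codes describe only the waveform of one particular utterance by one particular speaker, which is consistent with the speaker dependence noted above.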
Accordingly, there is a need for a way to store human speech in a computer without using unduly large amounts of storage and in a manner which enables the computer to analyze the speech and identify the spoken words independent of the particular voice characteristics of the speaker.