1. Field of the Invention
The present invention relates generally to waveform analysis apparatus and more specifically to a method and system capable of learning a vocabulary of spoken words and subsequently recognizing these words when they are spoken.
2. Description of the Prior Art
Research in speech recognition techniques has been underway for twenty years under both private and government auspices. Yet experts agree that little progress has been made beyond the techniques used in the early 1950's, when the first successful recognition system was demonstrated at the Bell Telephone Laboratories. The following is representative of the state of the art.
A method for speech identification, meaning either speaker identification or speech recognition, is disclosed in the U.S. Published Patent Application to Heribert J. P. Reitboeck, No. B 358,427. This system employs sophisticated frequency tracking techniques, including variable bandwidth and center frequency filters, to analyze speech formants for subsequent comparison. This method assumes that the waveform contains certain speech characteristics (pitch, formants, etc.) and requires time normalization.
A method for digital speech compression using a predictive feedback loop is disclosed in the U.S. Patent to Sandra E. Hutchins, U.S. Pat. No. 3,973,081. This patent is mentioned because part of the method employs n-tuples, although both the use of n-tuples and the purpose of the system are unrelated to the present invention.
A method using parallel bandpass filters to analyze speech waveforms is disclosed in the U.S. Patent to Fausto Pozo, U.S. Pat. No. 3,737,580. In this technique the spectrum analysis results are simply added for the purpose of speaker identification or authentication.
Another system, designed primarily for programmers who are severely disabled, is capable of storing approximately fifty words which can be recognized by the system. After the system has been "trained," any of the stored words may be spoken into a microphone, and the resulting signal is passed through a spectrum analyzer that consists of a number of bandpass filters covering the audio spectrum from roughly 200 Hz to 5,000 Hz. The screened output from these filters is then fed through a multiplexer to an analog-to-digital converter so that the energy values are converted to an 8-bit code which is used to indicate word recognition. Aside from the fact that the machine is quite expensive, it requires a pause between words, thus imposing a limitation on the use of the system which is undesirable for certain applications.
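By way of illustration only, the filter-bank front end described above may be sketched as follows. This is a minimal sketch under stated assumptions and is not the implementation of the referenced system: the band edges, the 16 kHz sampling rate, the use of a discrete Fourier transform in place of analog bandpass filters, and the function names band_energies and quantize_8bit are all hypothetical choices made for the example.

```python
import math

def band_energies(samples, sample_rate, band_edges):
    """Sum DFT power within each [lo, hi) band.

    Assumption: a naive DFT stands in for the bank of analog
    bandpass filters; the band edges are illustrative only.
    """
    n = len(samples)
    energies = [0.0] * (len(band_edges) - 1)
    for k in range(1, n // 2):
        freq = k * sample_rate / n
        re = sum(s * math.cos(2 * math.pi * k * i / n)
                 for i, s in enumerate(samples))
        im = -sum(s * math.sin(2 * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        power = re * re + im * im
        # Credit the bin's power to the band that contains its frequency.
        for b in range(len(energies)):
            if band_edges[b] <= freq < band_edges[b + 1]:
                energies[b] += power
                break
    return energies

def quantize_8bit(energies):
    """Scale band energies to 0..255, mimicking an 8-bit A/D conversion."""
    peak = max(energies)
    if peak == 0:
        return [0] * len(energies)
    return [round(255 * e / peak) for e in energies]

# Hypothetical test input: a 1,000 Hz tone sampled at 16 kHz for 10 ms.
SAMPLE_RATE = 16000
EDGES = [200, 400, 800, 1600, 3200, 5000]  # assumed band edges in Hz
tone = [math.sin(2 * math.pi * 1000 * i / SAMPLE_RATE) for i in range(160)]
codes = quantize_8bit(band_energies(tone, SAMPLE_RATE, EDGES))
print(codes)
```

In this sketch the tone falls in the 800 Hz to 1,600 Hz band, so that band receives the largest code. How the referenced system maps such codes to a stored word is not stated in the description above and is therefore not modeled here.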
A device that requires no pause between words, but that is limited either to a 5- or 10-second continuous stream of input signals or to an indefinite continuous stream with a limited vocabulary of approximately 35 words, is described in the August 1976 issue of Datamation, pp. 65-68. This type of device is likewise limited in application and is relatively complicated and expensive.