This invention relates to speech recognition apparatus and, more particularly, to an apparatus which receives spoken input training words and a subsequent spoken input command word and generates a correlation function that is indicative of the resemblance of the command word to each training word.
There have been previously developed various equipments that attempt to recognize limited vocabularies of spoken words by analysis of acoustic events. Typically, such equipments are envisioned as being useful in "voice command" applications wherein, upon recognizing particular words, the equipment produces electrical signals which control the operation of a companion system. For example, a voice command could be used to control a conveyor belt to move in a specified manner or to control a computer to perform specified calculations.
Previous efforts to develop automatic methods of speech recognition have had limited success and have led to the realization of the exceedingly complex nature of speech communication. Normal speech has a high information content with considerable variability from speaker to speaker and some variability even in the same word when spoken by the same individual. Therefore, a "perfect" recognition scheme is unattainable since the nature of the speech signal to be recognized cannot be precisely defined. As a result, the preferred past schemes have been empirical approaches which have yielded at least a reasonable level of confidence, from a statistical standpoint, that a particular spoken word corresponded to a selected one of a limited machine vocabulary. The desirability of such schemes is thus not determinable by theoretical examination, but rather by a straightforward measure of recognition accuracy over an extended period of operation.
For various reasons, most prior art systems have been found unsuitable for practical applications. One of the prime reasons has been the sheer complexity of equipments that attempted to make an overly rigorous analysis of received speech signals. In addition to the expense and appurtenant unreliability, such systems have a tendency to establish highly complicated and restrictive recognition criteria that may reject normal variations of the system vocabulary words. Conversely, some equipments suffer from establishing recognition criteria that are too easily met, resulting in the improper acceptance of extraneous words not included in the preselected vocabulary of the equipment.
In view of the above, it is the object of this invention to provide a speech recognition technique which yields heretofore unattained recognition accuracies without undue complexity.