This invention relates to an apparatus and method for automatic speech recognition. Automatic speech recognition systems provide a means for man to interface with communication equipment, computers and other machines in a human's most natural and convenient mode of communication. Where required, this will enable operators of telephones, computers, etc. to call others, enter data, request information and control systems when their hands and eyes are busy, when they are in the dark, or when they are unable to be stationary at a terminal. Also, machines using normal voice input require much less user training than do systems relying on complex keyboards, switches, push buttons and other similar devices.
One known approach to automatic speech recognition of isolated words involves the following: periodically sampling a bandpass filtered (BPF) audio speech input signal to create frames of data and then preprocessing the data to convert them to processed frames of parametric values which are more suitable for speech processing; storing a plurality of templates (each template is a plurality of previously created processed frames of parametric values representing a word, and the templates taken together form the reference vocabulary of the automatic speech recognizer); and comparing the processed frames of speech with the templates in accordance with a predetermined algorithm, such as the dynamic programming algorithm (DPA) described in an article by F. Itakura, entitled "Minimum prediction residual principle applied to speech recognition", IEEE Trans. Acoustics, Speech and Signal Processing, Vol. ASSP-23, pp. 67-72, February 1975, to find the best time alignment path or match between a given template and the spoken word.
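The template comparison step described above can be illustrated with a minimal sketch of dynamic programming time alignment. The sketch is a generic dynamic time warping routine and is not the specific algorithm of the Itakura article: the Euclidean frame distance, the unconstrained step pattern, and the names frame_distance, dtw_distance and recognize are assumptions made for illustration only. Each frame is taken to be a list of parametric values, and each template is a list of such frames.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two parameter vectors (an assumed local metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(utterance, template):
    """Accumulated cost of the best time alignment between two frame sequences."""
    n, m = len(utterance), len(template)
    inf = float("inf")
    # cost[i][j] holds the cheapest alignment of the first i utterance frames
    # with the first j template frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(utterance[i - 1], template[j - 1])
            # Allowed moves: repeat a template frame, skip ahead in the
            # template while holding the utterance frame, or advance both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(utterance, templates):
    """Return the vocabulary word whose template aligns with the lowest cost."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))
```

In such a sketch, an utterance spoken more slowly than its template still aligns at low cost, because the alignment path may repeat template frames; this is the property that makes time alignment suitable for matching isolated words spoken at varying rates.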
Isolated word recognizers such as those outlined above require the user to artificially pause between every input word or phrase. This requirement is often too restrictive in a high workload and often stressful environment. Such an environment demands the very natural mode of continuous speech input. However, the problems of identifying word boundaries in continuous speech recognition, along with larger vocabulary demands and the requirement of syntax control processing to identify only predefined meaningful phrases and sentences, require added and more complex processing.
It is desirable to combine the relative ease of implementation of an isolated word recognizer with the advantages of continuous speech recognition, when required, in a single, inexpensive and less complex automatic speech recognition machine.