This invention is directed to speech recognition systems, and more particularly to a technique for implementing such a system using a relatively slow host processor having a high speed signal processing subsystem. The invention will be described in the context of a speech recognition system, although it should be appreciated that the signal processing arrangement of this invention could be applied as well to other systems, e.g. speech compression, data communication, biomedical signal processing, etc.
A typical speech recognition "philosophy" formulates the problem of recognizing both continuous and isolated speech in the framework of communication theory. With reference to FIG. 1, the speaker 10 recites text presented by a text generator 12. In read speech, the text is a written memo, letter, document, etc. In spontaneous speech, the text originates in some higher cortical function of the speaker's brain. The acoustic processor (AP) 14 converts the acoustic waveform into a sequence of symbols suitable for the linguistic decoder (LD) 16. The LD 16 attempts to deduce the text read by the speaker by choosing the word sequence which best accounts for the acoustic processor output symbols.
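In a communication-theory framing, the decoder's choice of the word sequence that best accounts for the output symbols is commonly written as a maximum a posteriori decision. The following is a sketch under that assumption; the symbols W (a candidate word sequence) and A (the sequence of acoustic processor output symbols) are introduced here for illustration only.

```latex
\hat{W} = \arg\max_{W} P(W \mid A) = \arg\max_{W} P(A \mid W)\, P(W)
```

Under this reading, P(W) would model the text generator 12 and P(A | W) would model the speaker 10 together with the acoustic processor 14. The term P(A) is omitted because it does not depend on W.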
The acoustic processor 14 can be viewed as a vector quantizer which quantizes the continuous speech waveform into a finite alphabet of representative symbols. A functional block diagram of the acoustic processor 14 may be as shown in FIG. 2. The signal processor 18 receives the speech from the speaker 10 and calculates the Discrete Fourier Transform (DFT) of 20-msec segments of the speech signal. A feature vector is formed from the power spectrum of each speech segment using the critical band method, and a pattern recognizer 20 compares the generated feature vector with vector prototypes from vector storage 22. The pattern recognizer 20 then provides at its output the label of the particular stored vector prototype which is nearest to the feature vector under some predefined metric.
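The processing chain just described (DFT of a 20-msec segment, band-wise feature vector, nearest-prototype label) might be sketched as follows. This is a minimal illustration, not the disclosed implementation: the 8 kHz sampling rate, the equal-width bands standing in for the critical band method, the squared Euclidean metric, and all function names are assumptions made for the example.

```python
import cmath
import math

SAMPLE_RATE = 8000                    # assumed sampling rate in Hz
FRAME_LEN = SAMPLE_RATE * 20 // 1000  # samples in one 20-msec segment

# Hypothetical equal-width bands (as DFT bin ranges); a real system
# would use critical-band edges instead.
BAND_EDGES = [(0, 20), (20, 40), (40, 60), (60, 81)]


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 of a real-valued frame."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def band_features(spectrum, band_edges):
    """Sum the power inside each [lo, hi) bin range to form a feature vector."""
    return [sum(spectrum[lo:hi]) for lo, hi in band_edges]


def nearest_label(feature, prototypes):
    """Index of the prototype closest to the feature (squared Euclidean)."""
    def dist(p):
        return sum((a - b) ** 2 for a, b in zip(feature, p))
    return min(range(len(prototypes)), key=lambda i: dist(prototypes[i]))


def label_frame(frame, prototypes):
    """Map one speech segment to the label of its nearest stored prototype."""
    feature = band_features(power_spectrum(frame), BAND_EDGES)
    return nearest_label(feature, prototypes)
```

For instance, with four made-up prototypes that each concentrate their power in one band, a 500 Hz tone (DFT bin 10) would be labeled with the first prototype and a 2500 Hz tone (DFT bin 50) with the third. The multiply-accumulate loop in `power_spectrum` also illustrates why such a quantizer makes heavy use of multiplication.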
The computational requirements for a speech recognition system of the type described above are in the 10 MIPS (million instructions per second) range, with heavy usage of multiplication operations. In order for the vector quantizer to run in real time, a high speed signal processor must be used, e.g., the SP-16 processor disclosed by G. Ungerboeck et al., "SP-16 Signal Processor", 1984 International Conference on Acoustics, Speech and Signal Processing, incorporated herein by reference.
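To give a rough sense of what the 10 MIPS figure means per speech segment, the arithmetic below works out an instruction budget per 20-msec frame. It assumes frames are processed back to back with no overlap, which the text does not state; the variable names are illustrative.

```python
MIPS_REQUIRED = 10   # from the text: roughly 10 MIPS
FRAME_MS = 20        # from the text: 20-msec segments

# Assumption: one frame every 20 msec, with no overlap between frames.
frames_per_second = 1000 // FRAME_MS
instructions_per_frame = MIPS_REQUIRED * 1_000_000 // frames_per_second

print(frames_per_second, instructions_per_frame)
```

Under that assumption there are 50 frames per second and about 200,000 instructions available per frame for real-time operation.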
It is therefore an object of the present invention to provide a continuous speech recognition system which enables a lower speed computer to perform speech recognition on a real time basis.