The present invention relates to a method and apparatus for speech recognition.
Voice recognition systems are known which operate on the principle of dividing the sounds into frequency bands by means of filters and then analysing the energy levels in each band; however, such systems are relatively expensive. Isolated word recognition systems based upon Time Encoded Speech (TES), which do not rely upon dividing the sounds into frequency bands, are also known.
A system and procedure for isolated word recognition using Time Encoded Speech is described in "Verification, Archetype Updating, and Automatic Token Set Selection, as a means of improving the performance of Menu Driven Isolated Word Recognition Systems using Time Encoded Speech Descriptors in High Acoustic Noise Backgrounds" by R. C. Power, R. D. Hughes and R. A. King, Proceedings of the International Conference Speech Input/Output Techniques and Applications (1986), pp. 144-151.
TES is a form of speech waveform coding. The speech waveform is broken into time intervals (epochs) between successive real zeros. For each epoch of the waveform the code consists of a single digital word. This word is derived from two parameters of the epoch: its quantized time duration and its shape. The measure of duration is straightforward, and the commonly adopted strategy for shape description is to classify epochs on the basis of the number of positive minima (in positive-going epochs) or negative maxima (in negative-going epochs) occurring therein. For economical coding, the naturally occurring distinguishable symbols produced by this process may then be mapped in a non-linear fashion onto a much smaller set (alphabet) of code descriptors. An algorithm to perform an initial TES coding is described in "Time Encoded Speech (TES) Descriptors as a Symbol Feature Set for Voice Recognition Systems" by J. Holbeche, R. D. Hughes and R. A. King, Proceedings of the International Conference Speech Input/Output Techniques and Applications (1986), pp. 310-315.
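The coding step described above can be sketched as follows. This is a minimal illustration, not the algorithm of the cited paper. It assumes that the input is a list of sampled amplitudes, that zero-valued samples count as positive, and that duration is measured in samples. The duration bin edges (`DURATION_EDGES`), the shape cap (`max_shape`), and the resulting 15-symbol alphabet are hypothetical placeholders chosen only to show a non-linear (roughly logarithmic) mapping of (duration, shape) pairs onto a small alphabet. The first and last epochs of a finite buffer may also be truncated.

```python
from typing import List, Tuple


def split_into_epochs(samples: List[float]) -> List[List[float]]:
    """Split a waveform into epochs bounded by successive real zeros (sign changes).

    Assumption: a sample equal to zero is treated as positive.
    """
    epochs: List[List[float]] = []
    current: List[float] = []
    for x in samples:
        # A change of sign between consecutive samples marks a real zero.
        if current and (x >= 0) != (current[-1] >= 0):
            epochs.append(current)
            current = []
        current.append(x)
    if current:
        epochs.append(current)
    return epochs


def count_shape(epoch: List[float]) -> int:
    """Count positive minima in a positive epoch, or negative maxima in a negative one."""
    positive = epoch[0] >= 0
    count = 0
    for i in range(1, len(epoch) - 1):
        prev, cur, nxt = epoch[i - 1], epoch[i], epoch[i + 1]
        if positive and cur < prev and cur < nxt:
            count += 1  # strict local minimum above the axis
        elif not positive and cur > prev and cur > nxt:
            count += 1  # strict local maximum below the axis
    return count


# Hypothetical, roughly logarithmic duration bins (upper bounds, in samples).
DURATION_EDGES = (2, 4, 8, 16)


def duration_bin(duration: int) -> int:
    """Quantize an epoch duration non-linearly into one of len(DURATION_EDGES) + 1 bins."""
    for i, edge in enumerate(DURATION_EDGES):
        if duration <= edge:
            return i
    return len(DURATION_EDGES)


def tes_symbol(duration: int, shape: int, max_shape: int = 2) -> int:
    """Map a (duration, shape) pair onto a small alphabet (hypothetical mapping)."""
    capped_shape = min(shape, max_shape)
    return duration_bin(duration) * (max_shape + 1) + capped_shape


def tes_encode(samples: List[float]) -> List[Tuple[int, int, int]]:
    """Return (duration, shape, symbol) for each epoch of the waveform."""
    codes: List[Tuple[int, int, int]] = []
    for epoch in split_into_epochs(samples):
        d = len(epoch)
        s = count_shape(epoch)
        codes.append((d, s, tes_symbol(d, s)))
    return codes
```

For example, `tes_encode([1, 3, 2, 3, 1, -1, -2, -1, 2, 4, 1])` yields `[(5, 1, 7), (3, 0, 3), (3, 0, 3)]`. The first entry is a five-sample positive epoch containing one positive minimum. The other two are three-sample epochs with no interior extrema, and because they share duration and shape they collapse to the same symbol.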
Isolated word recognition systems based upon TES have many advantages over frequency division systems and are particularly advantageous in high ambient noise environments. However, such systems sometimes exhibit limitations in their ability to cope with connected or continuous word recognition tasks.