Today, speech recognition technology relies on a standard set of algorithms that are known to produce good results. When implemented on computer systems, these algorithms require a certain amount of storage and involve a relatively large number of complex calculations. Because of these requirements, real-time speech recognition systems based on these algorithms have so far not been successfully deployed in low-resource environments (i.e., environments that demand low power consumption, low memory usage, low computational load and complexity, and low processing delay).
Efforts are ongoing to design speech recognition systems with reduced resource requirements. For example, Deligne et al. describe a continuous speech recognition system suitable for processors running at a minimum of 50 MIPS and having at least 1 Mbyte of memory (“Low-Resource Speech Recognition of 500-Word Vocabularies”, Proceedings of Eurospeech 2001, pp. 1820–1832), and Y. Gong and U. H. Kao describe a system running on a 30 MHz DSP with 64K words of memory (“Implementing a high accuracy speaker-independent continuous speech recognizer on a fixed DSP”, Proceedings of the ICASSP 2000, pp. 3686–3689). J. Foks presents a voice command system running on a 2.5 MHz CR16B processor (“Implementation of Speech Recognition on CR16B CompactRisc”, Proceedings of the ICSPAT 2000).
Some algorithms have been developed that require fewer resources and are better adapted to low-resource environments. However, these algorithms are narrower in scope and are usually designed for very specific situations. In addition, they have allowed only marginal improvements in power consumption over the standard algorithms described above and are still not suitable for ultra-low-resource environments.
Another problem concerns speech recognition in noisy environments. In these situations, special algorithms that perform voice activity detection, noise reduction, or speech enhancement have to be applied in order to improve recognition accuracy. These algorithms also require complex calculations and therefore add considerable overhead to the system, making it even more difficult to deploy robust speech recognition in low-resource environments.
Therefore, it is desirable to provide a speech recognition method and system that produces high-quality output and can be deployed in low-resource environments.