While speech recognition has advanced significantly over the past decade, one major problem that continues to plague this technology is performance in acoustically noisy environments. Various noise reduction methods have been developed to enhance acoustic speech signatures for speech recognition, such as the use of multiple microphones or the use of video inputs. Unfortunately, background noise consisting of multiple acoustic sources can confound these enhancements, the so-called cocktail party effect.
One method of increasing the signal-to-noise ratio of the intended speaker is the use of multiple microphones, that is, beamforming microphone technology. Such applications have already found their way into the marketplace with demonstrated performance improvement in speech recognition. Unfortunately, these devices typically require some minimum spacing between microphones, which constrains miniaturization. For instance, prior art devices currently exceed 10 cm in their longest dimension.
Another approach to the enhancement of speech recognition in acoustically noisy environments is the use of non-acoustic inputs. Video enhancement of audio speech recognition algorithms (that is, the use of a camera to monitor the movement of the lip region or other facial movements) has been explored by a number of leading research corporations, including Intel Corporation and Microsoft Corporation. Speech recognition technology developed by these corporations using visual and/or near-infrared cameras has shown an increase in performance in very noisy environments.
The use of video inputs for speech recognition, however, suffers from poor performance in changing or poor lighting, for example in low-contrast environments. Moreover, the use of cameras (especially ones that run constantly) is problematic for portable devices, which require a low-power solution.
Even more exotic methods of enhancing speech recognition have been developed to make use of electromyographic information, that is, the direct measurement of the electrical activity of the muscles involved in speech. However, because of the exotic nature of the recording methods, such technologies are difficult to implement in widespread professional or consumer markets.
What is needed is a device that is relatively inexpensive, can be implemented in a small form factor, consumes little power, and permits enhanced and reliable speech recognition in acoustically noisy environments.