1. Field of the Invention
The present invention relates generally to speech recognition and, more particularly, to systems and methods that detect features in an audio signal and use these features to improve the accuracy of word recognition by a speech recognition system.
2. Description of Related Art
Speech has not traditionally been valued as an archival information source. As effective as the spoken word is for communicating, archiving spoken segments in a useful and easily retrievable manner has long been a difficult proposition. Although recording audio is straightforward, automatically transcribing and indexing speech in an intelligent and useful manner remains difficult.
Speech is typically received into a speech recognition system as a continuous stream of words. To use the speech effectively in information management systems (e.g., information retrieval, natural language processing, real-time alerting), the speech recognition system first transcribes the speech to generate a textual document.
A problem with conventional speech recognition systems is that the accuracy of the word recognition results is greatly affected by the variability in the audio signal. For example, channel bandwidth, environment (e.g., background noise), speaker, speaking style, and language can all change unpredictably. Creating a single model to detect all of these variables would be extremely difficult.
Typically, conventional systems create separate models to detect channel bandwidth (i.e., whether the channel is narrowband or wideband) and gender (i.e., whether the speaker is male or female). These systems then use the detected audio features to improve the accuracy of the word recognition results.
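For illustration only, the conventional arrangement of separate bandwidth and gender detectors feeding a recognizer might be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any particular system: the names `SegmentFeatures`, `detect_bandwidth`, `detect_gender`, and `select_acoustic_model` are hypothetical, and the fixed thresholds stand in for whatever trained models a real system would use, which this description does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentFeatures:
    """Hypothetical per-segment measurements, assumed to be computed upstream."""
    high_band_energy_ratio: float  # fraction of spectral energy above about 4 kHz
    mean_pitch_hz: float           # average fundamental frequency of voiced frames


def detect_bandwidth(features: SegmentFeatures, threshold: float = 0.05) -> str:
    """Separate detector 1: label the channel as narrowband or wideband.

    The threshold is an illustrative assumption, not a tuned value.
    """
    if features.high_band_energy_ratio > threshold:
        return "wideband"
    return "narrowband"


def detect_gender(features: SegmentFeatures, threshold_hz: float = 165.0) -> str:
    """Separate detector 2: label the speaker as male or female.

    A single pitch threshold is a crude stand-in for a trained model.
    """
    if features.mean_pitch_hz > threshold_hz:
        return "female"
    return "male"


def select_acoustic_model(features: SegmentFeatures) -> str:
    """Combine both detected features to pick a condition-specific model name."""
    bandwidth = detect_bandwidth(features)
    gender = detect_gender(features)
    return f"acoustic_model_{bandwidth}_{gender}"


if __name__ == "__main__":
    telephone_segment = SegmentFeatures(high_band_energy_ratio=0.01, mean_pitch_hz=110.0)
    studio_segment = SegmentFeatures(high_band_energy_ratio=0.20, mean_pitch_hz=220.0)
    print(select_acoustic_model(telephone_segment))
    print(select_acoustic_model(studio_segment))
```

The point of the sketch is the structure: each audio feature has its own detector, and the detected labels together determine which recognition model is applied to the segment.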
In speech recognition, accuracy is paramount. Accordingly, it is desirable to improve the accuracy of a speech recognition system even further.