Various schemes have been developed to improve the performance of speech recognition systems. In mobile vehicles, many factors interact to degrade that performance. Ambient noise conditions, cabin design variables, speaker gender, and speaker dialect all influence the acoustic signal received by the speech recognition system. These factors cause decoding errors and false alarms, thereby increasing user frustration with the system.
Current speech recognition systems use a single template for each nametag. The template is used to match the nametag utterance received from the user with the proper nametag in the system. The template is created during system setup by receiving multiple utterances for each nametag and storing the correctly identified nametag in the template. The template therefore reflects the speaker, vehicle, and environmental conditions that exist when it is created. User frustration results when repeated attempts to match an utterance with its nametag fail. These systems cannot adapt to new speakers or speaking scenarios without retraining.
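By way of illustration only, single-template matching of this kind can be sketched as follows. The sketch assumes that each utterance has already been reduced to a short feature vector, that a template is the average of the training vectors, and that matching uses Euclidean distance with a rejection threshold. The function names, feature values, and threshold are hypothetical and are not taken from any particular system.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_template(utterances):
    """Average several training utterances into one template (assumed scheme)."""
    n = len(utterances)
    return [sum(column) / n for column in zip(*utterances)]

def match_nametag(features, templates, reject_threshold):
    """Return (nametag, distance) for the closest template.

    The nametag is None when even the closest template is farther away
    than reject_threshold, i.e. the attempt is unsuccessful.
    """
    best_tag, best_dist = None, float("inf")
    for tag, template in templates.items():
        d = distance(features, template)
        if d < best_dist:
            best_tag, best_dist = tag, d
    if best_dist > reject_threshold:
        return None, best_dist
    return best_tag, best_dist

# Hypothetical templates, trained once under quiet conditions.
templates = {
    "home": build_template([[1.0, 2.0, 3.0], [1.2, 1.8, 3.0], [0.8, 2.2, 3.0]]),
    "office": build_template([[4.0, 0.5, 1.0], [4.2, 0.5, 1.0], [3.8, 0.5, 1.0]]),
}

quiet_utterance = [1.1, 2.0, 2.9]   # close to the conditions at training time
noisy_utterance = [2.6, 1.2, 2.0]   # same word, features shifted by cabin noise

print(match_nametag(quiet_utterance, templates, reject_threshold=1.0))
print(match_nametag(noisy_utterance, templates, reject_threshold=1.0))
```

In this toy setting the quiet utterance is matched to "home", while the noisy utterance is rejected because it falls outside the threshold of every stored template. Only rebuilding the templates under the new conditions would change that result.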
Alternatively, a user can train the system under a variety of conditions and store a different template for each scenario. This approach requires additional user involvement and increases the training time required by the system. Furthermore, the templates are not differentiated, and the probability of selecting the proper template is not increased.
Multiple speech recognition engines running in parallel can be used to increase the likelihood of selecting the proper template, as described in U.S. Pat. No. 6,836,758 to Bi, et al. This method produces more accurate nametag recognition but still does not account for variations in speaker, vehicle, and environment, nor does it adapt to changes in the acoustic signal without retraining. In addition, increased computational power and storage capacity are required to accommodate the additional speech recognition engines.
A method for biasing paths in a Markov model is proposed in U.S. Pat. No. 4,741,036 to Bahl, et al. The individual phones that distinguish similar words are given more weight to emphasize the differences between the words. This method improves the distinction between similar words but does not account for the effects of ambient noise and speaker variables such as dialect or gender. Additionally, the weighting vectors are static and are determined at the time the system is trained. The weighting vectors can be updated to account for changes in speech input only by retraining the system.
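The general idea of static phone weighting can be illustrated with a simplified sketch. This is not the procedure of the cited patent. It assumes only that each candidate word receives one log-likelihood per phone and that the word score is a weighted sum of those values. The words, scores, and weights below are hypothetical.

```python
def weighted_word_score(phone_log_likelihoods, weights):
    """Weighted sum of per-phone log-likelihoods for one candidate word."""
    if len(phone_log_likelihoods) != len(weights):
        raise ValueError("one weight is required per phone")
    return sum(w * ll for w, ll in zip(weights, phone_log_likelihoods))

def best_word(candidates, weights):
    """Return the candidate word with the highest weighted score."""
    return max(candidates, key=lambda word: weighted_word_score(candidates[word], weights))

# Hypothetical per-phone log-likelihoods for one utterance scored against
# two similar words that differ only in their final phone.
candidates = {
    "cat": [-1.2, -1.1, -1.5],
    "cap": [-1.0, -1.0, -1.7],
}

uniform_weights = [1.0, 1.0, 1.0]
# Fixed at training time: the final, distinguishing phone counts the most.
static_weights = [0.5, 0.5, 2.0]

print(best_word(candidates, uniform_weights))
print(best_word(candidates, static_weights))
```

With uniform weights the shared phones dominate and "cap" scores highest. With the static weights the distinguishing final phone decides the outcome and "cat" is selected. Because `static_weights` is fixed when the system is trained, a change in noise or speaker that alters the per-phone scores is not reflected in the weights until the system is retrained.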
It is therefore desirable to provide a method and system for dynamic nametag scoring that overcomes the limitations, challenges, and obstacles described above.