1. Field of the Invention
The present invention relates generally to speech recognition technology, and more specifically to identifying and classifying acoustic background environments based on meta-data and/or ambient noise, and selecting an acoustic model from a group of predefined models to improve speech recognition.
2. Introduction
Many automated speech recognition systems are currently in use. Users call such systems from landline or mobile phones, from computers via VoIP, or from other communications devices, and interact with automated systems such as natural language spoken dialog systems. Background noise increasingly interferes with speech recognition when calls are placed from automobiles, subways, offices, sporting events, or other noisy environments. Most automated speech recognition systems use Cepstral Mean Normalization (CMN) to minimize the effect of channel distortion, yet they remain highly sensitive to background noise, especially when the noise is dynamic or non-stationary. These systems have two shortcomings: they require a large amount of speech before they demonstrate performance improvements, and they tend to work well only in supervised mode, where a transcription is provided, which never happens in interactions with actual callers. These shortcomings make automated speech recognition inaccurate, which can frustrate callers to the point of hanging up.
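As background on why CMN addresses channel distortion but not background noise, the following is a minimal sketch of per-utterance Cepstral Mean Normalization. It assumes the cepstral features (for example, MFCCs) have already been computed as one vector per frame; the function name and the list-of-lists representation are illustrative assumptions and are not part of the invention. A stationary channel is a convolution in the time domain, which becomes a constant additive offset in the cepstral domain, so subtracting the per-coefficient mean removes it. Additive background noise does not appear as a constant cepstral offset, so this step does not remove it.

```python
from typing import List

# Hypothetical representation: one cepstral feature vector per frame.
Frame = List[float]


def cepstral_mean_normalize(frames: List[Frame]) -> List[Frame]:
    """Subtract the per-utterance mean of each cepstral coefficient."""
    if not frames:
        return []
    n = len(frames)
    dim = len(frames[0])
    # Mean of each coefficient across all frames of the utterance.
    means = [sum(frame[d] for frame in frames) / n for d in range(dim)]
    # A stationary channel contributes the same offset to every frame,
    # so it is cancelled by subtracting the mean.
    return [[frame[d] - means[d] for d in range(dim)] for frame in frames]


# Usage: a constant "channel" offset added to every frame is removed by CMN.
clean = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]
channel = [0.5, -1.0]
distorted = [[c + h for c, h in zip(frame, channel)] for frame in clean]
print(cepstral_mean_normalize(distorted) == cepstral_mean_normalize(clean))
```

Because the mean is computed over the whole utterance, this form of CMN needs enough speech to estimate the mean reliably, and it only compensates for distortions that are constant across frames.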
Furthermore, many automated speech recognition systems seek to filter background noise prior to automatic speech recognition. Accordingly, what is needed in the art is a system for improving speech recognition in varying environments with varying types of background noise.