Automated speech recognition is an important technique for implementing human-machine interfaces (HMIs) in a wide range of applications. In particular, speech recognition is useful when a human user needs to focus on performing a task and using traditional input devices such as a mouse and keyboard would be inconvenient or impractical. For example, in-vehicle “infotainment” systems, home automation systems, and many uses of small electronic mobile devices such as smartphones, tablets, and wearable computers can employ speech recognition to receive speech commands and other input from a user.
Most prior-art speech recognition systems use a trained speech recognition engine to convert recorded spoken inputs from a user into digital data that is suitable for processing in a computerized system. Various speech engines known in the art apply natural language understanding techniques to recognize the words that the user speaks and to extract semantic meaning from those words to control the operation of a computerized system.
In some situations, a single speech recognition engine is not necessarily optimal for recognizing speech from a user while the user performs different tasks. Prior-art solutions attempt to improve the accuracy of speech recognition by combining multiple speech recognition systems, either by selecting low-level outputs from the acoustic models of different speech recognition systems or by selecting entire sets of outputs from different speech recognition engines based on a predetermined ranking process. However, low-level combinations of outputs from multiple speech recognition systems do not preserve high-level linguistic information. In other approaches, multiple speech recognition engines each generate full speech recognition results, but determining which of these results to select is also a challenging problem. Consequently, improvements to speech recognition systems that increase the accuracy of selecting a speech recognition result from a set of candidate results produced by multiple speech recognition engines would be beneficial.