1. Field
This disclosure relates to speech recognition systems, and more particularly to speech recognition systems that use multiple speech recognizers with varying performance and operational characteristics.
2. Background
Speech recognition systems typically convert speech to text for dictation applications or to commands for command and control tasks. The speech is received as an incoming audio stream, converted, and returned to the application as recognized speech. The applications in use by the user generating the audio stream may include dictation systems, voice interfaces for menu-driven applications, etc. The system may utilize cellular or landline phone systems, Voice-over-IP networks, multimedia computer systems, etc.
Speech recognition systems use modules referred to as speech recognizers, or recognizers, to perform the actual conversion. Recognizer performance varies, even among recognizers targeted at the same market. For example, a recognizer from manufacturer A used in a system targeted to the dictation market will perform differently from a recognizer from manufacturer B targeted to the same market. Additionally, two recognizers from a single manufacturer targeted at similar markets may perform differently. This occurs because different algorithms are used to perform the recognition and different speech models are used to drive the recognizers. Speech models may differ due to the differing content and speakers used to generate the model, as well as the representation of this data.
Though current applications utilize only a single recognizer, more robust speech recognition systems may utilize several different recognizers. This allows the system to have different recognizers available for different tasks and users. However, selecting the optimal speech recognizer for a given situation is problematic. So is tracking and updating performance records for the various recognizers in different situations, which would allow better optimization of that selection.