An automated speech recognition (ASR) engine may use one or more acoustic models and language models to transcribe speech data received from an audio source (e.g., a human speaker) into text. Whether the ASR engine has correctly transcribed the speech can be determined based on one or more acceptance metrics. In some technologies, multiple ASR engines are employed simultaneously to decode the same speech based on different language models and/or acoustic models. Since the different ASR engines may output different speech recognition results (e.g., transcribed speech), arbitration is sometimes employed to select the most accurate speech recognition result from among the available results generated by the different ASR engines.
In some technologies, arbitration is performed based on a ‘confidence score’ that quantifies the degree of confidence (e.g., expected accuracy) that an ASR engine has in its speech recognition results. However, confidence scores offer limited information and cannot easily be compared when they are provided by different ASR engines employing different language models or acoustic models. Therefore, better techniques are desired for comparing speech recognition results from different ASR engines.
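By way of illustration only, confidence-based arbitration of the kind described above might be sketched as follows. The sketch is not drawn from any particular ASR product: the `RecognitionResult` type, the `arbitrate_by_confidence` function, the engine labels, and the score values are all hypothetical and are assumed here solely to make the comparison problem concrete.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RecognitionResult:
    """One engine's output for a single utterance (hypothetical structure)."""
    engine: str        # label of the ASR engine that produced the result
    text: str          # transcribed speech
    confidence: float  # engine-reported score, on that engine's own scale


def arbitrate_by_confidence(results: Sequence[RecognitionResult]) -> RecognitionResult:
    """Select the result with the highest raw confidence score.

    The scores are compared directly, which implicitly assumes that every
    engine reports confidence on the same scale.
    """
    if not results:
        raise ValueError("no recognition results to arbitrate")
    return max(results, key=lambda r: r.confidence)


# Illustrative values only: two engines decode the same utterance.
results = [
    RecognitionResult("engine_a", "recognize speech", 0.62),
    RecognitionResult("engine_b", "wreck a nice beach", 0.91),
]
winner = arbitrate_by_confidence(results)
```

In this sketch the result from `engine_b` is selected simply because its raw score is higher. If the two engines derive their scores from different language or acoustic models, a higher number from one engine does not necessarily indicate a more accurate transcription than a lower number from the other, which is the limitation noted above.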