Automatic speech recognition (ASR) technology typically relies on a corpus to convert speech data into text data. A corpus is a database of speech audio files and corresponding text transcriptions in a format that can be used to form acoustic models. A speech recognition engine may use one or more acoustic models to transcribe speech data received from an audio source (e.g., a human speaker) into text.
Determining whether the speech recognition engine has correctly decoded received speech (e.g., utterances) can be based on one or more acceptance metrics. These acceptance metrics can be hard-coded into application software (e.g., a video game, dictation software, or a computerized personal assistant) based on existing or anticipated speech recognition engines, acoustic models, and/or other parameters. In contrast, the speech recognition engines, acoustic models, and/or other parameters are often provided and updated in the computing platform on which the application software runs (e.g., the operating system of a computer, gaming system, vehicle communications system, or mobile device). Different speech recognition engines, acoustic models, and/or other parameters provided by the platform supplier can produce different confidence classifier scores, which may or may not align with the acceptance metrics provided by the application software suppliers. Accordingly, updates to speech recognition engines, acoustic models, and/or other parameters can render the acceptance metrics of application software obsolete or inaccurate.
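The mismatch described above can be illustrated with a minimal sketch. The threshold value, the function names, and the confidence scores below are all hypothetical and chosen only for illustration; they do not come from any particular speech recognition engine. The sketch assumes the acceptance metric is a single confidence threshold hard-coded in the application, and that a platform update causes the engine to report lower scores for the same correctly decoded utterances.

```python
# Hypothetical acceptance metric hard-coded into the application software.
ACCEPT_THRESHOLD = 0.80


def accept(confidence: float, threshold: float = ACCEPT_THRESHOLD) -> bool:
    """Return True if a decoded utterance passes the acceptance metric."""
    return confidence >= threshold


def acceptance_rate(scores: list[float]) -> float:
    """Fraction of utterances accepted under the hard-coded threshold."""
    return sum(accept(s) for s in scores) / len(scores)


# Made-up confidence scores for the same five correctly decoded utterances,
# as reported by two hypothetical versions of a platform-supplied engine.
scores_engine_v1 = [0.91, 0.85, 0.88, 0.82, 0.95]
scores_engine_v2 = [0.78, 0.72, 0.75, 0.69, 0.83]  # same decodings, lower scale

rate_v1 = acceptance_rate(scores_engine_v1)
rate_v2 = acceptance_rate(scores_engine_v2)

print(f"engine v1 acceptance rate: {rate_v1:.0%}")
print(f"engine v2 acceptance rate: {rate_v2:.0%}")
```

In this illustrative case the application accepts every utterance under the first engine version but rejects most of them under the second, even though the decoded text is equally correct and the application code is unchanged.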