Modern speech recognition systems typically include an acoustic model and a language model. The acoustic model is used to generate hypotheses regarding which subword units, e.g., phonemes, correspond to an utterance based on the acoustic features of the utterance. The language model is used to determine which of the hypotheses generated using the acoustic model is the most likely transcription of the utterance, based on lexical features of the language in which the utterance is spoken. The acoustic model and the language model are typically configured using training data that includes transcriptions known to be correct. Discriminatively training the acoustic and language models configures the models so that results known to be correct are more easily distinguished from results known to be incorrect.
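
By way of illustration only, the interplay between the two models and the effect of discriminative training may be sketched as follows. The sketch assumes that each hypothesis carries a log-probability from the acoustic model and a log-probability from the language model, and that the two are combined as a weighted sum. The names `Hypothesis`, `combined_score`, `best_hypothesis`, and `discriminative_step`, the perceptron-style update rule, and all numeric scores are hypothetical and are not taken from any particular system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    text: str
    acoustic_logprob: float  # assumed output of the acoustic model
    lm_logprob: float        # assumed output of the language model


def combined_score(h: Hypothesis, lm_weight: float) -> float:
    """Weighted sum of the acoustic and language model log-scores."""
    return h.acoustic_logprob + lm_weight * h.lm_logprob


def best_hypothesis(hyps: list[Hypothesis], lm_weight: float) -> Hypothesis:
    """Select the hypothesis with the highest combined score."""
    return max(hyps, key=lambda h: combined_score(h, lm_weight))


def discriminative_step(hyps: list[Hypothesis], reference: str,
                        lm_weight: float, lr: float = 0.1) -> float:
    """One perceptron-style update (an illustrative stand-in for
    discriminative training): if a known-incorrect hypothesis outscores
    the known-correct transcription, move the weight in the direction
    that favors the correct one."""
    ref = next(h for h in hyps if h.text == reference)
    top = best_hypothesis(hyps, lm_weight)
    if top.text == reference:
        return lm_weight  # already ranked first; nothing to correct
    return lm_weight + lr * (ref.lm_logprob - top.lm_logprob)


# Made-up scores: the incorrect hypothesis fits the audio slightly better,
# while the correct one is far more plausible as a word sequence.
HYPS = [
    Hypothesis("recognize speech", acoustic_logprob=-12.0, lm_logprob=-4.0),
    Hypothesis("wreck a nice beach", acoustic_logprob=-11.0, lm_logprob=-9.0),
]

before = best_hypothesis(HYPS, lm_weight=0.0).text
new_weight = discriminative_step(HYPS, "recognize speech", lm_weight=0.0)
after = best_hypothesis(HYPS, lm_weight=new_weight).text
```

With the language model weight at zero, the acoustically closer but incorrect hypothesis is selected; after a single update against the known-correct transcription, the correct hypothesis ranks first. Practical systems adjust many model parameters rather than a single weight, but the objective illustrated is the same: widening the score margin between results known to be correct and results known to be incorrect.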