This specification relates to speech recognition using neural networks.
Speech recognition systems receive an acoustic sequence and generate a transcription of an utterance represented by the acoustic sequence. Some speech recognition systems include a pronunciation system, an acoustic modeling system and a language model. The acoustic modeling system generates a phoneme representation of the acoustic sequence, the pronunciation system generates a grapheme representation of the acoustic sequence from the phoneme representation, and the language model generates the transcription of the utterance that is represented by the acoustic sequence from the grapheme representation.