This specification relates to generating phoneme representations of acoustic sequences.
Acoustic modeling systems receive an acoustic sequence and generate a phoneme representation of the acoustic sequence. The acoustic sequence for a given utterance includes, for each of a set of time steps, an acoustic feature representation that characterizes the audio input at the corresponding time step. The phoneme representation is a sequence of phonemes or phoneme subdivisions that the acoustic modeling system has classified as representing the received acoustic sequence. An acoustic modeling system can be used in, for example, a speech recognition system, e.g., in conjunction with a pronunciation modeling system and a language modeling system.
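The input and output described above can be illustrated with a minimal sketch. It is not the system described in this specification. The phoneme inventory, the two-dimensional features, the `CENTROIDS` table, and the nearest-centroid classifier are all hypothetical stand-ins chosen for brevity, and merging consecutive identical frame labels is an added assumption.

```python
from typing import List, Sequence, Tuple

Frame = Tuple[float, ...]  # acoustic feature representation for one time step

# Hypothetical phoneme inventory with one made-up centroid per phoneme.
CENTROIDS = {
    "k": (0.0, 0.0),
    "ae": (1.0, 0.0),
    "t": (0.0, 1.0),
}


def classify_frame(frame: Frame) -> str:
    """Return the phoneme whose centroid is closest to the frame."""
    def squared_distance(centroid: Frame) -> float:
        return sum((a - b) ** 2 for a, b in zip(frame, centroid))

    return min(CENTROIDS, key=lambda phoneme: squared_distance(CENTROIDS[phoneme]))


def phoneme_representation(acoustic_sequence: Sequence[Frame]) -> List[str]:
    """Map an acoustic sequence (one frame per time step) to a phoneme sequence."""
    frame_labels = [classify_frame(frame) for frame in acoustic_sequence]
    # Assumption: merge runs of identical frame labels into a single phoneme.
    collapsed: List[str] = []
    for label in frame_labels:
        if not collapsed or collapsed[-1] != label:
            collapsed.append(label)
    return collapsed


# Five time steps, each with a two-dimensional feature vector (made-up values).
acoustic_sequence = [(0.1, 0.0), (0.0, 0.1), (0.9, 0.1), (1.0, -0.1), (0.1, 0.9)]
result = phoneme_representation(acoustic_sequence)
print(result)
```

In a speech recognition system, a phoneme sequence of this kind would then be passed to the pronunciation and language modeling stages.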