Many data processing tasks involve converting an ordered sequence of inputs into an ordered sequence of outputs. For example, machine translation systems translate an input sequence of words in one language into a sequence of words in another language. As another example, pronunciation systems convert an input sequence of graphemes into a target sequence of phonemes.