This specification relates to generating output sequences from input sequences using neural networks.
Many data processing tasks involve converting an ordered sequence of inputs into an ordered sequence of outputs. For example, machine translation systems translate an input sequence of words in one language into a sequence of words in another language. As another example, pronunciation systems convert an input sequence of graphemes into a target sequence of phonemes. In these tasks, the outputs in the output sequence are selected from a fixed vocabulary of possible outputs, e.g., a vocabulary of words or a vocabulary of phonemes. In some other tasks, however, the number of possible outputs in the vocabulary depends on the length of the input sequence. For example, some tasks involve sorting the inputs in an input sequence to generate an output sequence that includes the inputs from the input sequence ordered according to some specified characteristic. In such a task, each output is one of the inputs, so an input sequence of n inputs gives n possible outputs, and no fixed vocabulary can be defined in advance.
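By way of illustration only, the sorting example can be sketched in a few lines of Python. The sketch does not use a neural network and is not the technique of this specification. It only shows that when each output is expressed as a position in the input sequence, the set of possible outputs grows with the length of the input. The function names sort_as_pointers and apply_pointers are hypothetical and introduced solely for this example.

```python
from typing import List, Sequence


def sort_as_pointers(inputs: Sequence[float]) -> List[int]:
    """Return input positions ordered by ascending input value.

    Each output is an index into `inputs`, so the set of possible
    outputs is {0, ..., len(inputs) - 1} and varies with input length.
    (Hypothetical helper, for illustration only.)
    """
    return sorted(range(len(inputs)), key=lambda i: inputs[i])


def apply_pointers(inputs: Sequence[float], pointers: Sequence[int]) -> List[float]:
    """Build the output sequence by reading the inputs at the given positions."""
    return [inputs[i] for i in pointers]


short_seq = [0.7, 0.2, 0.9]             # 3 inputs -> 3 possible outputs
long_seq = [5.0, 1.0, 4.0, 2.0, 3.0]    # 5 inputs -> 5 possible outputs

for seq in (short_seq, long_seq):
    pointers = sort_as_pointers(seq)
    print(len(seq), pointers, apply_pointers(seq, pointers))
# prints:
# 3 [1, 0, 2] [0.2, 0.7, 0.9]
# 5 [1, 3, 4, 2, 0] [1.0, 2.0, 3.0, 4.0, 5.0]
```

In contrast, a translation or pronunciation system would select every output from the same fixed vocabulary regardless of how long the input sequence is.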