This specification relates to neural machine translation (NMT) systems. A neural machine translation system is one that includes any neural network that maps a source natural language sentence in one natural language to a target sentence in a different natural language.
Neural networks are machine learning models that employ one or more model layers to predict an output for a received input. Some neural networks include one or more hidden layers in addition to an output layer. The output of each hidden layer is used as input to the next layer in the network, i.e., the next hidden layer or the output layer. Each layer of the network generates an output from a received input in accordance with current values of a respective set of parameters.
A number of attempts to develop purely neural translation models have been made. NMT systems are simple to train with backpropagation and their decoder is easy to implement.
A major limitation of current NMT systems is their reliance on a fixed and modest-size vocabulary. As a result, current NMT systems are incapable of translating rare words and can only use a single symbol to represent all out of vocabulary words. Empirically, it has been observed that sentences with many rare words tend to be translated poorly by NMT systems.