Text-to-text applications include machine translation and other machine intelligence systems, such as speech recognition and automated summarization. These systems often rely on training based on information from specified databases known as corpora.
The training data may include many millions of words, and it is not uncommon for training to take weeks. There is often a tradeoff between processing speed and the accuracy of the resulting information.
It is desirable to speed up the training of such a system.