This invention relates generally to providing training data to a machine learning system.
Machine learning systems typically require vast amounts of training data to build inference models. Such training data is often distributed by a training data producer to the machine learning system as a single stream of data. A single-stream distribution model, however, is bottlenecked by the speed of the training data producer. When vast amounts of data are required, the slow distribution rate of the training data in turn slows down the machine learning system.
To address the slow distribution rate of a single stream, some systems distribute training data to the machine learning system in parallel. This parallel distribution model, however, does not preserve the order in which the training data is distributed across multiple iterations of the machine learning system. Varying the order of training data distribution has undesirable downstream effects in machine learning systems.