In general, large-scale data processing systems process large amounts of data from various sources and/or machines using iterative batch learning algorithms. As a specific example, large-scale machine learning systems may process large amounts of training data from data streams received by the system. A data stream may include examples, each corresponding to a specific instance of an event or action, such as when a user selects a search query or when a single video is viewed from among multiple videos presented to a user. An example may contain features (i.e., observed properties, such as a user being located in the USA or a user preferring to speak English) and may also contain a label corresponding to a resulting event or action (e.g., a user selected a search result, a user did not select the search result, or a user viewed a video). These examples may be used to generate statistics for each of the features. In particular, a machine learning system typically uses an iterative batch learning algorithm, e.g., a boosting algorithm, which may perform repeated iterations over the training data to generate a model. Because a training dataset may be very large, these iterations can be expensive, and it may be advantageous to develop techniques for optimizing the processing efficiency of these systems.
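By way of illustration only, the relationship between examples, features, labels, and per-feature statistics described above may be sketched as follows. The names `Example` and `feature_statistics`, the string encoding of features, and the choice of statistics (an occurrence count and a positive-label count per feature) are assumptions made for this sketch and are not specified by the description.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One instance of an event or action (hypothetical structure)."""
    features: frozenset  # observed properties, e.g. "country=USA"
    label: bool          # resulting event, e.g. True if a result was selected


def feature_statistics(examples):
    """Make one pass over the examples and tally, for each feature,
    how often it occurs and how often it occurs with a positive label.
    The choice of these two statistics is an assumption for this sketch."""
    stats = defaultdict(lambda: {"count": 0, "positives": 0})
    for example in examples:
        for feature in example.features:
            stats[feature]["count"] += 1
            if example.label:
                stats[feature]["positives"] += 1
    return dict(stats)


# Illustrative data only; not taken from any real data stream.
examples = [
    Example(frozenset({"country=USA", "language=English"}), True),
    Example(frozenset({"country=USA"}), False),
    Example(frozenset({"language=English"}), True),
]
stats = feature_statistics(examples)
```

An iterative batch learning algorithm may repeat a full pass of this kind over the training data on every iteration, which is why the cost grows with both the dataset size and the number of iterations.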