In general, large-scale data processing systems process large amounts of data from various sources and/or machines. As a specific example, large-scale machine learning systems may process large amounts of training data from data streams received by the system. A data stream may include training examples corresponding to specific instances of an event or action, such as a user selecting a specific search result or viewing a single video from among multiple videos presented to the user. An example may contain features (i.e., observed properties, such as a user being located in the USA or a user preferring to speak English) and may also contain a label indicating an event or action associated with the example (e.g., a user selected a specific search result, a user did not select a specific search result, or a user viewed a particular video). These training examples may be used to generate statistics for each feature. As new examples enter the system, the statistic associated with a feature may need to be updated. However, storing and updating these statistics can require infeasible amounts of storage and can reduce the processing speed of such systems.
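As a minimal sketch of the idea above, the following hypothetical Python class maintains per-feature label statistics that are updated incrementally as examples arrive from a stream. The class name, feature strings, and `click_rate` helper are illustrative assumptions, not part of the original description:

```python
from collections import defaultdict


class FeatureStats:
    """Hypothetical per-feature statistics for a stream of labeled examples.

    Each example is a (features, label) pair, e.g.
    ({"country=USA", "lang=en"}, 1), where label 1 might mean
    "user selected the search result".
    """

    def __init__(self):
        # For each feature, count the positive- and negative-labeled
        # examples it appeared in.
        self.positives = defaultdict(int)
        self.negatives = defaultdict(int)

    def update(self, features, label):
        # Incrementally update statistics as each new example arrives.
        for f in features:
            if label:
                self.positives[f] += 1
            else:
                self.negatives[f] += 1

    def positive_rate(self, feature):
        # Fraction of examples containing this feature that were
        # positively labeled.
        pos = self.positives[feature]
        total = pos + self.negatives[feature]
        return pos / total if total else 0.0
```

Note that the two dictionaries grow with the number of distinct features observed, which illustrates the storage concern raised above: with billions of examples and a large feature vocabulary, keeping an entry per feature in memory can become infeasible.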