Conventionally, machine learning models are not trained on data that is representative of the distribution of data that the model will be applied to. As an example, a model may be configured to predict a video that a user is likely to watch, based on a currently viewed video. The training data used to generate the model is likely not to include features about new videos that are not part of the corpus of videos at the current time. Accordingly, the model may not perform optimally based only on training data that is not representative of the distribution of the data that the model is applied to.