The present disclosure relates to improved selection and presentation of content items, and more particularly, to techniques for training a neural net to select content items for presentation by incentivizing selection of diverse explorative content and disincentivizing selection of content that is already likely to be requested.