Users are increasingly consuming content electronically, such as by accessing digital content provided over the Internet or another such network. Users often rely upon search queries or keyword strings to identify potentially relevant content. In many instances, however, the relevance of the results depends at least in part upon the actual query that was submitted, as well as the way in which the potentially relevant content is categorized or identified. Providers are beginning to look towards machine learning and artificial intelligence for assistance in classifying content. In order to properly train a machine learning algorithm, however, there must be sufficient data available for each appropriate class or sub-class. The need to obtain and classify content for a large variety of classes and sub-classes can be daunting at best, and in many instances can prevent machine learning from being utilized in a way that provides satisfactory results to users.