Many digital content producers, such as online stores, social media providers, search engine providers, etc., try to predict user actions and characteristics. Digital content can include any type of content that can be presented by a computing system, such as images, advertisements, video, applications, text, projections, etc. As examples of prediction, an online store can try to predict which products its users will purchase; a social media provider can try to predict which advertisements visitors to its website will click on; or a search engine provider can try to predict the interests of a user who is performing a search.
One way digital content producers can attempt this analysis is by utilizing machine learning engines. A “machine learning engine,” as used herein, refers to a construct that is trained using training data to make predictions for new data items, whether or not the new data items were included in the training data. For example, training data can include items that each have various parameters and an assigned classification. A new data item can have parameters that a machine learning engine can use to assign a classification to the new data item. Examples of machine learning engines include neural networks, support vector machines, decision trees, Parzen windows, Bayes classifiers, clustering, reinforcement learning, and others. Machine learning engines can be configured for various situations, data types, sources, and output formats. These factors provide a nearly infinite variety of machine learning engine configurations.
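As an illustration only, the train-then-classify pattern described above can be sketched in a few lines of Python. The sketch uses a nearest-centroid classifier, chosen here purely for brevity and not named in the list above; the function names `train` and `classify`, the two parameters (pages viewed, minutes on site), and all data values are hypothetical.

```python
from collections import defaultdict
from math import dist


def train(training_data):
    """Compute one centroid per classification from (parameters, label) pairs."""
    grouped = defaultdict(list)
    for parameters, label in training_data:
        grouped[label].append(parameters)
    # Average each parameter column across the items that share a label.
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def classify(centroids, parameters):
    """Assign the classification whose centroid is closest to the new item."""
    return min(centroids, key=lambda label: dist(centroids[label], parameters))


# Hypothetical training items: (pages viewed, minutes on site) and an
# assigned classification. These values are invented for illustration.
training_data = [
    ((2, 1.0), "no_purchase"),
    ((3, 2.0), "no_purchase"),
    ((12, 9.0), "purchase"),
    ((15, 11.0), "purchase"),
]

centroids = train(training_data)

# A new data item that was not part of the training data.
prediction = classify(centroids, (10, 8.0))
```

Here the training step reduces the labeled items to one average point per classification, and the new item receives the classification of the nearest average point. Other engine types would replace both steps while keeping the same overall shape: parameters and classifications in, a classification for a new item out.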
The amount of data available for training machine learning engines is growing at an exponential rate. It is common for web providers, for example, to operate databases with petabytes of data, while leading content providers are already looking toward technology to handle exabyte implementations. One popular social media website, for example, has over a billion active users who spend a total of over ten million hours each month interacting with the website. Each day, these users can produce hundreds of millions of interactions with other users and content items (e.g., messages, friend requests, content likes, link selections, etc.), as well as content posts. In addition, each user can be associated with a user profile and with other inferred characteristics, such as writing style, interests, skills, etc.
The versatility of machine learning engines combined with the amount of data available can make it difficult for digital content producers to select types of data, sources of data, or training parameters that effectively predict user actions and characteristics.
The techniques introduced here may be better understood by referring to the following Detailed Description in conjunction with the accompanying drawings, in which like reference numerals indicate identical or functionally similar elements.