The Internet has made it possible for people to connect and share information globally in ways previously undreamt of. Social media platforms, for example, enable people on opposite sides of the world to collaborate and share ideas through content items. A “content item,” as used herein, refers to digital visual or audio data that includes a representation of one or more words or groups of characters from a natural language. In some implementations, content items can be obtained from social network items, such as posts, news items, events, shares, comments, etc. “Words,” as used herein, can be traditional words, i.e., characters separated by whitespace or punctuation, or can be other character groupings, such as a specified number of characters. Content items generated by Internet users that at least partially contain natural language are often quite short and frequently contain portions in different languages. These and other factors can make it difficult to identify the languages in which various parts of these content items were created.
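The two “word” definitions above — traditional whitespace- or punctuation-delimited tokens and fixed-size character groupings — can be sketched as follows. This is a minimal illustration only; the function names and the sample post are assumptions for illustration and do not appear in the source.

```python
import re

def traditional_words(text):
    """Traditional words: runs of word characters separated by
    whitespace or punctuation (illustrative helper, not from the source)."""
    return re.findall(r"\w+", text)

def character_groups(text, n):
    """Alternative 'word' definition: groupings of a specified number of
    characters, useful e.g. for languages without whitespace-delimited
    words (illustrative helper, not from the source)."""
    stripped = text.replace(" ", "")
    return [stripped[i:i + n] for i in range(0, len(stripped), n)]

post = "great meetup tonight"
print(traditional_words(post))    # ['great', 'meetup', 'tonight']
print(character_groups(post, 4))  # ['grea', 'tmee', 'tupt', 'onig', 'ht']
```

Either tokenization can feed the language-identification machinery discussed below; short social-media posts often make the character-grouping view more robust than whitespace splitting.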
One way digital content providers attempt to address this is by utilizing machine learning engines. A “machine learning engine,” or “model,” as used herein, refers to a construct that is trained to make predictions for new data items, whether or not the new data items were included in the training data. For example, training data can include items with various parameters and an assigned classification. A machine learning engine trained using this training data can generate a value corresponding to a classification, e.g., a probability, for new data items. The internal state of some models can represent a distribution. Examples of machine learning engines include: neural networks, support vector machines, decision trees, probability distributions, Parzen windows, Bayesian classifiers, clustering, reinforcement learning, and others. Machine learning engines can be configured for various situations, data types, sources, and output formats, and machine learning engines that predict different outcomes can be combined in various ways. These factors provide a nearly infinite variety of configurations for machine learning engines.
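As one minimal sketch of such an engine applied to language identification, the following trains a multinomial naive Bayes classifier over character trigrams on labeled items and generates a probability for each language label for a new data item. The class name, training sentences, and add-one smoothing choice are assumptions for illustration; the source does not prescribe this particular engine.

```python
import math
from collections import Counter, defaultdict

def trigrams(text):
    """Character trigrams of the lowercased text (illustrative features)."""
    t = text.lower()
    return [t[i:i + 3] for i in range(len(t) - 2)]

class NaiveBayesLanguageModel:
    """A minimal machine learning engine: trained on (text, language)
    pairs, it generates a probability per language for new data items,
    whether or not those items appeared in the training data."""

    def train(self, labeled_items):
        self.counts = defaultdict(Counter)   # per-label trigram counts
        self.label_totals = Counter()        # items seen per label
        for text, label in labeled_items:
            self.counts[label].update(trigrams(text))
            self.label_totals[label] += 1
        self.vocab = {g for c in self.counts.values() for g in c}

    def predict(self, text):
        # Score each label in log space with add-one (Laplace) smoothing.
        scores = {}
        n_items = sum(self.label_totals.values())
        v = len(self.vocab)
        for label, grams in self.counts.items():
            total = sum(grams.values())
            score = math.log(self.label_totals[label] / n_items)
            for g in trigrams(text):
                score += math.log((grams[g] + 1) / (total + v))
            scores[label] = score
        # Normalize log scores into a probability distribution.
        m = max(scores.values())
        exp = {lb: math.exp(s - m) for lb, s in scores.items()}
        z = sum(exp.values())
        return {lb: e / z for lb, e in exp.items()}

model = NaiveBayesLanguageModel()
model.train([
    ("the quick brown fox", "en"),
    ("hello there friend", "en"),
    ("el zorro marron rapido", "es"),
    ("hola amigo como estas", "es"),
])
probs = model.predict("hello my friend")  # e.g. {'en': ..., 'es': ...}
```

The internal state here (smoothed trigram frequencies per label) is one example of a model whose state represents a distribution; a neural network or support vector machine could be substituted behind the same train/predict interface.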
The techniques introduced here may be better understood by referring to the following Detailed Description in conjunction with the accompanying drawings, in which like reference numerals indicate identical or functionally similar elements.