Various algorithms (or classifiers) are used to classify data, including linear classifiers, support vector machines (SVMs), kernel estimation methods, neural networks, and Bayesian networks, among others. Metrics for evaluating a classifier include precision, recall (or coverage), receiver operating characteristic (ROC) curves, explainability of the classification results, development costs, and maintenance costs. The performance of a classifier may depend on the characteristics of the data being classified, yet it may be difficult to identify the relationship between the data to be classified and the classifier's performance.
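Two of the metrics listed above, precision and recall, can be made concrete with a short worked example. The sketch below is a minimal illustration that treats one category as the positive class. The function name `precision_recall` and the toy labels are hypothetical and were chosen only for this example. Precision is the fraction of items assigned to the category that truly belong to it, computed as TP / (TP + FP). Recall is the fraction of items truly in the category that the classifier found, computed as TP / (TP + FN).

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> Tuple[float, float]:
    """Return (precision, recall) for one category treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    # True positives: predicted the category, and it was correct.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: predicted the category, but the true label was different.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: the true label was the category, but something else was predicted.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical true and predicted categories for five job titles.
y_true = ["Engineering", "Engineering", "Sales", "Marketing", "Engineering"]
y_pred = ["Engineering", "Sales", "Engineering", "Engineering", "Engineering"]

p, r = precision_recall(y_true, y_pred, "Engineering")
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

In this example, "Engineering" was predicted four times and was correct twice, which gives a precision of 0.5. Two of the three true "Engineering" items were found, which gives a recall of about 0.67. This shows how the two metrics can diverge for the same set of predictions.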
On a social networking site (e.g., Facebook or LinkedIn), users may manage their own profiles. Because users of the site may be free to specify their job titles in any form they wish, different users may use different natural-language words and phrases to describe things that fit into the same category. In a data set of 100 million such job titles, for example, it may not be unusual for a job title corresponding to a single job category (e.g., “Software Engineer”) to be specified in 40,000 different ways.
For various reasons (e.g., to boost revenue earned by the social networking site through targeted advertising), the owners (or administrators) of the site may wish to classify their users' self-descriptive job titles (or other user-specified data) into a generated or determined set of categories (e.g., a few dozen categories). For job title classification, examples of such categories may include Engineering, Marketing, Sales, Support, Healthcare, Legal, and Education.
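To make the classification task concrete, the sketch below maps free-form job titles onto the example categories named above using a naive keyword baseline. It only illustrates the input and output of the problem. It is not the classifier described in this document. The keyword lists and the names `CATEGORY_KEYWORDS`, `normalize`, and `classify_title` are hypothetical assumptions. In the rules, the first category whose keyword matches wins, and titles that match no rule return `None`.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical keyword rules for the example categories; a real system would
# need far richer rules or a learned model.
CATEGORY_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "Engineering": ("engineer", "developer", "programmer"),
    "Marketing": ("marketing", "brand", "seo"),
    "Sales": ("sales", "account executive", "business development"),
    "Support": ("support", "help desk", "customer service"),
    "Healthcare": ("nurse", "physician", "doctor"),
    "Legal": ("attorney", "lawyer", "paralegal"),
    "Education": ("teacher", "professor", "lecturer"),
}


def normalize(title: str) -> str:
    """Lowercase and collapse punctuation/whitespace so surface variants compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def classify_title(title: str) -> Optional[str]:
    """Return the first category with a keyword starting at a word boundary, else None."""
    text = normalize(title)
    for category, keywords in CATEGORY_KEYWORDS.items():
        # r"\b" + keyword also matches inflected forms, e.g. "engineer" in "engineers".
        if any(re.search(r"\b" + re.escape(kw), text) for kw in keywords):
            return category
    return None


for t in ["Sr. Software Engineer", "VP, Business Development", "Registered Nurse", "SW Dev"]:
    print(f"{t!r} -> {classify_title(t)}")
```

The abbreviated title "SW Dev" is not matched by any rule. This illustrates why hand-written rules scale poorly when one category can be written in tens of thousands of ways.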
Standard classifiers (e.g., SVMs) may be used to solve such a data classification problem. However, a standard classifier may not perform as well as a classifier targeted specifically at the problem, especially when performance is judged by a particular set of metrics selected by the owners or administrators of the social networking site.