1. Technical Field
The subject matter disclosed herein relates to cross-domain learning for data augmentation.
2. Description of the Related Art
Supervised machine learning algorithms rely on the availability of high-quality training sets consisting of large numbers of examples or data having associated labels. Here, the term “labels” refers to both class labels for classification tasks and real-valued estimates for regression tasks.
For example, in learning to rank search results, a learner may be provided with relevance judgments for a set of query-document pairs. In this case, the relevance judgments may indicate, for example, that a given query-document pair is “very relevant,” “somewhat relevant,” or “not relevant.” In this example, these relevance judgments constitute a set of class labels for a classification task. In an example of a regression task, a label may consist of a real-valued number that constitutes an estimate of the conditional expectation of a dependent variable given fixed values of the independent variables.
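The two kinds of labels described above can be illustrated with a minimal sketch. All data, names, and the helper function below are hypothetical and chosen only for illustration; the regression label is approximated here, as an assumption, by the sample mean of the dependent variable observed at each fixed value of the independent variable.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: query-document pairs with relevance class labels.
CLASSIFICATION_EXAMPLES = [
    (("jaguar speed", "doc_about_big_cats"), "very relevant"),
    (("jaguar speed", "doc_about_car_prices"), "somewhat relevant"),
    (("jaguar speed", "doc_about_tax_law"), "not relevant"),
]

# Hypothetical toy data: (independent variable x, dependent variable y).
REGRESSION_OBSERVATIONS = [(1, 2.0), (1, 4.0), (2, 5.0), (2, 7.0), (2, 9.0)]


def conditional_mean_labels(observations):
    """Estimate E[y | x] for each observed x by averaging the y values.

    This is one simple estimator of a conditional expectation; it is an
    illustrative assumption, not a method taken from the disclosure.
    """
    grouped = defaultdict(list)
    for x, y in observations:
        grouped[x].append(y)
    return {x: mean(ys) for x, ys in grouped.items()}


if __name__ == "__main__":
    for (query, document), label in CLASSIFICATION_EXAMPLES:
        print(f"class label for ({query!r}, {document!r}): {label}")
    print("regression labels:", conditional_mean_labels(REGRESSION_OBSERVATIONS))
```

In this sketch a class label is drawn from a small fixed set of categories, whereas a regression label is a real number computed from the observations.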
In most supervised machine learning algorithms, a learner is provided with some solved cases (examples with corresponding labels), and based on these solved cases, the learner is supposed to learn how to solve new cases, or more particularly, to learn how to accurately predict labels for new examples. However, because labels for training sets are usually provided by human experts, obtaining the training set may be quite expensive and time-consuming. Furthermore, there may not be enough resources and human experts for a particular domain to create high-quality, sufficiently large training data sets for that domain.
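As a rough illustration of learning from solved cases, the following sketch uses a one-nearest-neighbor rule. The choice of learner, the single numeric feature, and all names and values are assumptions made for illustration only; the disclosure does not prescribe any particular learner at this point.

```python
def fit_nearest_neighbor(solved_cases):
    """Return a predictor built from (feature, label) solved cases.

    The predictor labels a new example with the label of the closest
    solved case (one-nearest-neighbor on a single numeric feature).
    """
    cases = list(solved_cases)
    if not cases:
        raise ValueError("at least one solved case is required")

    def predict(feature):
        _, nearest_label = min(cases, key=lambda case: abs(case[0] - feature))
        return nearest_label

    return predict


# Hypothetical solved cases: a made-up relevance score paired with a label.
SOLVED_CASES = [
    (0.1, "not relevant"),
    (0.5, "somewhat relevant"),
    (0.9, "very relevant"),
]

if __name__ == "__main__":
    predict = fit_nearest_neighbor(SOLVED_CASES)
    for new_example in (0.2, 0.8):
        print(new_example, "->", predict(new_example))
```

Every entry in SOLVED_CASES stands for a label that a human expert would have had to supply, which is why the size of the training set drives its cost.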
For purposes of this disclosure, the term “domain” is defined as a data distribution, p(x). Domains may include, but are not limited to, collections of information that are related by at least one common physical, political, geographic, economic, cultural, recreational, academic, or theological trait. Some non-limiting examples of domains include the domain of published scientific journal articles, the domain of published business journal articles, the domain of web sites published in the Chinese language, or the domain of web sites having a particular country identifier or a group of country identifiers (e.g., .com, .in, .hk, .uk, .us).
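The definition of a domain as a data distribution p(x) can be made concrete with a small sketch. Here x is assumed, for illustration only, to be a single token drawn from a document, and p(x) is approximated by relative token frequencies. The token lists are made-up toy data, and the total variation distance is introduced merely as one convenient way to show that two domains differ; it is not taken from the disclosure.

```python
from collections import Counter


def empirical_distribution(samples):
    """Approximate p(x) by the relative frequency of each sample value."""
    counts = Counter(samples)
    total = sum(counts.values())
    return {x: count / total for x, count in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


# Hypothetical token samples standing in for two domains.
SCIENCE_TOKENS = ["protein", "cell", "protein", "market"]
BUSINESS_TOKENS = ["market", "revenue", "market", "cell"]

if __name__ == "__main__":
    p_science = empirical_distribution(SCIENCE_TOKENS)
    p_business = empirical_distribution(BUSINESS_TOKENS)
    print("p(x), science domain: ", p_science)
    print("p(x), business domain:", p_business)
    print("total variation distance:", total_variation(p_science, p_business))
```

A distance of zero would mean the two samples induce the same empirical distribution, while a larger value indicates that the two domains place probability mass on different inputs.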