The exemplary embodiment relates to the field of domain adaptation (DA) and finds particular application in adapting a classifier, derived from one or multiple source domains, to a target domain.
Domain adaptation addresses the problem of leveraging labeled data in one or more related domains, often referred to as “source” domains, when learning a classifier for labeling unseen data in a “target” domain. The domains are assumed to be related but not identical. When models learned on the source domain are applied directly in the target domain, the performance is often poor due to the domain shift. For example, document types such as invoices, emails, reports, and forms can vary in appearance from one company to another. In general, however, these sources can still bring useful information for building classifiers in the target domain, particularly when labels are not available in the target domain. For example, book or film reviews, while being quite different from reviews of a printing device or a web service, may contain common features which enable assessment of whether or not the customers are satisfied with the item being reviewed.
Domain adaptation methods are described, for example, in L. Duan, et al., “Domain adaptation from multiple sources via auxiliary classifiers,” ICML 2009; K. Saenko, et al., “Adapting visual category models to new domains,” ECCV 2010; X. Glorot, et al., “Domain adaptation for large-scale sentiment classification: A deep learning approach,” ICML 2011; R. Gopalan, et al., “Domain adaptation for object recognition: An unsupervised approach,” ICCV 2011; O. Beijbom, “Domain adaptations for computer vision applications,” CoRR, arXiv:1211.4860, 2012; B. Gong, et al., “Reshaping visual datasets for domain adaptation,” NIPS 2013; M. Baktashmotlagh, et al., “Unsupervised domain adaptation by domain invariant projection,” ICCV 2013; B. Fernando, et al., “Unsupervised visual domain adaptation using subspace alignment,” ICCV 2013; Y. Ganin, et al., “Unsupervised domain adaptation by backpropagation,” CoRR, arXiv:1409.7495, 2014, hereinafter “Ganin 2014”; and N. Farajidavar, et al., “Adaptive transductive transfer machines,” BMVC 2014, hereinafter “Farajidavar 2014.”
In general, domain adaptation methods seek to compensate for the mismatch between source and target domains by making use of information coming from both source and target domains during the learning process. The classifiers are learned or adapted automatically to the target domain either by exploiting labeled target examples (known as semi-supervised DA) or by assuming that the target domain data is fully unlabeled (unsupervised DA). Existing DA methods also generally assume that labeled source data is widely available. However, this assumption rarely holds in practice, e.g., when source data cannot be shared for confidentiality reasons.
There remains a need for a system and method for generating a classifier for a target domain when labeled target data is not available and there is a shortage of source data.