The exemplary embodiment relates to domain adaptation and finds particular application in connection with a system and method for domain adaptation in the case where there is no access to labeled source data.
Domain adaptation aims at learning robust features in order to transfer models from a source dataset to a different but related target domain. Most domain adaptation methods assume that source data is readily available. In practice, however, such an assumption rarely holds due to technical and legal constraints.
Domain Adaptation (DA) problems arise when there is a need to leverage labeled data in one or more related source domains, to learn a classifier for unseen data in a target domain. Such a situation occurs in numerous applications. Examples include named entity recognition across different text corpora, object recognition in images acquired in different conditions (such as different background scenes, object location, and pose or viewing angle changes), extracting opinions from reviews, and the like. See I.-H. Jhuo, et al., “Robust visual domain adaptation with low-rank reconstruction,” CVPR, pp. 2168-2175, 2012, for a survey on domain adaptation methods.
Different approaches have been proposed to address text and visual domain adaptation (see, for example, L. Duan, et al., “Domain adaptation from multiple sources via auxiliary classifiers,” ICML 2009; K. Saenko, et al., “Adapting visual category models to new domains,” ECCV 2010; X. Glorot, et al., “Domain adaptation for large-scale sentiment classification: A deep learning approach,” ICML 2011, hereinafter, “Glorot 2011”; R. Gopalan, et al., “Domain adaptation for object recognition: An unsupervised approach,” ICCV 2011; O. Beijbom, “Domain adaptations for computer vision applications,” CoRR, arXiv:1211.4860, 2012; B. Gong, et al., “Reshaping visual datasets for domain adaptation,” NIPS 2013; M. Baktashmotlagh, et al., “Unsupervised domain adaptation by domain invariant projection,” ICCV 2013; B. Fernando, et al., “Unsupervised visual domain adaptation using subspace alignment,” ICCV 2013; Y. Ganin, et al., “Unsupervised domain adaptation by backpropagation,” CoRR, arXiv:1409.7495, 2014; and N. Farajidavar, et al., “Adaptive transductive transfer machines,” BMVC 2014). I.-H. Jhuo, et al., “Robust visual domain adaptation with low-rank reconstruction,” CVPR, pp. 2168-2175, 2012, provides a survey of domain adaptation methods with a focus on the learning theory and natural language processing applications.
Most of these DA methods assume that source domain collections are largely available. Access to both source and target data makes it possible to measure the discrepancy between their distributions and either 1) build representations common to both the target and the sources, using deep learning methods (M. Chen, et al. “Marginalized denoising autoencoders for domain adaptation,” arXiv preprint arXiv:1206.4683, 2012, hereinafter, “Chen 2012”; Z. Xu, et al., “From sBoW to dCoT marginalized encoders for text representation,” Proc. 21st ACM Int'l Conf. on Information and knowledge management (CIKM), pp. 1879-1884, 2012, hereinafter, “Xu 2012”) or geodesic flow methods (B. Gong, et al., “Geodesic flow kernel for unsupervised domain adaptation,” 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2066-2073, 2012), or 2) develop techniques for direct reuse of source instances for better target classification (Z. Xu, et al., “Multi-source transfer learning with multi-view adaboost,” in Tingwen Huang, et al., editors, NIPS, volume 7665 of Lecture Notes in Computer Science, pages 332-339, Springer Berlin Heidelberg, 2012).
In reality, the assumption of available source instances rarely holds. The source instances may become unavailable for technical reasons, or storing them may be prohibited for legal and privacy reasons. More realistic are situations where the source domain instances cannot be accessed but the source decision-making procedures are available. These procedures are often provided in the form of classification services, which were trained on source data and are available for direct deployment and later reuse.