The following relates to the machine learning arts, classification arts, surveillance camera arts, document processing arts, and related arts.
Domain adaptation leverages labeled data in one or more related source domains to learn a classifier for unlabeled data in a target domain. One illustrative task that can benefit from domain adaptation is named entity recognition (NER) across different (possibly topic-specific) text corpora. For example, it may be desired to train a new classifier to perform NER for a newly acquired corpus of text-based documents (where “text-based” denotes the documents comprise sufficient text to make textual analysis useful). The desired classifier receives as input a feature vector representation of the document, for example including a “bag-of-words” feature vector, and the classifier output is a positive or negative label as to whether a particular named entity is referenced in the document (or, in a variant task, whether the document is directed to the named entity). In training this classifier, substantial information may be available in the form of documents from one or more previously available corpora for which the equivalent NER task has been performed (e.g. using other classifiers and/or manually). In this task, the newly acquired corpus is the “target domain”, and the previously available corpora are “source domains”. Leveraging source domain data in training a classifier for the target domain is complicated by the possibility that the source corpora may be materially different from the target corpus, e.g. using different vocabulary (in a statistical sense).
Another illustrative task that can benefit from domain adaptation is object recognition performed on images acquired by surveillance cameras at different locations. For example, consider a traffic surveillance camera newly installed at a traffic intersection, which is to identify vehicles running a traffic light governing the intersection. The object recognition task is thus to identify the combination of a red light and a vehicle imaged illegally driving through this red light. In training an image classifier to perform this task, substantial information may be available in the form of labeled images acquired by red light enforcement cameras previously installed at other traffic intersections. In this case, images acquired by the newly installed camera are the “target domain” and images acquired by red light enforcement cameras previously installed at other traffic intersections are the “source domains”. Again, leveraging source domain data in training a classifier for the target domain is complicated by the possibility that the source corpora may be materially different from the target corpus, e.g. having different backgrounds, camera-to-intersection distances, poses, view angles, and/or so forth.
These are merely illustrative tasks. More generally, any machine learning task that seeks to learn a classifier for a target domain having limited or no labeled training instances, but for which one or more similar source domains exist with labeled training instances, can benefit from performing domain adaptation to leverage these source domain(s) data in learning the classifier for the target domain.
Stacked marginalized denoising autoencoders (mSDAs) are a known approach for performing domain adaptation between a source domain and a target domain. See Chen et al., “Marginalized denoising autoencoders for domain adaptation”, ICML (2014); Xu et al., “From sBoW to dCoT marginalized encoders for text representation”, in CIKM, pages 1879-84 (ACM, 2012). Each iteration of the mSDA corrupts features of the feature vectors representing the training instances to produce a domain adaptation layer, and repeated iterations thereby generate a stack of domain adaptation transform layers operative to transform the source and target domains to a common adapted domain. Noise marginalization in the mSDA domain adaptation allows to obtain a closed form solution and to considerably reduce the training time.