Domain adaptation in the absence of source domain data
Boris Chidlovskii*, XRCE; Stephane Clinchant, Xerox Research Centre Europe; Gabriela Csurka, Xerox Research Centre Europe
The overwhelming majority of existing domain adaptation methods makes an assumption of freely available source domain data. An equal access to both source and target data makes it possible to measure the discrepancy between their distributions and to build representations common to both target and source domains. In reality, such a simplifying assumption rarely holds, since source data are routinely a subject of legal and contractual constraints between data owners and data customers. When source domain data can not be accessed, decision making procedures are often available for adaptation nevertheless. These procedures are often presented in the form of classification, identification, ranking etc. rules trained on source data and made ready for a direct deployment and later reuse. In other cases, the owner of a source data is allowed to share a few representative examples such as class means.
In this paper we address the domain adaptation problem in real world applications, where the reuse of source domain data is limited to classification rules or a few representative examples. We ex-tend the recent techniques of feature corruption and their marginalization, both in supervised and unsupervised settings. We test and compare them on private and publicly available source datasets and show that significant performance gains can be achieved despite the absence of source data and shortage of labeled target data.
Filed under: Classification