Pattern recognition systems are widely applied in daily life to solve real problems in areas as diverse as science, engineering, agriculture, e-commerce, medicine, medical imaging analysis, the military, and national security. One important technique in pattern recognition is probabilistic linear discriminant analysis (PLDA), which compensates for within-class variability and provides a powerful data-driven mechanism for separating class-specific factors from irrelevant factors. With PLDA, we can build a model of a previously unseen class from a single example, and can combine multiple examples for a better representation of the class. PLDA has proven effective in face recognition and speaker recognition.
To train the parameters of PLDA, multiple observations for each of several thousand classes, collected under similar conditions, are typically required. However, collecting such a large amount of in-domain (IND) data for every new domain of interest is very expensive and often unrealistic. Most of the resource-rich data that already exist do not match the domain of interest; such data are called out-of-domain (OOD) data. A PLDA model trained with OOD data may not represent the IND data properly when the mismatch between development and evaluation data is much larger than the variability within the IND data. Thus the domain mismatch between development and evaluation data can greatly degrade the performance of pattern recognition systems.
To handle the domain mismatch between development and evaluation data, domain adaptation is applied in order to adapt PLDA parameters developed from already-available OOD data so as to achieve good performance in a new domain where only a small amount of in-domain data is available.
Some domain adaptation methods employ a linear combination of maximum likelihood estimates (see Non Patent Literature (NPL) 1). As shown in FIG. 9, a PLDA parameter estimation unit 104 first trains two sets of PLDA parameters separately, using features extracted by a feature extraction unit 103 from OOD data 101 and IND data 102, respectively. After the within-class variability
$$\Phi_w^{\mathrm{in}},\ \Phi_w^{\mathrm{out}}$$ [Math. 1]
and the between-class variability
$$\Phi_b^{\mathrm{in}},\ \Phi_b^{\mathrm{out}}$$ [Math. 2]
of the IND and OOD data are obtained, a linear combination unit 105 combines the two sets of PLDA parameters as
$$\Phi_b = \alpha\,\Phi_b^{\mathrm{in}} + (1-\alpha)\,\Phi_b^{\mathrm{out}},\qquad \Phi_w = \alpha\,\Phi_w^{\mathrm{in}} + (1-\alpha)\,\Phi_w^{\mathrm{out}},$$ [Math. 3]
and creates adapted PLDA parameters 106. Here
$$\alpha$$ [Math. 4]
is a weighting coefficient that determines how much the IND data contributes. In the evaluation phase, a PLDA classification unit 107 computes a score for a given pair of features extracted from enrollment data and test data, respectively. In this method, the PLDA parameters
$$(\Phi_b,\ \Phi_w)$$ [Math. 5]
are biased toward
$$(\Phi_b^{\mathrm{out}},\ \Phi_w^{\mathrm{out}})$$ [Math. 6]
so the method works only when the OOD data is close to the IND data. However, this is not always the case. When the OOD data is far from the IND data, the combined PLDA parameters may not fall in the vicinity of the true parameters. Furthermore, extra training data is necessary to estimate the weighting coefficient
$$\alpha.$$ [Math. 7]
Hence this method is not a feasible way to compensate for the domain mismatch. FIG. 9 is a block diagram of related art 1: parameter adaptation using a linear combination of two sets of PLDA parameters trained on OOD and IND data.
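The linear combination of [Math. 3] can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `interpolate_plda`, the use of NumPy arrays for the covariance matrices, and the toy diagonal values are all hypothetical and do not come from NPL 1.

```python
import numpy as np


def interpolate_plda(phi_b_in, phi_w_in, phi_b_out, phi_w_out, alpha):
    """Combine IND and OOD PLDA covariances as in [Math. 3].

    alpha weights the in-domain estimates; (1 - alpha) weights the
    out-of-domain estimates. The function name and interface are
    hypothetical, chosen only for this sketch.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    phi_b = alpha * phi_b_in + (1.0 - alpha) * phi_b_out
    phi_w = alpha * phi_w_in + (1.0 - alpha) * phi_w_out
    return phi_b, phi_w


# Toy 2-dimensional covariances (made-up values for illustration only).
phi_b_in = np.diag([2.0, 1.0])
phi_w_in = np.diag([0.5, 0.5])
phi_b_out = np.diag([4.0, 3.0])
phi_w_out = np.diag([1.5, 2.5])

# A small alpha keeps the adapted parameters close to the OOD estimates.
phi_b, phi_w = interpolate_plda(phi_b_in, phi_w_in, phi_b_out, phi_w_out, alpha=0.25)
```

With a small weighting coefficient, as is likely when little IND data is available, the adapted parameters remain close to the OOD estimates, which illustrates the bias described above.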
The method mentioned above focuses on parameter adaptation, while another class of methods focuses on data compensation, as shown in FIG. 10. FIG. 10 is a block diagram of related art 2: i-vector compensation. It shows that the features are shifted using knowledge of the statistics of the OOD and IND data. As in the previous method, these methods extract two sets of features with a feature extraction unit 203 from OOD data 201 and IND data 202, respectively. It is assumed that the domain mismatch causes a shift between the development (OOD) data and the evaluation (IND) data. These methods explicitly model the dataset variation as a shift in the feature space (204) and reduce it in a pre-processing cleanup step (206). After that, a PLDA parameter estimation unit 207 estimates PLDA parameters 208 from the data in which the dataset variation has been reduced (see NPL 2). In the evaluation phase, a PLDA classification unit 209 computes a score in the same way as the PLDA classification unit 107 in the previous method. In these methods, the transformation is not optimized within the PLDA framework. Instead, the parameters of the total system are optimized under two or more different criteria, such as maximum likelihood (ML) and minimum distance. Thus, the system cannot reach the global optimum.
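The idea of treating dataset variation as a shift in the feature space can be illustrated with a minimal sketch. It assumes the simplest possible form of compensation, a difference of feature means, which is not necessarily the method of NPL 2. The function names `estimate_shift` and `compensate` and the synthetic data are hypothetical.

```python
import numpy as np


def estimate_shift(ood_feats, ind_feats):
    """Estimate the domain shift as the difference of feature means.

    Assumption for this sketch: the mismatch is a pure translation.
    """
    return ood_feats.mean(axis=0) - ind_feats.mean(axis=0)


def compensate(feats, shift):
    """Remove the estimated shift from a set of feature vectors."""
    return feats - shift


# Synthetic features (rows are feature vectors); values are made up.
rng = np.random.default_rng(0)
ood = rng.normal(loc=3.0, scale=1.0, size=(500, 4))  # plentiful OOD data
ind = rng.normal(loc=0.0, scale=1.0, size=(50, 4))   # scarce IND data

shift = estimate_shift(ood, ind)
ood_comp = compensate(ood, shift)  # OOD features moved toward the IND domain
```

In this sketch the shift is estimated under a mean-matching criterion, while the PLDA parameters would afterwards be estimated separately by maximum likelihood on `ood_comp`. The two steps optimize different criteria, which is the limitation noted above.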