One goal of data fusion is to obtain a classifier h that learns from all of the views available for each training point and achieves better classification accuracy than a classifier that has access to only one view. The classes cannot always be separated using the information in a single view; if the information from all of the views is combined, however, better classification performance may be achieved. A good fusion algorithm can therefore outperform the individual classifiers.
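A toy illustration with hypothetical data: when the class label depends on both views jointly (here, the exclusive OR of two binary views), no rule that looks at one view alone can do better than chance, while a rule over the combined views classifies every point correctly. The helper `best_lookup_accuracy` is introduced only for this sketch.

```python
from collections import Counter, defaultdict

def best_lookup_accuracy(keys, labels):
    """Training accuracy of the best rule mapping each distinct key to a single label."""
    counts = defaultdict(Counter)
    for key, label in zip(keys, labels):
        counts[key][label] += 1
    # For each key, the best rule predicts that key's most frequent label.
    correct = sum(c.most_common(1)[0][1] for c in counts.values())
    return correct / len(labels)

# Hypothetical two-view data: the class is the XOR of the two views.
view1 = [0, 0, 1, 1]
view2 = [0, 1, 0, 1]
labels = [a ^ b for a, b in zip(view1, view2)]

print(best_lookup_accuracy(view1, labels))                    # 0.5
print(best_lookup_accuracy(view2, labels))                    # 0.5
print(best_lookup_accuracy(list(zip(view1, view2)), labels))  # 1.0
```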
Considerable research in the pattern recognition field has focused on fusion rules that aggregate the outputs of the first-level experts and make a final decision, as discussed in J. Kittler, “Combining Classifiers: A Theoretical Framework,” Pattern Analysis and Applications, Vol. 1, pp. 18-27, Springer, 1998, and J. Kittler, “A Framework for Classifier Fusion: Is It Still Needed?” Lecture Notes in Computer Science, Vol. 1876, pp. 45-56, 2000.
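Commonly studied fixed fusion rules of this kind include the sum, product, max, and min rules, each applied per class to the experts' posterior estimates. The following is a minimal sketch under the assumption that every expert outputs a probability vector over the same classes; the names `FUSION_RULES` and `fuse` and the numbers are illustrative. It also shows that different rules can reach different decisions on the same expert outputs.

```python
from math import prod

# Fixed (non-trainable) rules applied per class to the experts' posterior estimates.
FUSION_RULES = {"sum": sum, "product": prod, "max": max, "min": min}

def fuse(posteriors, rule="sum"):
    """Return the index of the winning class.

    posteriors[i][k] is expert i's estimate of P(class k | its view of x).
    """
    aggregate = FUSION_RULES[rule]
    n_classes = len(posteriors[0])
    scores = [aggregate(p[k] for p in posteriors) for k in range(n_classes)]
    return max(range(n_classes), key=lambda k: scores[k])

# Two experts moderately favour class 0; a third strongly favours class 1.
posteriors = [[0.8, 0.2], [0.8, 0.2], [0.05, 0.95]]
for rule in FUSION_RULES:
    print(rule, fuse(posteriors, rule))
# prints: sum 0 / product 1 / max 1 / min 1
```

The product rule is dominated by the expert that assigns class 0 a very low posterior, while the sum rule averages that disagreement away.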
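Wolpert describes stacked generalization as a general technique for constructing multi-level learning systems in D. H. Wolpert, “Stacked Generalization,” Neural Networks, Vol. 5, pp. 241-259, 1992. In the context of classifier combination, stacked generalization can yield unbiased, full-size training sets for the trainable combiner: the first-level outputs for each training point are produced by classifiers trained on partitions of the data that exclude that point (for example, by cross-validation), so every point contributes a combiner training example that is not biased by a classifier that has already seen it.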
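A minimal sketch of the cross-validation form of stacking, assuming two scalar views per point, a nearest-class-mean classifier per view as the first-level learner, and a lookup table as the trainable combiner. All three choices, the helper names, and the data are hypothetical; the point is that `out_of_fold_outputs` produces one level-1 row per training point, each from models that never saw that point.

```python
from collections import Counter
from statistics import mean

def fit_centroid(xs, ys):
    """Hypothetical level-0 learner: nearest class mean on one scalar view (labels 0/1)."""
    m0 = mean(x for x, y in zip(xs, ys) if y == 0)
    m1 = mean(x for x, y in zip(xs, ys) if y == 1)
    return lambda x: 1 if abs(x - m1) < abs(x - m0) else 0

def out_of_fold_outputs(points, ys, k):
    """Level-1 training set: one row per training point, built from level-0
    models that were trained on the other k-1 folds and never saw that point."""
    n, n_views = len(ys), len(points[0])
    rows = [None] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        models = [fit_centroid([points[i][v] for i in train], [ys[i] for i in train])
                  for v in range(n_views)]
        for i in range(fold, n, k):  # the held-out fold
            rows[i] = tuple(model(points[i][v]) for v, model in enumerate(models))
    return rows

def fit_table_combiner(rows, ys):
    """Hypothetical trainable combiner: majority label for each pattern of level-0 outputs."""
    table = {}
    for row, y in zip(rows, ys):
        table.setdefault(row, Counter())[y] += 1
    fallback = Counter(ys).most_common(1)[0][0]
    return lambda row: table[row].most_common(1)[0][0] if row in table else fallback

def fit_stacked(points, ys, k=3):
    combiner = fit_table_combiner(out_of_fold_outputs(points, ys, k), ys)
    # Level-0 models are refit on the full training set for use at prediction time.
    models = [fit_centroid([p[v] for p in points], ys) for v in range(len(points[0]))]
    return lambda p: combiner(tuple(model(p[v]) for v, model in enumerate(models)))

# Hypothetical two-view training data (each point is a (view1, view2) pair).
points = [(0, 1), (1, 0), (2, 1), (10, 9), (9, 10), (11, 10), (0, 2), (10, 11), (1, 1)]
ys = [0, 0, 0, 1, 1, 1, 0, 1, 0]
rows = out_of_fold_outputs(points, ys, k=3)
print(len(rows) == len(ys))  # True: the level-1 set is full size
model = fit_stacked(points, ys)
print(model((0, 0)), model((10, 10)))  # 0 1
```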
G. R. Lanckriet et al., “Kernel-Based Data Fusion and Its Application to Protein Function Prediction in Yeast,” Proceedings of the Pacific Symposium on Biocomputing, Vol. 9, pp. 300-311, 2004, discloses a kernel-based data fusion approach for protein function prediction in yeast that combines multiple kernel representations optimally by formulating the problem as a convex optimization problem solved with semidefinite programming techniques.
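As a sketch of the general form only, not the exact program of the cited paper, the combined kernel can be written as a weighted sum of the m Gram matrices K_1, ..., K_m, one per data source, with the weights chosen by the convex optimization jointly with the classifier. The symbols m, μ_i, and c are our notation, and the trace normalization shown is one common way of fixing the scale of K:

```latex
K = \sum_{i=1}^{m} \mu_i K_i, \qquad \mu_i \ge 0, \qquad \operatorname{trace}(K) = c
```

With nonnegative weights, K is a sum of positive semidefinite matrices and is therefore itself positive semidefinite, so it remains a valid kernel.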
Techniques for fusing expert observations include linear weighted voting, naïve Bayes combiners, the kernel function approach, potential functions, decision trees, and multilayer perceptrons. Such techniques are described in J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, “On Combining Classifiers,” IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 20, pp. 226-239, 1998; L. I. Kuncheva, J. C. Bezdek, and R. P. W. Duin, “Decision Templates for Multiple Classifier Fusion: An Experimental Comparison,” Pattern Recognition, Vol. 34, pp. 299-314, 2001; L. Xu, A. Krzyzak, and C. Y. Suen, “Methods of Combining Multiple Experts for the Recognition of Unconstrained Handwritten Numerals,” IEEE Trans. Systems, Man, and Cybernetics, Vol. 22, pp. 418-435, 1992; and S. Hashem, “Optimal Linear Combination of Neural Networks,” Neural Networks, Vol. 10, pp. 599-614, 1997.
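A minimal sketch of two of the listed techniques, assuming each expert outputs a crisp class index: linear weighted voting with given weights, and a naïve Bayes combiner that estimates P(expert output | class) from each expert's confusion matrix on training data. The function names, the add-one smoothing, and the example counts are illustrative assumptions.

```python
def weighted_vote(labels, weights, n_classes):
    """Linear weighted voting over crisp labels: expert i adds weights[i] to the class it chose."""
    scores = [0.0] * n_classes
    for label, weight in zip(labels, weights):
        scores[label] += weight
    return max(range(n_classes), key=lambda k: scores[k])

def naive_bayes_combine(labels, confusions, class_counts):
    """Naive Bayes combination of crisp expert labels.

    confusions[i][k][s] counts training points of true class k that expert i
    labelled s. Experts are assumed conditionally independent given the class.
    Add-one smoothing keeps an unseen (k, s) pair from zeroing a class score.
    """
    n_classes = len(class_counts)
    total = sum(class_counts)
    scores = []
    for k in range(n_classes):
        score = class_counts[k] / total  # prior P(k)
        for confusion, s in zip(confusions, labels):
            # Smoothed estimate of P(expert outputs s | true class k)
            score *= (confusion[k][s] + 1) / (class_counts[k] + n_classes)
        scores.append(score)
    return max(range(n_classes), key=lambda k: scores[k])

# Three experts' crisp labels; the first expert is trusted most.
print(weighted_vote([0, 1, 1], [0.6, 0.3, 0.2], n_classes=2))  # 0 (0.6 vs 0.5)

# Two experts with hypothetical confusion counts over 50 training points per class.
confusions = [
    [[40, 10], [10, 40]],  # expert A
    [[30, 20], [5, 45]],   # expert B
]
print(naive_bayes_combine([0, 1], confusions, class_counts=[50, 50]))  # 0
```

With equal weights the same three labels would give class 1 by plain majority; the weights let a more reliable expert outvote the others.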
Boosting techniques are intended to improve the prediction accuracy of learning methods by combining weak classifiers, each of which may perform only slightly better than random guessing, into a strong classification rule with better performance. The AdaBoost algorithm is described in Y. Freund and R. E. Schapire, “A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting,” Journal of Computer and System Sciences, Vol. 55, pp. 119-139, 1997, and in Y. Freund and R. E. Schapire, “A Short Introduction to Boosting,” Journal of Japanese Society for Artificial Intelligence, Vol. 14, No. 5, pp. 771-780, September 1999.
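A minimal sketch of discrete AdaBoost in its commonly presented form for labels in {-1, +1}, assuming one-dimensional inputs and decision stumps as the weak learner. The stump learner, the helper names, and the data are illustrative choices, not taken from the cited papers. Each round fits the stump with the lowest weighted error ε, gives it the vote weight α = ½ ln((1 − ε)/ε), and reweights the training points so that the next stump concentrates on the mistakes.

```python
import math

def stump(theta, polarity, x):
    """Decision stump: predicts `polarity` when x >= theta, otherwise -polarity."""
    return polarity if x >= theta else -polarity

def fit_stump(xs, ys, w):
    """Weak learner: the stump with the lowest weighted training error."""
    best = None
    for theta in sorted(set(xs)) + [max(xs) + 1]:
        for polarity in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w) if stump(theta, polarity, xi) != yi)
            if best is None or err < best[2]:
                best = (theta, polarity, err)
    return best

def adaboost(xs, ys, n_rounds):
    """Discrete AdaBoost; returns a list of (alpha, theta, polarity)."""
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        theta, polarity, err = fit_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0) for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, theta, polarity))
        # Misclassified points gain weight, correctly classified points lose weight.
        w = [wi * math.exp(-alpha * yi * stump(theta, polarity, xi))
             for xi, yi, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted vote of the weak classifiers."""
    score = sum(alpha * stump(theta, polarity, x) for alpha, theta, polarity in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical 1-D data: class +1 is an interval, which no single stump can represent.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [-1, -1, 1, 1, 1, -1, -1]
for rounds in (1, 3):
    model = adaboost(xs, ys, rounds)
    print(rounds, [predict(model, x) for x in xs] == ys)
# prints: 1 False / 3 True
```

A single stump misclassifies two of the seven points, while the weighted vote of three stumps fits this training set exactly.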