Kernel functions have become a popular tool in machine learning, and methods that automate the task of specifying a suitable kernel have become increasingly important. More particularly, the known Multiple Kernel Learning (MKL) problem of finding a combination of pre-specified base kernels that is suitable for a particular task at hand has received significant interest.
Generally, the prior art has approached this problem along two paths. The first path solves a joint optimization problem over both the weights of the kernel combination and the parameters of the classifier. Such a one-stage approach has been described by Lanckriet et al. (See G. R. G. Lanckriet, N. Cristianini, P. Bartlett, L. El Ghaoui and M. I. Jordan, “Learning the Kernel Matrix with Semidefinite Programming”, Journal of Machine Learning Research, 5:27-72, 2004) and has since received significant attention directed at providing faster algorithms (See A. Rakotomamonjy, F. Bach, S. Canu and Y. Grandvalet, “More Efficiency in Multi Kernel Learning”, in International Conference on Machine Learning (ICML-11), pp. 249-256, 2011; S. Sonnenburg, G. Ratsch, C. Schafer, and B. Scholkopf, “Large Scale Multiple Kernel Learning”, Journal of Machine Learning Research, 7, 2006). Likewise, a number of theoretical analyses have been described (See, e.g., C. Cortes, M. Mohri, and A. Rostamizadeh, “Two Stage Learning Kernel Algorithms”, in International Conference on Machine Learning, 2010; M. Kloft, U. Brefeld, S. Sonnenburg and A. Zien, “lp-Norm Multiple Kernel Learning”, Journal of Machine Learning Research, 12:953-997, 2011; F. Bach, “Consistency of the Group Lasso and Multiple Kernel Learning”, Journal of Machine Learning Research, 9:1179-1225, 2008). Additionally, extensions to multi-class classification have been explored (See, e.g., A. Zien and C. S. Ong, “Multiclass Multiple Kernel Learning”, in International Conference on Machine Learning, 2007), as have extensions to non-linear combinations of kernels (See, e.g., C. Cortes, M. Mohri, and A. Rostamizadeh, “Learning non-linear combinations in Kernels”, in Advances in Neural Information Processing Systems, 200).
The second path in kernel learning follows a two-stage approach: first learn a “good” combination of base kernels using the training data, then use the learned kernel with a standard kernel method such as an SVM or kernel ridge regression to obtain a classifier or regressor. The two-stage learning approaches described so far (See, e.g., C. Cortes, M. Mohri, and A. Rostamizadeh, “Two-Stage Learning Kernel Algorithms”, in International Conference on Machine Learning, 2010; and N. Cristianini, J. Shawe-Taylor, A. Elisseeff, and J. S. Kandola, “On Kernel-Target Alignment”, in NIPS, 2001) have generally employed the notion of target alignment. Intuitively, target alignment is a measure of similarity (agreement) between a kernel and the target kernel, which is derived from the training labels and represents an optimal kernel for the training sample.
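By way of illustration only, the notion of target alignment may be sketched as follows. The sketch assumes the commonly used definition in which the alignment of two kernel matrices is their Frobenius inner product divided by the product of their Frobenius norms, and in which the target kernel for labels y in {-1, +1} is the outer product of y with itself. The toy data, the choice of base kernels, the helper names, and the rule that weights each base kernel in proportion to its alignment are assumptions made for this example; they are not the specific method of any cited reference.

```python
import math

def frobenius_inner(A, B):
    """Frobenius inner product: sum of element-wise products of two matrices."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))

def alignment(K1, K2):
    """Alignment of two kernel matrices (uncentered form, assumed here)."""
    return frobenius_inner(K1, K2) / math.sqrt(
        frobenius_inner(K1, K1) * frobenius_inner(K2, K2)
    )

def gram(X, k):
    """Kernel (Gram) matrix of kernel function k over the sample X."""
    return [[k(u, v) for v in X] for u in X]

def linear(u, v):
    return sum(a * b for a, b in zip(u, v))

def rbf(u, v, gamma=0.5):  # gamma is an arbitrary illustrative value
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical toy training sample with labels in {-1, +1}.
X = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [-0.8, -0.2]]
y = [1, 1, -1, -1]

# Target kernel derived from the training labels: T[i][j] = y[i] * y[j].
T = [[yi * yj for yj in y] for yi in y]

# Stage one: score each base kernel by its alignment with the target kernel,
# then (as a simple assumed heuristic) weight kernels in proportion to that score.
base_kernels = [gram(X, linear), gram(X, rbf)]
scores = [alignment(K, T) for K in base_kernels]
total = sum(scores)
weights = [s / total for s in scores]

# Combined kernel that stage two would hand to a standard kernel method.
n = len(X)
K_combined = [
    [sum(w * K[i][j] for w, K in zip(weights, base_kernels)) for j in range(n)]
    for i in range(n)
]
```

In a second stage, a matrix such as `K_combined` would be supplied to a standard kernel method (for example, an SVM that accepts a precomputed kernel) to obtain the classifier.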
Notwithstanding these advances, methods, structures or techniques that address such aspects would represent a significant advance in the art.