Kernel-based methods such as Support Vector Machines (SVMs) represent the state of the art in classification techniques. SVMs are a set of related supervised learning methods used for data classification. However, their application is limited by the scaling behavior of their training algorithms, which in most cases scale quadratically with the number of training examples. When dealing with very large datasets, a key issue in SVM learning is to find, quickly and efficiently, the examples that are critical for defining the separation between the two classes. Traditional SVM training often relies on sequential optimization, in which only a few examples are added in each iteration, and requires computing dot-products over sparse feature vectors. In most iterative algorithms, the kernel computation can be folded into a matrix-vector multiplication; however, these types of algorithms are extremely inefficient when dealing with sparse data.
An m by n matrix M is a 2-dimensional array of numbers or abstract quantities with m rows and n columns. A vector is simply an m by 1 or a 1 by n matrix. A dot-product, also known as the scalar product, is a binary operation which takes two vectors of equal length over the real numbers R and returns a real-valued scalar quantity. In Machine Learning (ML), the kernel trick is a method for converting a linear classifier algorithm into a non-linear one by applying a non-linear mapping of the original observations into a higher-dimensional space, so that linear classification in the new space is equivalent to non-linear classification in the original space.
Therefore, a need exists for a method that provides fast kernel-based learning on sparse data.
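By way of illustration only, the dot-product over sparse feature vectors and the kernel trick defined above can be sketched as follows. This is a minimal sketch and not the method addressed by the stated need: the dictionary representation of a sparse vector, the function names `sparse_dot`, `poly2_kernel`, and `poly2_feature_map`, the choice of a degree-2 polynomial kernel, and the sample vectors are all assumptions made for the example.

```python
def sparse_dot(u, v):
    """Dot-product of two sparse vectors stored as {index: value} dicts.

    Only indices that are non-zero in both vectors contribute, so the cost
    depends on the number of stored entries, not on the full dimension.
    """
    if len(u) > len(v):
        u, v = v, u  # iterate over the vector with fewer stored entries
    return sum(val * v[idx] for idx, val in u.items() if idx in v)


def poly2_kernel(u, v):
    """Degree-2 polynomial kernel k(u, v) = (u . v)^2 (illustrative choice)."""
    return sparse_dot(u, v) ** 2


def poly2_feature_map(x, dim):
    """Explicit non-linear map to the dim*dim products x_i * x_j.

    A linear dot-product in this higher-dimensional space equals
    poly2_kernel in the original space, which is the kernel trick.
    """
    dense = [x.get(i, 0.0) for i in range(dim)]
    return [a * b for a in dense for b in dense]


# Hypothetical sample data: two sparse vectors in an 8-dimensional space.
dim = 8
x = {0: 1.0, 3: 2.0}
y = {3: 3.0, 7: 4.0}

kernel_value = poly2_kernel(x, y)
explicit_value = sum(
    a * b for a, b in zip(poly2_feature_map(x, dim), poly2_feature_map(y, dim))
)
```

In this sketch the kernel is evaluated from a single sparse dot-product, whereas the explicit mapping materializes dim*dim mostly zero entries per vector before taking the same dot-product; the two values agree, which is why the kernel form is preferred when the data are sparse.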