Conventionally, machine learning techniques have required massive amounts of processing power and storage space. One conventional type of machine learning involves a kernel machine. Kernel machines have been employed in applications including, for example, audio/video processing, power management, circuit layout, and scheduling. Kernel machines may employ a kernel matrix (K). The kernel matrix K may be constructed by applying a kernel function to pairs of points in a training data set. If there are N data points, then the kernel matrix K is an N×N matrix. The number of entries in K therefore grows quadratically with the size of the data set. Thus, as N grows, even conceptually simple tasks may become practically impossible due to storage and/or processing power constraints.
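By way of illustration only, the construction of K and its quadratic growth may be sketched as follows. The Gaussian (RBF) kernel, the parameter gamma, the 8-byte entry size, and the function names build_kernel_matrix and dense_kernel_bytes are assumptions chosen for the example; the description above does not prescribe any particular kernel function or storage format.

```python
import numpy as np


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel. Assumed here purely for illustration."""
    diff = x - y
    return np.exp(-gamma * np.dot(diff, diff))


def build_kernel_matrix(points, kernel=rbf_kernel):
    """Apply `kernel` to every pair of points, yielding an N x N matrix."""
    n = len(points)
    K = np.empty((n, n), dtype=np.float64)
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(points[i], points[j])
    return K


def dense_kernel_bytes(n, bytes_per_entry=8):
    """Storage needed for a dense N x N matrix (8 bytes per float64 entry)."""
    return n * n * bytes_per_entry


if __name__ == "__main__":
    pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
    print(build_kernel_matrix(pts).shape)
    # Doubling N quadruples the storage requirement.
    for n in (1_000, 2_000, 100_000):
        print(n, dense_kernel_bytes(n))
```

Under these assumptions, a data set of 100,000 points would call for 80,000,000,000 bytes (about 80 GB) to hold K densely, before any processing of K is attempted.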
Conventional kernel machines may sparsify K. However, conventional kernel machines still require that K be computed in its entirety before the sparsification is performed. While a sparsifying kernel machine (SKM) may be more efficient than a kernel machine (KM) that does not sparsify, even an SKM will become unwieldy for larger data sets, since the quadratically growing K will still require impractical amounts of processing time and/or memory.
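The limitation described above may be sketched as follows. The sparsification rule shown (discarding entries below a threshold) is an assumption made for the example, as are the Gaussian kernel and the names dense_rbf_kernel_matrix and sparsify_after_the_fact; the description above does not state how a conventional SKM sparsifies K. The point of the sketch is only that the full N×N matrix is allocated and computed before any entry is discarded.

```python
import numpy as np


def dense_rbf_kernel_matrix(points, gamma=1.0):
    """Compute the full N x N Gaussian kernel matrix (kernel choice assumed)."""
    sq_norms = np.sum(points ** 2, axis=1)
    # Squared Euclidean distance between every pair of points.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * points @ points.T
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clamp tiny negative round-off
    return np.exp(-gamma * sq_dists)


def sparsify_after_the_fact(K, threshold):
    """Keep only entries >= threshold, as (row, col, value) arrays.

    Hypothetical sparsification rule; the dense K already exists in memory
    by the time this function is called.
    """
    rows, cols = np.nonzero(K >= threshold)
    return rows, cols, K[rows, cols]


if __name__ == "__main__":
    pts = np.array([[0.0], [1.0], [10.0]])
    K = dense_rbf_kernel_matrix(pts)          # all N*N entries computed here
    rows, cols, vals = sparsify_after_the_fact(K, threshold=0.1)
    print(K.size, len(vals))
```

In this sketch the sparse result holds fewer entries than K, but the peak memory and processing cost are still those of the dense matrix.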