Kernel and graph methods revolutionized machine learning by making it possible to solve nonlinear problems analytically. Well-known examples include support vector machines, kernel principal component analysis, spectral clustering, and normalized cuts. These methods extend many linear statistical data analysis techniques to the nonlinear setting. Compared to other nonlinear methods, such as neural networks, a key advantage of kernel and graph methods is that they provide a unique global solution. For this reason, they can be solved analytically or with fast optimization techniques. However, the ability to solve nonlinear problems exactly comes at the cost of handling all pairwise combinations of data points in a dataset, which results in a computational complexity of at least O(N^2), where N is the number of data points. This complexity has limited the applicability of kernel and graph methods to large datasets for two reasons: storing the matrix of all pairwise interactions is a formidable task for more than a few tens of thousands of data points, and computing the eigendecomposition or inverse of this matrix, as required by many of these methods, poses yet another major challenge.
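To make the quadratic storage cost concrete, the following is a minimal sketch in Python with NumPy. It builds a dense Gaussian (RBF) kernel matrix for a small synthetic dataset and then estimates the memory a dense N x N matrix would need as N grows. The Gaussian kernel, the function names rbf_kernel_matrix and dense_matrix_gib, the parameter gamma, the random data, and the choice of 8-byte floating-point entries are all assumptions made for illustration; none of them comes from the text above.

```python
import numpy as np


def rbf_kernel_matrix(X, gamma=1.0):
    """Dense Gaussian (RBF) kernel matrix over all pairs of rows in X.

    Illustrative helper (hypothetical name); the kernel choice is an assumption.
    """
    sq_norms = np.sum(X * X, axis=1)
    # Squared Euclidean distance for every pair (i, j): an N x N array.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip tiny negative round-off
    return np.exp(-gamma * sq_dists)


def dense_matrix_gib(n, bytes_per_entry=8):
    """Memory in GiB for a dense n x n matrix (8 bytes per entry assumed)."""
    return n * n * bytes_per_entry / 2**30


# Small synthetic dataset: 200 points in 5 dimensions (arbitrary sizes).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
K = rbf_kernel_matrix(X, gamma=0.5)
print("kernel matrix shape:", K.shape)

# Storage grows with N^2, so a 10x larger dataset needs 100x more memory.
for n in (1_000, 10_000, 50_000, 100_000):
    print(f"N = {n:>7,}: {dense_matrix_gib(n):8.2f} GiB")
```

Under these assumptions, a dataset of 50,000 points already requires 2.5 billion matrix entries, or about 20 GB in double precision, before any eigendecomposition or inversion is attempted.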