Embodiments of the invention relate to regression processing and, in particular, to sketching for M-estimators used in performing regression and to providing an oblivious subspace embedding (OSE) for a space induced by a nonlinear kernel.
Linear regression is widely used in the biological, behavioral and social sciences to describe possible relationships between variables. It ranks as one of the most important tools used in these disciplines. Regression processing sometimes involves sketching, which, broadly speaking, is a descendant of random projection methods. Sketching has emerged as a powerful dimensionality reduction technique for accelerating statistical learning techniques such as ℓp-regression, low rank approximation, principal component analysis (PCA), and support vector machines. For natural settings of parameters, sketching has led to asymptotically optimal algorithms for a number of applications, often providing an order-of-magnitude speedup over slower exact algorithms based on, e.g., the singular value decomposition (SVD). An oblivious subspace embedding (OSE) is essentially a data-independent random transform that is an approximate isometry over a subspace. It is crucial for any reasonable use of an OSE that applying it to a vector or a collection of vectors (a matrix) can be done in time that is faster than that of the naive algorithm, or at least faster than the intended downstream use.
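As an illustration only, the following minimal sketch shows one well-known OSE, the CountSketch transform, applied to an explicitly given matrix and then used in a sketch-and-solve least-squares regression. CountSketch is chosen here as an assumption for the example; the text above does not name a particular transform. The function name `countsketch`, the problem sizes, and the sketch dimension `m` are likewise hypothetical.

```python
import numpy as np

def countsketch(M, m, rng):
    """Apply a CountSketch transform S (m x n) to M (n x k) without forming S.

    Each input row is hashed to one of m buckets and multiplied by a random
    sign, so the cost is proportional to the number of entries of M.
    """
    n = M.shape[0]
    buckets = rng.integers(0, m, size=n)        # target row for each input row
    signs = rng.choice([-1.0, 1.0], size=n)     # random sign for each input row
    SM = np.zeros((m, M.shape[1]))
    np.add.at(SM, buckets, signs[:, None] * M)  # accumulate signed rows per bucket
    return SM

rng = np.random.default_rng(0)
n, d, m = 20000, 10, 2000                       # illustrative sizes (assumptions)
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + rng.standard_normal(n)

# Sketch A and b with the same transform by sketching the stacked matrix [A, b].
S_Ab = countsketch(np.column_stack([A, b]), m, rng)
SA, Sb = S_Ab[:, :d], S_Ab[:, d]

# Solve the small sketched problem and compare with the exact solution.
x_sketch = np.linalg.lstsq(SA, Sb, rcond=None)[0]
x_exact = np.linalg.lstsq(A, b, rcond=None)[0]

res_sketch = np.linalg.norm(A @ x_sketch - b)
res_exact = np.linalg.norm(A @ x_exact - b)
residual_ratio = res_sketch / res_exact         # close to 1 if S preserves the subspace

# Approximate isometry on one vector in the column space of A.
y = A @ x_exact
norm_ratio = np.linalg.norm(SA @ x_exact) / np.linalg.norm(y)
```

The transform is drawn without looking at `A`, which is the sense in which it is oblivious. The sketched problem has only `m` rows instead of `n`, so the downstream solve is correspondingly cheaper.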
Conventional OSEs are for subspaces that have a representation as the column space of an explicitly provided matrix, or close variants of it, and they admit a fast multiplication given an explicit representation of a vector or matrix (either dense or sparse). This is quite unsatisfactory in many statistical learning settings. In many cases the input may be described by a moderately sized n-by-d sample-by-feature matrix A, but the actual learning is done in a much higher-dimensional (possibly infinite-dimensional) space, by mapping each row of A to the higher-dimensional feature space.
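To make the dimensionality issue concrete, the following sketch assumes, purely for illustration, the homogeneous polynomial kernel k(x, y) = (x·y)^q, which is one example of a nonlinear kernel and is not specified in the text above. Its explicit feature map sends each row of A to its q-fold tensor power, a vector of length d^q. The helper name `poly_features` and the sizes used are hypothetical.

```python
import numpy as np

def poly_features(A, q):
    """Map each row x of A (n x d) to its q-fold tensor power (length d**q)."""
    n = A.shape[0]
    Phi = np.ones((n, 1))
    for _ in range(q):
        # Row-wise Kronecker product: multiplies the feature dimension by d.
        Phi = np.einsum("ni,nj->nij", Phi, A).reshape(n, -1)
    return Phi

rng = np.random.default_rng(0)
n, d, q = 5, 4, 3                    # tiny illustrative sizes (assumptions)
A = rng.standard_normal((n, d))

Phi = poly_features(A, q)            # explicit feature matrix, n x d**q
K_explicit = Phi @ Phi.T             # inner products in the feature space
K_kernel = (A @ A.T) ** q            # same quantities computed via the kernel
```

Even with d = 4 and q = 3 the explicit representation already has 64 columns. For realistic d and q, forming it before applying a conventional OSE is the expensive step that the paragraph above describes as unsatisfactory.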