The field of machine learning can be thought of as the study of techniques for getting computers to act without being explicitly programmed to perform a particular task and, additionally, for enabling these computers to become better at these tasks over time. In just the past few years, the ever-advancing field of machine learning has been used for an increasingly large number of practical applications, resulting in technologies such as self-driving vehicles, improved Internet search engines, speech, audio, and/or visual recognition systems, human health data and genome analysis, recommendation systems, fraud detection systems, etc.
The growth in the amounts and types of data being produced by both humans and non-humans, combined with increases in the availability and power of computational processing and data storage, has thus led to an explosion of interest in employing machine learning techniques among a wide variety of people.
Many machine learning algorithms, as well as other modern computing applications, rely upon the use of linear algebra. For example, many machine learning algorithms use a classifier or regressor, and train it by minimizing the error between the value calculated by the nascent classifier and the actual value from the training data. This can be done either iteratively or using linear algebra techniques, which usually involve singular value decomposition (SVD) or a variant thereof.
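As an illustration of the two routes described above, the following minimal sketch fits a linear regressor to the same data once with an SVD-based least-squares solve and once with iterative gradient descent. The function names, the learning rate, the step count, and the synthetic data are assumptions chosen for this example only; NumPy is assumed to be available.

```python
import numpy as np

def fit_svd(X, y):
    # Least-squares solution via the thin SVD: w = V * diag(1/s) * U^T * y.
    # Assumes X has full column rank, so no singular value is zero.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt.T @ ((U.T @ y) / s)

def fit_iterative(X, y, lr=0.1, steps=2000):
    # Plain gradient descent on the mean squared error.
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

# Hypothetical noise-free training data generated from known weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

w_svd = fit_svd(X, y)
w_gd = fit_iterative(X, y)
```

Under these assumptions both routes recover approximately the same weights; the direct solve does so in one step, while the iterative version approaches the solution gradually.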
Many recent machine learning applications involve the use of sparse datasets, typically in the form of sparse matrices. A sparse matrix is a matrix in which many or most of the elements have a default value (e.g., 0, NULL). For example, some machine learning applications for classifying documents may utilize a matrix including dimensions (or “columns” in the matrix) for words that are used in these documents; a particular document may include only a small number of the overall number of words, and thus an entry (or “row”) within the matrix for this document may have a substantial number of “empty” elements. Such sparse matrices are often represented in a compressed or alternate representation, which can use a number of different precise formats and data structures, though these all typically attempt to eliminate storing the default-valued (e.g., zero) elements (i.e., they store only non-zero entries). Two such examples include Compressed Sparse Row (CSR) and Compressed Sparse Column (CSC).
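A minimal sketch of the CSR layout may make the idea concrete. The helper names `to_csr` and `csr_matvec` and the small document-by-word matrix are hypothetical and are used here only for illustration.

```python
def to_csr(dense):
    # Build the three CSR arrays: the non-zero values, the column index of
    # each value, and row_ptr, where row i occupies values[row_ptr[i]:row_ptr[i+1]].
    values, col_indices, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_indices.append(j)
        row_ptr.append(len(values))
    return values, col_indices, row_ptr

def csr_matvec(values, col_indices, row_ptr, x):
    # Multiply the CSR matrix by a dense vector x, touching only stored entries.
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_indices[k]]
        y.append(acc)
    return y

# Rows are documents, columns are words, entries are word counts.
docs = [
    [2, 0, 0, 1, 0],
    [0, 0, 3, 0, 0],
    [0, 0, 0, 0, 0],
]
values, col_indices, row_ptr = to_csr(docs)
# values      -> [2, 1, 3]
# col_indices -> [0, 3, 2]
# row_ptr     -> [0, 2, 3, 3]
product = csr_matvec(values, col_indices, row_ptr, [1, 1, 1, 1, 1])
```

CSC follows the same scheme with the roles of rows and columns exchanged, which favors column-wise access.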
However, linear algebra operations (and especially sparse linear algebra operations) are very difficult to parallelize in modern computing systems, at least in part due to potential write-to-read dependences across iterations (of a loop that updates values in a matrix, for example).
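The kind of dependence described above can be sketched with a stochastic-gradient-style loop over sparse rows. This is only an assumed example of such a loop, not a specific algorithm from the approaches discussed here; the function names, learning rate, and data are hypothetical.

```python
def sparse_sgd(rows, targets, n_features, lr=0.1):
    # Each row is (column_indices, values). Every iteration first READS the
    # weights of the row's features and then WRITES updated values to them.
    w = [0.0] * n_features
    for (cols, vals), t in zip(rows, targets):
        pred = sum(w[j] * v for j, v in zip(cols, vals))  # read w[j]
        err = pred - t
        for j, v in zip(cols, vals):
            w[j] -= lr * err * v                          # write w[j]
    return w

def conflicting_pairs(rows):
    # Iterations i < k conflict when they share a feature: iteration k must
    # read what iteration i wrote, so the two cannot simply run in parallel.
    pairs = []
    for i in range(len(rows)):
        for k in range(i + 1, len(rows)):
            if set(rows[i][0]) & set(rows[k][0]):
                pairs.append((i, k))
    return pairs

rows = [([0, 2], [1.0, 2.0]), ([1], [3.0]), ([2, 3], [1.0, 1.0])]
targets = [1.0, 3.0, 2.0]
w = sparse_sgd(rows, targets, n_features=4)
conflicts = conflicting_pairs(rows)  # rows 0 and 2 both touch feature 2
```

In this sketch only the second row is independent of the others; which iterations conflict depends on the sparsity pattern of the data, so it is generally not known until the data is inspected.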
Current approaches for performing sparse linear algebra operations use either locking techniques or approximate lock-free implementations. Locking continues to generate the same solution as the sequential implementation, and trades off locking overhead for greater parallelism. However, as a result of locking overhead, previous approaches have shown that the performance does not scale beyond 2-4 cores and does not result in anything near linear performance scaling even up to 4 cores.
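The locking approach can be sketched in simplified form as follows. The example uses a scatter-add into a shared accumulator as a stand-in for a sparse update, with one lock per feature; integer additions commute, so the locked version matches the sequential result regardless of thread interleaving. Updates whose result depends on ordering would need additional coordination beyond what is shown. All names and the data are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def scatter_add_sequential(rows, n_features):
    # Reference result: add every stored entry into its feature's slot.
    acc = [0] * n_features
    for cols, vals in rows:
        for j, v in zip(cols, vals):
            acc[j] += v
    return acc

def scatter_add_locked(rows, n_features, workers=4):
    # Same computation with rows spread across threads. A per-feature lock
    # guards each read-modify-write so that no update is lost.
    acc = [0] * n_features
    locks = [threading.Lock() for _ in range(n_features)]

    def process(row):
        cols, vals = row
        for j, v in zip(cols, vals):
            with locks[j]:
                acc[j] += v

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(process, rows))
    return acc

rows = [([0, 2], [1, 4]), ([2], [5]), ([0, 3], [2, 7])] * 100
sequential = scatter_add_sequential(rows, 4)
locked = scatter_add_locked(rows, 4)
```

Every update pays for acquiring and releasing a lock, and rows that share a frequently used feature serialize on that feature's lock, which is one way to picture the overhead described above.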
The second approach, involving the use of approximate lock-free implementations, does get close to linear performance scaling, but does not achieve the best solution because it fundamentally relies on approximations. Furthermore, the output deviation can be particularly high for datasets having a power-law distribution, where some features are more common than others, which leads to greater chances of incorrect updates.
Accordingly, techniques providing enhanced parallelism for sparse linear algebra operations having write-to-read dependencies are strongly desired.