Massively parallel architectures are required to reach the performance needed by future applications of “recognition,” “mining,” and “synthesis.” For synthesis applications, massively parallel accelerators exist in the form of graphics processors with up to 256 processing elements. For recognition and mining applications, however, nothing equivalent is available. In these application domains machine learning dominates the computational requirements, so any meaningful acceleration has to focus on parallelizing machine learning.
The support vector machine (SVM) is a machine learning algorithm that falls within the recognition and mining classes. The performance bottleneck in SVMs is the kernel computation, which requires computing a very large number of vector dot products. This computation is not easily parallelized on multi-core processors because of its massive memory bandwidth requirements.
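To make the kernel bottleneck concrete, the following minimal Python sketch builds an SVM kernel matrix in which every entry reduces to vector dot products. It assumes a Gaussian (RBF) kernel with a hypothetical width parameter `gamma`; the text does not say which kernel is used, and the function names are illustrative only.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def kernel_matrix(vectors, gamma):
    """Assumed RBF kernel: K[i][j] = exp(-gamma * ||x_i - x_j||^2).

    The squared distance is expanded into dot products,
    ||x_i - x_j||^2 = x_i.x_i - 2 * x_i.x_j + x_j.x_j,
    so the whole matrix costs about n*n dot products for n vectors.
    """
    norms = [dot(v, v) for v in vectors]  # squared norms, computed once
    n = len(vectors)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = norms[i] - 2.0 * dot(vectors[i], vectors[j]) + norms[j]
            # Clamp tiny negative values caused by floating-point rounding.
            K[i][j] = math.exp(-gamma * max(d2, 0.0))
    return K

K = kernel_matrix([[0.0, 0.0], [1.0, 0.0]], gamma=1.0)
```

For n training vectors of dimension d, the matrix needs roughly n² dot products, each reading about 2d values for about 2d arithmetic operations. When the vectors do not fit in cache, this low ratio of arithmetic to memory traffic tends to make the computation limited by memory bandwidth rather than by compute throughput.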
Accordingly, a system and method are needed for parallelizing and accelerating machine learning and classification.