The term “high-performance computing” generally refers to the use of clusters of computers to solve computationally demanding problems. The term is most commonly associated with computing undertaken in connection with scientific research or computational science. Exemplary applications that can be classified as high-performance computing applications include, but are not limited to, visual computing (including robust facial recognition and robust 3-D modeling from crowd-sourced photos), web mining, machine learning, and the like.
A conventional approach for performing parallel computation in connection with high-performance computing is the single instruction multiple data (SIMD) approach. This approach uses computers with multiple processing elements that perform the same operation on multiple data elements simultaneously, thereby exploiting data-level parallelism. Machines configured to perform SIMD generally undertake staged processing, such that a bottleneck is created during synchronization of data. Specifically, one machine or computing element may depend upon the output of another machine or computing element, and various such dependencies may exist. In SIMD, a computing element waits until all data upon which it depends has been received and only then undertakes processing thereon. Because no stage can begin until the slowest producer of the preceding stage has finished, this creates a significant scalability bottleneck.
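The staged pattern described above can be illustrated with a minimal Python sketch. It is an illustration only, not an implementation drawn from any particular system: the names `run_staged` and `staged_makespan`, the thread pool, and the sample durations are all assumptions introduced here. The first function applies one operation to every data element per stage and does not begin the next stage until every result has been collected. The second shows why this hurts scalability: under such synchronization, each stage costs as much as its slowest worker.

```python
from concurrent.futures import ThreadPoolExecutor


def run_staged(stages, data, workers=4):
    """Apply each stage to every element; stages run strictly one after another.

    Hypothetical helper: `stages` is a list of one-argument functions.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for stage in stages:
            # list() collects every result before continuing, so it acts as
            # the synchronization point between consecutive stages.
            data = list(pool.map(stage, data))
    return data


def staged_makespan(durations):
    """Total time when durations[s][w] is the time worker w needs in stage s.

    With a synchronization point after each stage, every stage takes as long
    as its slowest worker.
    """
    return sum(max(stage) for stage in durations)


if __name__ == "__main__":
    print(run_staged([lambda x: x + 1, lambda x: x * 2], [1, 2, 3]))
    # One slow worker (5 time units) delays the whole first stage.
    print(staged_makespan([[1, 1, 5], [2, 2, 2]]))
```

In the second call, two of the three workers finish the first stage after one time unit but sit idle until the third finishes, which is the waiting behavior the paragraph describes.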
Large-scale, data-intensive computation has recently attracted a tremendous amount of attention, both in the research community and in industry. Moreover, many algorithms utilized in high-performance computing applications can be expressed as matrix computations. Conventional mechanisms for coding kernels utilized in connection with matrix computation, as well as for designing applications that utilize matrix computations, are relatively low level. Specifically, writing new computation kernels that facilitate matrix computation requires a deep understanding of interfaces that allow processes to communicate with one another by sending and receiving messages, such as the Message Passing Interface (MPI). This makes it quite difficult for scientists to program algorithms that facilitate matrix computation.
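To give a sense of what programming at the message level involves, the following sketch computes a matrix-vector product using explicit sends and receives. It does not use MPI: Python's standard `queue` and `threading` modules stand in for message channels and processes, and the names `worker` and `matvec_by_messages` are hypothetical. Even in this small case, the programmer must handle distribution of work, termination signals, and reordering of results by hand.

```python
import queue
import threading


def worker(inbox, outbox):
    """Receive (index, row, vector) messages and send back (index, dot product)."""
    while True:
        msg = inbox.get()      # blocking receive
        if msg is None:        # sentinel message: no more work
            break
        i, row, vec = msg
        outbox.put((i, sum(a * b for a, b in zip(row, vec))))  # send result


def matvec_by_messages(matrix, vec, n_workers=2):
    """Multiply `matrix` by `vec` by passing rows to workers as messages."""
    inboxes = [queue.Queue() for _ in range(n_workers)]
    results = queue.Queue()
    threads = [threading.Thread(target=worker, args=(q, results)) for q in inboxes]
    for t in threads:
        t.start()

    # Scatter: send each row to a worker in round-robin order.
    for i, row in enumerate(matrix):
        inboxes[i % n_workers].put((i, row, vec))
    for q in inboxes:
        q.put(None)

    # Gather: results may arrive in any order, so place them by row index.
    out = [None] * len(matrix)
    for _ in matrix:
        i, value = results.get()
        out[i] = value

    for t in threads:
        t.join()
    return out


if __name__ == "__main__":
    print(matvec_by_messages([[1, 2], [3, 4], [5, 6]], [1, 1]))
```

The arithmetic itself occupies a single line; the remainder is communication bookkeeping, which is the burden the paragraph above attributes to low-level message-passing interfaces.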