Linear algebra operations are typically computation-intensive and memory-intensive, involving potentially large, multi-dimensional matrix operands. Systems are typically designed for operations with low arithmetic intensity (where arithmetic intensity is the ratio of arithmetic operations to memory operations), and thus are not designed for efficient execution of linear algebra operations. Furthermore, system processors typically utilize complex local memory (i.e., cache) management routines for operations involving large matrix operands, thereby increasing processing overhead and execution complexity.
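As a purely illustrative sketch, arithmetic intensity can be estimated by counting operations for two representative workloads. The function names and the counting model below are assumptions made for illustration and are not part of the described embodiments: each operand element is assumed to be read or written exactly once (ideal reuse), and a multiply and an add are counted as separate arithmetic operations.

```python
def arithmetic_intensity(arith_ops: int, mem_ops: int) -> float:
    """Ratio of arithmetic operations to memory operations."""
    if mem_ops <= 0:
        raise ValueError("mem_ops must be positive")
    return arith_ops / mem_ops


def vector_add_counts(n: int) -> tuple[int, int]:
    """Operation counts for c = a + b on length-n vectors (hypothetical model)."""
    arith_ops = n          # one add per element
    mem_ops = 3 * n        # read a[i], read b[i], write c[i]
    return arith_ops, mem_ops


def matmul_counts(m: int, k: int, n: int) -> tuple[int, int]:
    """Operation counts for C = A @ B with A (m x k) and B (k x n).

    Assumes ideal reuse: every element of A, B, and C moves to or from
    memory exactly once, which is a lower bound on real memory traffic.
    """
    arith_ops = 2 * m * k * n            # one multiply and one add per term
    mem_ops = m * k + k * n + m * n      # read A, read B, write C
    return arith_ops, mem_ops


if __name__ == "__main__":
    size = 1024
    print("vector add:", arithmetic_intensity(*vector_add_counts(size)))
    print("matmul:    ", arithmetic_intensity(*matmul_counts(size, size, size)))
```

Under this simplified model, the intensity of an element-wise vector add stays constant regardless of operand size, whereas the intensity of a square matrix multiplication grows in proportion to the matrix dimension. Real systems generally see lower matrix-multiplication intensity than this bound whenever the operands do not fit in local memory.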
Descriptions of certain details and implementations follow, including a description of the figures, which can depict some or all of the embodiments described below, as well as a description of other potential embodiments or implementations of the concepts presented herein. An overview of embodiments is provided below, followed by a more detailed description with reference to the drawings.