The present disclosure relates generally to matrix multiply operations. More particularly, the present disclosure relates to methods and apparatuses for implementing a systolic array matrix multiplier for matrix multiply operations.
This section is intended to introduce the reader to various aspects of art that may be related to various aspects of the present disclosure, which are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present disclosure. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.
Many numerical computing applications, such as high-performance computing, deep learning (e.g., the study of artificial neural networks and related machine learning algorithms), and digital signal processing (DSP), rely on matrix multiplication computations. Systolic arrays have been used with great success in hardware and software to perform matrix multiplication computations. However, there may be challenges in implementing a systolic array architecture on a field-programmable gate array (FPGA) platform. For example, there may be challenges relating to limitations in external memory (e.g., memory external to an integrated circuit, off-chip memory) bandwidth and limitations in FPGA on-chip memory. In particular, off-chip memory bandwidth may be insufficient to sustain peak operating performance of the systolic array, while on-chip memory bandwidth may be higher but still limited.
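By way of a non-limiting illustration, the bandwidth limitation noted above may be sketched with simple arithmetic. The sketch below assumes a rectangular systolic array in which one new operand enters each row and each column on every clock cycle, with no on-chip reuse of fetched data. The function names, the array dimensions, the clock rate, and the available bandwidth are all hypothetical values chosen for illustration only and are not taken from the present disclosure.

```python
def required_stream_bandwidth(rows, cols, bytes_per_element, clock_hz):
    """Bytes per second needed to feed a rows x cols array at peak rate.

    Assumption: one new element of the first matrix enters each row and
    one new element of the second matrix enters each column every cycle,
    with no reuse of data already fetched.
    """
    return (rows + cols) * bytes_per_element * clock_hz


def sustainable_fraction(available_bytes_per_s, required_bytes_per_s):
    """Fraction of peak throughput the memory can sustain, capped at 1.0."""
    return min(1.0, available_bytes_per_s / required_bytes_per_s)


# Hypothetical figures, for illustration only.
required = required_stream_bandwidth(
    rows=32, cols=32, bytes_per_element=4, clock_hz=300_000_000
)
available = 19_200_000_000  # hypothetical off-chip bandwidth in bytes/second

print(required)                                   # 76800000000
print(sustainable_fraction(available, required))  # 0.25
```

Under these assumed figures, the array could be kept busy only a quarter of the time if every operand were fetched from off-chip memory, which is one way to see why reuse of data held in on-chip memory may matter.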