Convolutional neural networks (CNNs) are used in a variety of applications, including for example, image processing. Convolution operations include a summation of each element of an input feature map (IFM) with neighboring elements that are weighted by a filter, which is also referred to as a kernel.
CNNs include multiple layers in which each layer performs a convolution operation on a three-dimensional volume that includes multiple sets of two-dimensional IFMs. In CNN implementations involving Graphic Processing Units (GPUs), the GPU restructures the convolution operation as a matrix multiplication operation by extracting local neighboring elements of each element of the IFM and expanding the volume into matrix format before performing the matrix multiplication. The out-of-order access pattern for extracting the local neighboring elements is limited by the memory available for static expansion of the IFM. Because of the high ratio of computational capacity to memory in field programmable gate arrays (FPGAs), static expansion of the volume is not feasible in FPGA accelerators due to the latency and bandwidth limitations required to run the FPGA at high efficiency.