Matrices, or more broadly tensors, are used by processing circuitry to provide solutions to a variety of different problems. For example, image processing sometimes uses convolution matrices. Different types of processing circuitry can be used for such processing.
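As an illustration of how a convolution matrix can be applied to an image, the following is a minimal sketch and is not drawn from any particular circuit. The function name `convolve2d_valid`, the nested-list image representation, and the absence of padding are all assumptions made for the example. The kernel is applied without flipping, as is common in image processing.

```python
def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` and return the unpadded ("valid") result.

    Hypothetical helper for illustration: `image` and `kernel` are lists of
    equal-length rows. The kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for r in range(ih - kh + 1):          # each output row
        row = []
        for c in range(iw - kw + 1):      # each output column
            acc = 0
            for i in range(kh):           # multiply-accumulate over the window
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
box_kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]  # sums each 3x3 neighborhood
result = convolve2d_valid(image, box_kernel)
print(result)  # [[54, 63], [90, 99]]
```

Each output value depends on a full kernel-sized window of input values, so neighboring outputs read overlapping portions of the input.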
There are a variety of different circuits that can use convolution matrices including, but not limited to, digital signal processors (DSPs), general purpose computer processors, programmable integrated circuits, programmable logic devices (PLDs), and System on Chip (SoC) devices. PLDs are a type of programmable integrated circuit (IC) that can be programmed to perform specified logic functions. One type of PLD, the field programmable gate array (FPGA), can include an array of programmable tiles. These programmable tiles comprise various types of logic blocks, which can include, for example, input/output blocks (IOBs), configurable logic blocks (CLBs), dedicated block random access memory (BRAM), multipliers, digital signal processing blocks (DSPs), processors, clock managers, delay locked loops (DLLs), and bus or network interfaces such as Peripheral Component Interconnect (PCI), PCI Express (PCIe), Ethernet, and so forth. Some devices include enough components and functionality to effectively serve as an entire computer system on a single IC chip. Devices with such functionality are sometimes referred to as SoCs. Some SoC devices can include programmable logic that is similar to programmable logic provided by various PLDs.
The various circuits often suffer from similar bottlenecks when implementing convolution matrices. A common bottleneck is the movement of data to and from memory circuitry. Convolution operations are often applied to large datasets, and the operations can also be repeated several times on the same data. Data might therefore not be provided fast enough to fully utilize the processing circuits.
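The data-movement concern can be made concrete with a rough count of input reads. The sketch below is illustrative only. It assumes a naive implementation that fetches every input value from memory each time it is needed, with a square kernel and no padding. The names `naive_input_reads`, `reuse_factor`, and the `passes` parameter are hypothetical.

```python
def naive_input_reads(height, width, k, passes=1):
    """Count input reads for an unpadded k x k convolution over a
    height x width input, assuming nothing is cached between windows.

    `passes` models the same data being convolved several times
    (an assumption standing in for repeated operations on the same data).
    """
    out_h = height - k + 1
    out_w = width - k + 1
    return out_h * out_w * k * k * passes


def reuse_factor(height, width, k, passes=1):
    """Average number of times each distinct input value is read."""
    return naive_input_reads(height, width, k, passes) / (height * width)


# Small worked case: a 4x4 input with a 3x3 kernel has 2x2 = 4 windows,
# each reading 9 values, so 36 reads for only 16 distinct inputs.
print(naive_input_reads(4, 4, 3))        # 36
print(reuse_factor(4, 4, 3))             # 2.25
print(naive_input_reads(4, 4, 3, passes=2))  # 72
```

For inputs much larger than the kernel, the reuse factor under these assumptions approaches k * k per pass, which suggests why memory bandwidth, rather than arithmetic, can limit throughput.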
These and other issues can be problematic for convolution tensor operations.