Convolution operations are a building block of convolutional neural networks that are state-of-the-art in computer vision tasks like object recognition. Many conventional designs include some form of hardware drawbacks or limitations. A common design approach is to fetch two-dimensional input data and process the values for each output element separately. The separate fetches are inefficient. Memory bandwidth is wasted due to redundant fetches. To mitigate the bandwidth issue, other common designs use a row buffer to avoid re-fetching the same data. However, the row buffers utilize significant hardware while also limiting the flexibility of window sizes.
It would be desirable to implement convolution calculations in multiple dimensions.