Deep learning techniques such as convolutional neural networks (CNNs) are becoming popular for image classification, pixel labeling, path planning, and other applications with broad usage in multiple applications including automotive, industrial, drones, and robotic applications. Typical CNN architecture includes a cascade of multiple layer functions, such as convolutions, non-linearity, spatial pooling, and fully connected layers. Convolution layers consisting of two-dimensional convolutions typically constitute more than 90% of the computations required for image classification and pixel labeling.
One approach for convolution is based on processing in the frequency domain (also known as the FFT domain) using fast Fourier transform (FFT) techniques. In this technique, convolution is simplified to pointwise complex multiplication in an FFT domain. One of the problems with the FFT-based approach is that it generates a large number of frequency domain coefficients that is proportional to the dimensions of the image being analyzed rather than the size of a filter applied to the image, assuming the filter dimensions are much smaller than the image dimensions. The memory required to store the frequency domain coefficients is very high and beyond the typical size of on-chip memory of most computers and/or processing chips, making this technique impractical for most hardware.