Cellular Neural Networks or Cellular Nonlinear Networks (CNN) have been applied to many different fields and problems including, but not limited to, image processing since 1988. However, most prior art CNN approaches are either based on software solutions (e.g., Convolutional Neural Networks, Recurrent Neural Networks, etc.) or based on hardware that is designed for other purposes (e.g., graphics processing, general computation, etc.). As a result, prior CNN approaches are too slow in terms of computational speed and/or too expensive, and are thereby impractical for processing large amounts of imagery data. The imagery data can be from any two-dimensional data (e.g., a still photo, a picture, a frame of a video stream, a converted form of voice data, etc.).
A traditional deep learning network architecture for classifying two-dimensional input imagery data generally contains two parts: ordered convolutional layers followed by fully-connected (FC) layers. Notably, ordered convolutional layers require less storage for holding filter coefficients but require a significantly larger amount of computation in multiplication-add operations (Mult-Adds) (e.g., VGG16 requires roughly 15 billion Mult-Adds per input image) due to the repeated application of convolutional filter kernels. In contrast, FC layers require fewer Mult-Adds but necessitate a significant amount of storage for coefficients (e.g., VGG16 requires storage for about 123 million FC layer weights/coefficients) due to inner-products (i.e., respective multiplications between FC layer weights and nodal feature values obtained in the previous level). With operations of the ordered convolutional layers performed in a CNN based integrated circuit, the computation bottleneck in deep learning networks is in the FC layers. Since FC layers require a large amount of storage, prior art approaches have used computational devices outside of the CNN based integrated circuit, for example a CPU (central processing unit) or a GPU (graphics processing unit).
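The storage versus computation trade-off between the two parts of such a network can be checked with simple arithmetic. The following is a minimal sketch, assuming the publicly documented VGG16 layer configuration (thirteen 3x3 convolutional layers followed by three FC layers, with a 224x224 RGB input) and ignoring bias terms. All names in it (CONV_LAYERS, FC_LAYERS, conv_stats, fc_stats) are illustrative and are not taken from any particular implementation.

```python
# Weight and Mult-Add counts for a VGG16-style network.
# Assumptions: standard VGG16 layer shapes, 3x3 kernels with 'same' padding,
# 224x224 RGB input, biases ignored.

KERNEL = 3  # side length of each square convolutional filter kernel

# (input channels, output channels, output feature map side) per conv layer
CONV_LAYERS = [
    (3, 64, 224), (64, 64, 224),
    (64, 128, 112), (128, 128, 112),
    (128, 256, 56), (256, 256, 56), (256, 256, 56),
    (256, 512, 28), (512, 512, 28), (512, 512, 28),
    (512, 512, 14), (512, 512, 14), (512, 512, 14),
]

# (input nodes, output nodes) per FC layer
FC_LAYERS = [(512 * 7 * 7, 4096), (4096, 4096), (4096, 1000)]


def conv_stats(layers, k=KERNEL):
    """Return (weights, mult_adds) summed over the convolutional layers."""
    weights = sum(c_in * c_out * k * k for c_in, c_out, _ in layers)
    # Each kernel is re-applied at every output position, so the Mult-Add
    # count is the weight count scaled by the output feature map area.
    mult_adds = sum(c_in * c_out * k * k * side * side
                    for c_in, c_out, side in layers)
    return weights, mult_adds


def fc_stats(layers):
    """Return (weights, mult_adds) summed over the FC layers."""
    weights = sum(n_in * n_out for n_in, n_out in layers)
    # An inner product uses each weight exactly once per input.
    return weights, weights


conv_weights, conv_mult_adds = conv_stats(CONV_LAYERS)
fc_weights, fc_mult_adds = fc_stats(FC_LAYERS)

if __name__ == "__main__":
    print(f"conv: {conv_weights:,} weights, {conv_mult_adds:,} Mult-Adds")
    print(f"FC:   {fc_weights:,} weights, {fc_mult_adds:,} Mult-Adds")
```

Under these assumptions, the FC layers hold about 123.6 million weights against about 14.7 million for the convolutional layers, while the convolutional layers account for nearly all of the Mult-Adds.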
It would therefore be desirable to have systems and methods of performing image classification tasks entirely within a CNN based integrated circuit.