Cellular Neural Networks or Cellular Nonlinear Networks (CNN) have been applied to many different fields and problems including, but not limited to, image processing since 1988. However, most of the prior art CNN approaches are either based on software solutions (e.g., Convolutional Neural Networks, Recurrent Neural Networks, etc.) or based on hardware that is designed for other purposes (e.g., graphics processing, general computation, etc.). As a result, prior CNN approaches are too slow in terms of computational speed and/or too expensive, and are therefore impractical for processing large amounts of imagery data. The imagery data can be from any two-dimensional data (e.g., still photo, picture, a frame of a video stream, converted form of voice data, etc.).
For image classification, it is necessary to first extract features (i.e., feature vectors) from the input data and then connect them to a classifier such as Fully-Connected (FC) layers to achieve the task. Through inner product computations, the FC layers use the extracted features from the output of the ordered convolutional layers to complete the classification task. However, the FC layers contain multiple layers of fully connected neural networks, which require a large number of coefficients. For example, in the VGG16 model, the output of the ordered convolutional layers has 512×7×7=25088 elements, which is a very large vector. The first few FC layers (e.g., fc6, fc7) are therefore required to project the high-dimensional vector to a relatively low-dimensional space, e.g., 4096, 1024, or a smaller number (e.g., 128). A disadvantage of such operations is the huge number of parameters (e.g., more than 100 million (i.e., 25088×4096) for the FC layer connected to the convolutional layers). As a result, runtime performance is low due to such high computational complexity. Another disadvantage of prior art approaches arises when computational resources (i.e., processing power, memory and storage) are limited, as in a micro controller unit. Such a unit generally does not have enough storage to store the large number of filter coefficients. Further, there is not enough runtime memory for loading such a large number of filter coefficients, even with the prior art approach of compressing the final FC layer from 4096 to 128 channels of features. Therefore, it would be desirable to have improved image classification systems that avoid the above-mentioned shortcomings and/or problems.
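The parameter count cited above for the VGG16 example can be checked with a short calculation. The following is a minimal sketch only; the function and variable names are hypothetical, and the storage figure assumes 4 bytes per coefficient (32-bit floating point), which the description above does not specify.

```python
# Illustrative check of the VGG16 figures quoted in the text.
# All names are hypothetical; nothing here comes from a particular library.

def fc_weight_count(in_features: int, out_features: int) -> int:
    """Number of weights in one fully connected layer (biases excluded)."""
    return in_features * out_features

# Output of the ordered convolutional layers: 512 channels of 7x7 values.
flattened = 512 * 7 * 7

# First FC layer (fc6) projects the flattened vector to 4096 features.
weights = fc_weight_count(flattened, 4096)

# Assumption: each coefficient is stored as a 32-bit float (4 bytes).
bytes_fp32 = weights * 4

print(flattened)                        # 25088
print(weights)                          # 102760448
print(f"{bytes_fp32 / 2**20:.0f} MiB")  # 392 MiB
```

Under that storage assumption, the coefficients of this single layer occupy roughly 392 MiB, which illustrates why such a layer is hard to fit into the storage and runtime memory of a micro controller unit.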