Deep computing frameworks, such as Convolutional Neural Networks (CNNs), have been used in many application areas, including pattern recognition, signal processing, time series analysis, and the like. CNNs require large amounts of computation involving a typically large number of parameters, both during training and when the fully-trained network is deployed in the field. CNNs are deployed in mobile and embedded systems that interact with the real world. However, the efficiency of CNNs that require such large amounts of computation and data may be limited by power (e.g., battery capacity), memory access bandwidth, and communication cost.
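As a rough illustration of the scale involved, the following minimal sketch counts the learnable parameters and multiply-accumulate (MAC) operations of a single convolutional layer. The function name `conv2d_cost` and the layer dimensions are hypothetical and chosen only for illustration; the count assumes a standard dense convolution with one bias per output channel.

```python
def conv2d_cost(in_channels, out_channels, kernel_h, kernel_w, out_h, out_w):
    """Return (parameter count, MAC count) for one dense 2-D convolution.

    Assumes one bias term per output channel and no grouping or dilation.
    """
    # Each output channel has one kernel_h x kernel_w filter per input channel.
    weights_per_filter = in_channels * kernel_h * kernel_w
    params = out_channels * weights_per_filter + out_channels  # weights + biases
    # Every output element needs one MAC per weight in its filter.
    macs = out_channels * out_h * out_w * weights_per_filter
    return params, macs


# Hypothetical layer: 3x3 kernel, 64 input channels, 128 output channels,
# producing a 56x56 output feature map.
params, macs = conv2d_cost(64, 128, 3, 3, 56, 56)
print(f"parameters: {params:,}")  # parameters: 73,856
print(f"MACs:       {macs:,}")    # MACs:       231,211,008
```

Under these assumptions, a single layer with fewer than one hundred thousand parameters already requires over two hundred million MACs per input, and a complete network may stack many such layers.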
General-purpose processors may be programmable to perform complex calculations. However, such processors may consume more power and perform operations at a lower speed. A graphics processing unit (GPU) may be configured to run faster than general-purpose processors; however, it may require higher power consumption. It would be helpful to have a method and system that satisfy the requirements for reduced latency and low power consumption.