Deep neural networks (DNNs) typically consist of convolutional layers (CNVLs) and fully connected layers (FCLs). Vector-matrix multiplications, which are built from multiply-accumulate (MAC) operations, are pervasive in both the CNVLs and the FCLs. High-precision MAC operations place heavy demands on the computational resources and memory storage of a DNN, making it challenging to implement state-of-the-art DNNs on resource-limited platforms such as mobile devices.
Binary DNNs have been proposed to reduce computation and storage costs with marginal accuracy degradation. Among the reported binary DNNs, XNOR-Net shows a remarkable advantage in accuracy (e.g., >17% higher accuracy with AlexNet on ImageNet than other binary networks). In an XNOR-Net, both the synaptic weights and the neuron activations are binarized to +1 or −1, so the high-precision MAC operations are replaced by XNOR and bit-counting operations. However, the memory bottleneck remains in conventional complementary metal-oxide-semiconductor (CMOS) application-specific integrated circuit (ASIC) accelerators. Although parallel computation has been exploited across processing-element (PE) arrays, the weights and intermediate data still require inefficient row-by-row static random-access memory (SRAM) access.
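The replacement of MAC operations by XNOR and bit counting can be illustrated with a minimal sketch. It assumes an encoding in which +1 is stored as bit 1 and −1 as bit 0, so that two values multiply to +1 exactly when their bits agree; the function names `encode` and `xnor_popcount_dot` are hypothetical and chosen only for this illustration. Under that assumption, the dot product of two length-n vectors equals 2 × (number of agreeing bits) − n.

```python
def encode(values):
    """Pack a list of +1/-1 values into an integer (assumed encoding: +1 -> 1, -1 -> 0)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_bits, w_bits, n):
    """Dot product of two n-element +1/-1 vectors using XNOR and bit counting."""
    mask = (1 << n) - 1
    # XNOR marks the positions where activation and weight agree (product = +1).
    agree = ~(a_bits ^ w_bits) & mask
    matches = bin(agree).count("1")  # bit counting
    # Each agreement contributes +1 and each disagreement contributes -1.
    return 2 * matches - n


activations = [1, -1, -1, 1, 1, -1, 1, 1]
weights = [1, 1, -1, -1, 1, -1, -1, 1]

# Reference result from ordinary multiply-accumulate.
reference = sum(a * w for a, w in zip(activations, weights))
result = xnor_popcount_dot(encode(activations), encode(weights), len(weights))
print(reference, result)
```

Both computations give the same value for this input, which is why the binarized network needs no multipliers.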
Computing-in-memory (CIM) is a technique that may improve parallelism within a memory array by activating multiple rows simultaneously and using the analog column current to perform the multiplication and summation operations. In this regard, it may be desirable to customize SRAM bit cells to enable the XNOR and bit-counting operations of the XNOR-Net.
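The multi-row activation described above can be pictured with an idealized behavioral model. This is only a sketch under stated assumptions, not a circuit description: every bit cell is assumed to add one identical unit of current to its column when the input on its row agrees with the stored weight (XNOR = 1), the column currents are assumed to sum without noise or nonlinearity, and `UNIT_CURRENT`, `column_currents`, and `currents_to_sums` are hypothetical names.

```python
UNIT_CURRENT = 1.0  # assumed per-cell current, arbitrary units


def column_currents(weights, inputs):
    """Activate all rows at once and return the summed current of each column.

    weights: rows x cols list of stored +1/-1 values.
    inputs:  one +1/-1 value per row.
    """
    currents = [0.0] * len(weights[0])
    for row, x in zip(weights, inputs):
        for c, w in enumerate(row):
            if w == x:  # XNOR = 1: the cell contributes current to its column
                currents[c] += UNIT_CURRENT
    return currents


def currents_to_sums(currents, n_rows):
    """Convert each column current (a bit count) back to a +1/-1 dot product."""
    return [2 * round(i / UNIT_CURRENT) - n_rows for i in currents]


weights = [
    [+1, -1, +1],
    [-1, -1, +1],
    [+1, +1, -1],
    [-1, +1, +1],
]
inputs = [+1, -1, -1, +1]

currents = column_currents(weights, inputs)
sums = currents_to_sums(currents, len(inputs))
print(currents, sums)
```

In this model a single array access yields the bit count for every column, in contrast to reading the weights out row by row and computing in separate processing elements.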