Modern convolutional neural networks (CNNs) have achieved great success in computer vision tasks, and in certain of those tasks a CNN can outperform human beings. CNNs can be trained to capture highly non-linear, complex features, but at the cost of high computation and memory bandwidth. Capturing such features requires high-dimensional intermediate vectors/tensors to be exchanged through dynamic random access memory (DRAM). This DRAM traffic consumes a significant amount of DRAM bandwidth and can slow down the performance of the whole system.
It would be desirable to implement a high-throughput hardware unit that provides efficient lossless data compression in convolutional neural networks.