In recent years, deep neural network (DNN)-based models have made significant progress due to the availability of large labeled datasets and continuous improvements in computation resources. DNNs are used in various applications including, for example, object and speech recognition, language translation, pattern extraction, and image processing. The quality of a DNN model depends on processing a large amount of training data and on increasing the complexity of the neural network. Consequently, training a complex DNN model is a time-consuming and computationally intensive task that can require many days or weeks to complete using parallel and distributed computing frameworks with many computing nodes (e.g., datacenter-scale computational resources).
To reduce training time, hardware acceleration techniques for processing DNN workloads have been pursued either in conventional CMOS technologies or by using emerging non-volatile memory (NVM) technologies. In particular, it has been found that resistive processing unit (RPU) accelerator devices have the potential to accelerate DNN training by orders of magnitude while using less power than conventional hardware acceleration techniques. DNN training generally relies on a backpropagation algorithm that includes three repeating cycles: forward, backward, and weight update. It has been determined that RPU accelerator devices based on a two-dimensional (2D) crossbar array of RPU storage cells can be configured to perform all three cycles of the backpropagation algorithm in parallel, thus potentially providing significant acceleration in DNN training with lower power and reduced computation resources compared to state-of-the-art implementations using central processing units (CPUs) and graphics processing units (GPUs). An RPU accelerator can store and update weight values locally, thereby minimizing data movement during training and fully exploiting the locality and parallelism of the DNN training process. However, the analog weight storage elements in RPU storage cells can store only unsigned weight values, whereas RPU operations for DNN training and other applications require processing of positive, zero, and negative weight values.
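To make the three backpropagation cycles more concrete, the following is a minimal, idealized sketch that models a single crossbar array as an n_out x n_in weight matrix W. In hardware each cycle is one parallel array operation; here it is simulated serially with plain Python lists. The function names (`forward`, `backward`, `update`), the learning rate `lr`, and the plain stochastic-gradient-descent update rule are illustrative assumptions, not details taken from this document. The sketch also assumes noise-free analog behavior and stores weights as ordinary signed floats, so it deliberately sidesteps the unsigned-storage limitation of real RPU storage cells.

```python
# Idealized model of one RPU crossbar array holding an n_out x n_in weight matrix W.
# Assumptions (hypothetical, not from the source): ideal noise-free analog behavior,
# signed floating-point weights, and a plain SGD weight update with learning rate lr.
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def forward(W: Matrix, x: Vector) -> Vector:
    """Forward cycle: inputs x drive the columns; each row accumulates y_i = sum_j W_ij * x_j."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def backward(W: Matrix, delta: Vector) -> Vector:
    """Backward cycle: errors drive the rows; each column accumulates z_j = sum_i W_ij * delta_i (i.e., W^T delta)."""
    n_in = len(W[0])
    return [sum(W[i][j] * delta[i] for i in range(len(W))) for j in range(n_in)]


def update(W: Matrix, x: Vector, delta: Vector, lr: float) -> None:
    """Weight-update cycle: rank-one outer-product update W_ij -= lr * delta_i * x_j, applied in place to every cell."""
    for i, d_i in enumerate(delta):
        for j, x_j in enumerate(x):
            W[i][j] -= lr * d_i * x_j


if __name__ == "__main__":
    W_demo = [[0.5, -1.0], [2.0, 0.0]]
    x_demo = [1.0, 2.0]
    y_demo = forward(W_demo, x_demo)          # [-1.5, 2.0]
    # For a squared-error loss with target 0, the output error is simply y.
    delta_demo = [y_i - 0.0 for y_i in y_demo]
    z_demo = backward(W_demo, delta_demo)     # [3.25, 1.5], error passed to the previous layer
    update(W_demo, x_demo, delta_demo, lr=0.1)
    print(y_demo, z_demo, W_demo)
```

Note that the backward cycle uses the weights before the update, which matches the order of the three cycles named above. The outer-product form of the update is what allows every storage cell in the array to be updated at the same time, because cell (i, j) needs only its own row signal delta_i and column signal x_j.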