The present invention generally relates to neural network hardware, and more particularly to neural network hardware having resistive processing units and training methods using the neural network hardware having the resistive processing units.
Deep Neural Networks (DNNs) have demonstrated significant commercial success in recent years, with performance exceeding sophisticated prior methods in speech and object recognition. However, training DNNs is an extremely computationally intensive task that requires massive computational resources and enormous training time, which hinders their further application. For example, a 70% relative improvement has been demonstrated for a DNN with 1 billion connections that was trained on a cluster with 1000 machines for three days. Training DNNs relies in general on the backpropagation algorithm, which is intrinsically local and parallel. It is therefore desirable to exploit hardware approaches to accelerate DNN training.
Therefore, heretofore unaddressed needs still exist in the art to address the aforementioned deficiencies and inadequacies.