As is known, large scale neural networks have found widespread applicability in a number of technological arts including natural language processing, video motion analysis, decision systems and drug design. Of particular importance to the performance of a neural network—is its training.
Training a large neural network saturated with nonlinearity, however, is notoriously difficult. For example, it may take 10,000 central processing unit (CPU) cores several days to complete the training of a network having one billion parameters.
Given this importance and difficulty, systems and methods that improve the efficiency of neural network training would be a welcome addition to the art.