This invention relates to initialization and learning rate adjustment for artificial neural networks (ANNs) based on rectifier linear units (ReLUs).
ANNs have been used for tasks including mapping of input vectors derived from inputs, for instance acoustic inputs that include speech, to probability distributions that are used, for instance, for speech recognition. Operation of an ANN is determined by the structure of a data flow graph for the ANN, which may use numerical coefficients, generally referred to as “weights”, that multiply numerical values passing through the graph. The characteristics of the weights can have a great impact on the performance of a system making use of the ANN.
Two aspects of ANNs have been used to achieve high performance. First, ANNs with a large number of hidden layers (for example, five or more layers) have shown advantages over ANNs with fewer layers. Each hidden layer includes a set of nodes that are internal to the data flow graph such that data generally flows from layer to layer within the graph. Second, the hidden nodes implement non-linearities that include “rectifier” functions (i.e., mappings from a numerical input value to a numerical output value), which map negative input values to zero, and map positive input values to the output without modification. In general, the input value to a rectifier at one layer is a weighted sum of the output values of the immediately preceding layer. Nodes that make use of such rectifier functions of weighted inputs may be referred to as Rectifier Linear Units (ReLUs).
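By way of illustration only, the rectifier function and the weighted-sum input described above may be sketched as follows. The function names `relu` and `relu_layer`, the `weights[j][i]` indexing convention, and the example values are assumptions introduced for this sketch and do not appear in the description; no bias term is included because none is mentioned above.

```python
from typing import List


def relu(x: float) -> float:
    """Rectifier: negative inputs map to zero, positive inputs pass through unchanged."""
    return x if x > 0.0 else 0.0


def relu_layer(prev_outputs: List[float], weights: List[List[float]]) -> List[float]:
    """Compute the outputs of one hidden layer of rectifier units.

    weights[j][i] is the (hypothetical) weight applied to the output of
    node i in the immediately preceding layer when forming the input to
    node j of this layer.
    """
    outputs = []
    for row in weights:
        # Input to the rectifier is a weighted sum of the preceding layer's outputs.
        weighted_sum = sum(w * v for w, v in zip(row, prev_outputs))
        outputs.append(relu(weighted_sum))
    return outputs


# Illustrative values only: three preceding-layer outputs feeding two ReLU nodes.
prev = [1.0, -2.0, 0.5]
w = [[0.5, 0.25, 1.0], [-1.0, 0.5, 0.0]]
layer_out = relu_layer(prev, w)
```

In this example the first node receives a weighted sum of 0.5 and passes it through unchanged, while the second node receives a weighted sum of -2.0 and outputs zero.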