This specification relates to neural network architectures and compressing neural networks.
Neural networks are machine learning models that employ one or more layers of nonlinear units to predict an output for a received input. Some neural networks include one or more hidden layers in addition to an output layer. The output of each hidden layer is used as input to the next layer in the network, i.e., the next hidden layer or the output layer. Each layer of the network generates an output from a received input in accordance with current values of a respective set of parameters. Some neural networks, e.g., recurrent neural networks (RNNs) designed for time series problems or sequence-to-sequence learning, incorporate recurrent loops which permit memory, in the form of a hidden state variable, to persist within a layer between data inputs. Long short-term memory (LSTM) neural networks, a variation of RNNs, include multiple gates within each layer to control the persistence of data between data inputs.
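By way of illustration only, the following minimal sketch shows how a layer may generate an output from its input in accordance with its parameters, and how a recurrent loop may carry a hidden state from one data input to the next. The function names (`dense_layer`, `rnn_step`, `run_rnn`), the parameter names (`W`, `W_x`, `W_h`, `b`), and the choice of a tanh nonlinearity are assumptions made for this example and are not taken from this specification; the gating of an LSTM layer is omitted for brevity.

```python
import numpy as np


def dense_layer(x, W, b):
    """One feedforward layer: an affine map followed by a tanh nonlinearity."""
    return np.tanh(W @ x + b)


def rnn_step(x, h_prev, W_x, W_h, b):
    """One recurrent step.

    The new hidden state depends on the current input x and on the
    previous hidden state h_prev, which is how memory persists
    between data inputs.
    """
    return np.tanh(W_x @ x + W_h @ h_prev + b)


def run_rnn(inputs, W_x, W_h, b):
    """Apply rnn_step over a sequence, starting from a zero hidden state."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x in inputs:
        h = rnn_step(x, h, W_x, W_h, b)
        states.append(h)
    return states
```

In this sketch, two input sequences that differ only in their first element will generally yield different final hidden states, because the first input continues to influence later steps through `h_prev`.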