This specification relates to processing structured documents using neural networks.
Neural networks are machine learning models that employ one or more layers of nonlinear units to predict an output for a received input. Some neural networks include one or more hidden layers in addition to an output layer. The output of each hidden layer is used as input to one or more other layers in the network, e.g., the next hidden layer, the output layer, or both. Each layer of the network generates an output from a received input in accordance with current values of a respective set of parameters.
Convolutional neural networks are neural networks that include one or more convolutional layers. Convolutional layers are generally sparsely-connected neural network layers. That is, each node in a convolutional layer receives an input from a portion of, i.e., less than all of, the nodes in the preceding neural network layer or, if the convolutional layer is the lowest layer in the sequence, a portion of an input to the neural network, and produces an activation from the input. Generally, convolutional layers have nodes that produce an activation by convolving received inputs in accordance with a set of weights for each node. In some cases, nodes in a convolutional layer may be configured to share weights. That is, all of or a portion of the nodes in the layer may be constrained to always have the same weight values as the other nodes in the layer. Convolutional layers are generally considered to be well-suited for processing images because of their ability to extract features from an input image that depend on the relative location of the pixel data in the input image.