This specification relates to autonomous vehicles.
Autonomous vehicles include self-driving cars, boats, and aircraft. Autonomous vehicles use a variety of on-board sensors and computer systems to detect nearby objects and use such detections to make control and navigation decisions.
Some autonomous vehicles have computer systems that implement neural networks for object classification within images. For example, a neural network can be used to determine that an image captured by an on-board camera is likely to be an image of a nearby car.
Neural networks, or for brevity, networks, are machine learning models that employ multiple layers of operations to predict one or more outputs from one or more inputs. Neural networks typically include one or more hidden layers situated between an input layer and an output layer. The output of each layer is used as input to another layer in the network, e.g., the next hidden layer or the output layer.
Each layer of a neural network specifies one or more transformation operations to be performed on input to the layer. Some neural network layers have operations that are referred to as neurons. Each neuron receives one or more inputs and generates an output that is received by another neural network layer. Often, each neuron receives inputs from other neurons, and each neuron provides an output to one or more other neurons.
An architecture of a neural network specifies what layers are included in the network and their properties, as well as how the neurons of each layer of the network are connected. In other words, the architecture specifies which layers provide their output as input to which other layers and how the output is provided.
The transformation operations of each layer are performed by computers having installed software modules that implement the transformation operations. Thus, a layer being described as performing operations means that the computers implementing the transformation operations of the layer perform the operations.
Each layer generates one or more outputs using the current values of a set of parameters for the layer. Training the network thus involves continually performing a forward pass on the input, computing gradient values, and updating the current values for the set of parameters for each layer. Once a neural network is trained, the final set of parameters can be used to make predictions in a production system.
Convolutional neural networks include convolutional neural network layers. Convolutional neural network layers have a neuron connectivity that takes advantage of spatially local correlation in the input data. To do so, convolutional neural network layers have sparse connectivity, with neurons in one convolutional layer receiving input from only a small subset of neurons in the previous neural network layer. The other neurons from which a neuron receives its input defines a receptive field for that neuron.
Convolutional neural network layers have one or more parameters that define one or more filters for each layer, with each filter having one or more parameters. A convolutional neural network layer generates an output by performing a convolution of each neuron's filter with the layer's input.
In addition, each convolutional network layer can have neurons in a three-dimensional arrangement, with depth, width, and height dimensions. The width and height dimensions correspond to the two-dimensional features of the layer's input. The depth-dimension includes one or more depth sublayers of neurons. Convolutional neural networks employ weight sharing so that all neurons in a depth sublayer have the same weights. This provides for translation invariance when detecting features in the input.
Convolutional neural networks can also include fully-connected layers and other kinds of layers. Neurons in fully-connected layers receive input from each neuron in the previous neural network layer.
Autonomous and semi-autonomous vehicle systems can use full-vehicle predictions for making driving decisions. A full-vehicle prediction is a prediction about a region of space that is occupied by a vehicle. The predicted region of space can include space that is unobservable to a set of on-board sensors used to make the prediction.
Autonomous vehicle systems can make full-vehicle predictions using human-programmed logic. The human-programmed logic specifies precisely how the outputs of on-board sensors should be combined, transformed, and weighted, in order to compute a full-vehicle prediction.