This specification relates to autonomous vehicles.
Autonomous vehicles include self-driving cars, boats, and aircraft. Autonomous vehicles use a variety of on-board sensors and computer systems to detect nearby objects and use such detections to make control and navigation decisions.
Some autonomous vehicles use movement detection or road graphs to predict the heading of nearby vehicles. To determine a vehicle heading using movement detection, the position of a vehicle, e.g., as detected by an on-board camera, can be compared over multiple time slices. However, detecting movement of vehicles can be unreliable when vehicles are moving at very slow speeds relative to one another.
A road graph is data that represents the lanes of roads in a particular geographic location and their associated direction of travel. Thus, to determine a vehicle heading using a road graph, the location of a vehicle within a particular lane can be determined, and then the heading determined from the direction of travel associated with that particular lane. However, a road graph is not available for all possible vehicle locations. For example, the road graph typically does not cover parking spaces along the sides of roads.
Some autonomous vehicles have on-board computer systems that implement neural networks for various prediction tasks, e.g., object classification within images. For example, a neural network can be used to determine that an image captured by an on-board camera is likely to be an image of a nearby car. Neural networks, or for brevity, networks, are machine learning models that employ multiple layers of operations to predict one or more outputs from one or more inputs. Neural networks typically include one or more hidden layers situated between an input layer and an output layer. The output of each layer is used as input to another layer in the network, e.g., the next hidden layer or the output layer.
Each layer of a neural network specifies one or more transformation operations to be performed on input to the layer. Some neural network layers have operations that are referred to as neurons. Each neuron receives one or more inputs and generates an output that is received by another neural network layer. Often, each neuron receives inputs from other neurons, and each neuron provides an output to one or more other neurons.
An architecture of a neural network specifies what layers are included in the network and their properties, as well as how the neurons of each layer of the network are connected. In other words, the architecture specifies which layers provide their output as input to which other layers and how the output is provided.
The transformation operations of each layer are performed by computers having installed software modules that implement the transformation operations. Thus, a layer being described as performing operations means that the computers implementing the transformation operations of the layer perform the operations.
Each layer generates one or more outputs using the current values of a set of parameters for the layer. Training the network thus involves continually performing a forward pass on the input, computing gradient values, and updating the current values for the set of parameters for each layer. Once a neural network is trained, the final set of parameter values can be used to make predictions in a production system.
Convolutional neural networks include convolutional neural network layers. Convolutional neural network layers have a neuron connectivity that takes advantage of spatially local correlation in the input data. To do so, convolutional neural network layers have sparse connectivity, with neurons in one convolutional layer receiving input from only a small subset of neurons in the previous neural network layer. The other neurons from which a neuron receives its input defines a receptive field for that neuron.
Convolutional neural network layers have one or more parameters that define one or more filters for each layer, with each filter having one or more parameters. A convolutional neural network layer generates an output by performing a convolution of each neuron's filter with the layer's input.
In addition, each convolutional network layer can have neurons in a three-dimensional arrangement, with depth, width, and height dimensions. The width and height dimensions correspond to the two-dimensional features of the layer's input. The depth-dimension includes one or more depth sublayers of neurons. Convolutional neural networks employ weight sharing so that all neurons in a depth sublayer have the same weights. This provides for translation invariance when detecting features in the input.
Convolutional neural networks can also include fully-connected layers and other kinds of layers. Neurons in fully-connected layers receive input from each neuron in the previous neural network layer.