This specification relates to training machine learning models that can be used on a mobile device for image recognition. For example, a user may desire additional information related to content that the user is viewing on a mobile device. Such content may be text or an image. The user may desire to have an object or place of interest highlighted on the user device, or to be presented with some other feature indicative of information that may be useful to the user. For example, a user may encounter a restaurant and desire additional information about the restaurant.
A machine learning model can receive an input and generate an output based on the received input and on values of the parameters of the model. For example, a machine learning model may receive an image and generate a score for each of a set of classes, with the score for a given class representing a probability that the image contains an image of an object that belongs to the class.
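By way of non-limiting illustration, the per-class scoring described above may be sketched as follows. The sketch assumes that the model emits one raw output per class and that the classes are mutually exclusive, so that a softmax normalization can turn the raw outputs into probabilities; the class names, the raw output values, and the function name `class_scores` are hypothetical and are not taken from this specification.

```python
import math


def class_scores(raw_outputs):
    """Convert raw per-class outputs into probabilities that sum to 1 (softmax)."""
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(raw_outputs)
    exps = [math.exp(value - peak) for value in raw_outputs]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical set of classes and hypothetical raw model outputs for one image.
CLASSES = ["restaurant", "landmark", "other"]
raw_outputs = [2.0, 1.0, 0.1]

# Map each class to its score; a higher score means a higher estimated probability
# that the image contains an object belonging to that class.
scores = dict(zip(CLASSES, class_scores(raw_outputs)))
best_class = max(scores, key=scores.get)
```

In this sketch the class with the largest raw output also receives the largest score. Other normalizations, such as an independent sigmoid per class, may be more appropriate when an image can contain objects of several classes at once.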
The machine learning model may be composed of, e.g., a single level of linear or non-linear operations or may be a deep network, i.e., a machine learning model that is composed of multiple levels, one or more of which may be layers of non-linear operations. An example of a deep network is a neural network with one or more hidden layers. Neural networks are machine learning models that employ one or more layers of nonlinear units to predict an output for a received input. Some neural networks are deep neural networks that include one or more hidden layers in addition to an output layer. The output of each hidden layer is used as input to the next layer in the network, i.e., the next hidden layer or the output layer. Each layer of the network generates an output from a received input in accordance with current values of a respective set of parameters.
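As a further non-limiting illustration, the layer-by-layer computation of a neural network may be sketched as follows. The sketch assumes fully connected layers, a rectified linear unit as the nonlinearity of the hidden layer, and small hand-picked parameter values; the function names `dense_layer` and `forward` and all numeric values are hypothetical and chosen only for illustration.

```python
def relu(z):
    """Nonlinear unit: passes positive values through and clamps negatives to zero."""
    return max(0.0, z)


def identity(z):
    """Linear unit, used here for the output layer."""
    return z


def dense_layer(inputs, weights, biases, activation):
    """Generate a layer output from its input using the layer's own parameters."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Feed the output of each layer to the next layer, ending at the output layer."""
    current = inputs
    for weights, biases, activation in layers:
        current = dense_layer(current, weights, biases, activation)
    return current


# Hypothetical parameters: one hidden layer (two units) and one output layer (one unit).
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),  # hidden layer
    ([[1.0, 1.0]], [0.1], identity),                # output layer
]

output = forward([2.0, 1.0], layers)
```

Each tuple in `layers` holds the respective set of parameters for one layer, so that changing the current values of those parameters, e.g., during training, changes the output that the network generates for the same input.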