Neural networks are machine learning models that can be trained to predict an output for a received input. Some neural networks include one or more hidden layers of nonlinear units (e.g., nodes) in addition to an output layer. The output of each hidden layer can be used as input to the next layer in the network, e.g., the next hidden layer or the output layer. Each layer of the network generates an output from a received input in accordance with current values of internal parameters of that layer, such as values that represent weights assigned to the nonlinear units in the layer.
Neural networks have been trained to perform various data processing tasks, such as classification, prediction, and translation. Some systems include multiple data processing components, e.g., in successive stages, to carry out a given task.
Recently, computing devices that provide multiple user input modalities have become more prevalent. For example, smartphones and other user devices include speech recognition services that allow users to provide voice inputs to a device as an alternative to typing or pointing inputs. Voice-based inputs may be more convenient in some circumstances as a hands-free means for interacting with the computing device. Some devices require that a user's identity be verified before performing an action based upon voice input, in order to guard against breaches of privacy and security.