Models representing data relationships and patterns, such as functions, algorithms, systems, and the like, may accept input (sometimes referred to as an input vector), and produce output (sometimes referred to as an output vector) that corresponds to the input in some way. For example, a model may be implemented as an artificial neural network (“NN”). Artificial neural networks are artificial in the sense that they are computational entities, analogous to biological neural networks in animals, but implemented by computing devices. Output in NN-based models is obtained by doing a “forward pass.” The forward pass involves multiplying large NN weight matrices, representing the parameters of the model, by vectors corresponding to input feature vectors or hidden intermediate representations. In recognition systems, such as systems designed to recognize speech, handwriting, faces, and the like, NN-based models may generate probability scores via the forward pass. The probability scores may indicate the probability that the input corresponds to a particular label, class, or the like.
The parameters of a NN can be set in a process referred to as training. For example, a NN-based model can be trained using training data that includes input data and the correct or preferred output of the model for the corresponding input data. Sets of individual input vectors (“mini-batches”) may be processed at the same time by using an input matrix instead of a single input vector. The NN can repeatedly process the input data, and the parameters (e.g., the weight matrices) of the NN can be modified in what amounts to a trial-and-error process until the model produces (or “converges” on) the correct or preferred output. The modification of weight values may be performed through a process referred to as “back propagation.” Back propagation includes determining the difference between the expected model output and the obtained model output, and then determining how to modify the values of some or all parameters of the model to reduce the difference between the expected model output and the obtained model output.