In speech recognition, input data representing speech has been processed using models, which are computer-readable data structures used by computer systems to process speech signals. For example, input speech may be processed to produce features, which a speech recognition engine can then process using an acoustic model, a word lexicon (or pronunciation model), and a language model. Different structures have been used for acoustic models. For example, an acoustic model may be a deep neural network, which is a network having multiple layers that are used to process input speech audio features. Each such layer may be in the form of a matrix of weights that operates on an input vector (for the first layer, a vector of audio features). The layers can be used in a series of operations, with the output of an operation using a first layer serving as the input to an operation using a second layer. In some models, a layer may be decomposed into a pair of matrices (such as by using singular value decomposition), where multiplying the two matrices together yields (or, for a truncated decomposition, approximates) the single matrix of the layer. In processing speech, an operation may be performed on the input data using the first matrix of the pair to produce an intra-layer output, and another operation can then be performed on that intra-layer output using the second matrix of the pair to produce the output from the layer.
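The layered processing and the decomposed layer described above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the method of any particular system: the layer sizes, the random weights, and the ReLU between layers are all hypothetical choices (the text does not specify a nonlinearity), and the helper names `factor_layer`, `apply_layer`, and `forward` are illustrative. The pair of matrices is obtained from a truncated singular value decomposition, with the singular values folded into the second matrix.

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    # Assumed nonlinearity between layers; the text does not specify one.
    return np.maximum(x, 0.0)

def factor_layer(weights: np.ndarray, rank: int):
    """Split a layer matrix into a pair (first, second) with second @ first ~= weights.

    Truncated SVD: weights = U diag(s) Vt, keeping the top `rank` singular values.
    With rank == min(weights.shape) the product reproduces weights up to rounding.
    """
    u, s, vt = np.linalg.svd(weights, full_matrices=False)
    first = vt[:rank, :]              # (rank, n_in): applied to the layer input
    second = u[:, :rank] * s[:rank]   # (n_out, rank): singular values folded in
    return first, second

def apply_layer(layer, x: np.ndarray) -> np.ndarray:
    if isinstance(layer, tuple):      # decomposed layer stored as (first, second)
        first, second = layer
        intra = first @ x             # intra-layer output
        return second @ intra         # output of the layer
    return layer @ x                  # ordinary single-matrix layer

def forward(layers, features: np.ndarray) -> np.ndarray:
    """Apply layers in series: the output using one layer is the next layer's input."""
    h = features
    for i, layer in enumerate(layers):
        h = apply_layer(layer, h)
        if i < len(layers) - 1:       # no nonlinearity after the final layer
            h = relu(h)
    return h

rng = np.random.default_rng(0)
layers = [rng.standard_normal((16, 40)),
          rng.standard_normal((16, 16)),
          rng.standard_normal((10, 16))]
features = rng.standard_normal(40)    # stand-in for one frame of audio features

full_rank = [layers[0], factor_layer(layers[1], rank=16), layers[2]]
low_rank = [layers[0], factor_layer(layers[1], rank=4), layers[2]]
```

With the full rank kept, `forward(full_rank, features)` matches `forward(layers, features)` up to floating-point error. Keeping only 4 singular values stores the middle layer as 4×16 + 16×4 = 128 weights instead of 256, at the cost of an approximation.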
Acoustic models have been adapted to a speaker to form a speaker-dependent adapted model. For example, the weights of an adaptive layer of an acoustic model have been modified using training data (input audio speech data) from a speaker. The resulting speaker-dependent model can be used to recognize speech that is indicated to be from that speaker (such as where the speech input comes from the same user profile as the speech input used to adapt the speaker-dependent model).
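A minimal sketch of the adaptation idea, assuming a toy two-layer network in which the last layer is treated as the adaptive layer and every other layer stays frozen. Everything concrete here is hypothetical: the random stand-in data, the mean-squared-error objective (practical systems typically train against per-frame class targets with a different loss), the gradient-descent schedule, and the `adapted_models` dictionary keyed by a made-up profile id. The point it illustrates is narrow: a copy of the speaker-independent model is made, only the adaptive layer's weights change, and the adapted copy is selected when speech comes from the matching profile.

```python
import copy
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def hidden(layers, x):
    """Frozen part of the network: every layer except the last (assumed ReLU)."""
    h = x
    for w in layers[:-1]:
        h = relu(w @ h)
    return h

def loss(layers, xs, ys):
    """Assumed objective: mean squared error of the outputs against per-frame targets."""
    preds = np.stack([layers[-1] @ hidden(layers, x) for x in xs])
    return 0.5 * np.mean(np.sum((preds - ys) ** 2, axis=1))

def adapt_to_speaker(si_layers, xs, ys, steps=100):
    """Return a speaker-dependent copy in which only the last (adaptive) layer changes."""
    layers = copy.deepcopy(si_layers)               # keep the speaker-independent model intact
    h = np.stack([hidden(layers, x) for x in xs])   # (n_frames, d), fixed during adaptation
    w = layers[-1]
    # Step size 1/L, where L is the Lipschitz constant of this quadratic loss's gradient.
    step = len(h) / max(np.linalg.norm(h, 2) ** 2, 1e-12)
    for _ in range(steps):
        grad = (h @ w.T - ys).T @ h / len(h)        # gradient of the loss w.r.t. w
        w -= step * grad                            # in-place update of layers[-1]
    return layers

rng = np.random.default_rng(1)
si_model = [0.1 * rng.standard_normal((16, 40)), 0.1 * rng.standard_normal((10, 16))]
speaker_xs = rng.standard_normal((50, 40))   # stand-in for the speaker's audio features
speaker_ys = rng.standard_normal((50, 10))   # stand-in for per-frame training targets

adapted_models = {}  # hypothetical store: user-profile id -> speaker-dependent model
adapted_models["profile-123"] = adapt_to_speaker(si_model, speaker_xs, speaker_ys)

def model_for(profile_id):
    """Use the adapted model when speech comes from a profile that has one."""
    return adapted_models.get(profile_id, si_model)
```

Because only the adaptive layer differs between the two models, a system built this way could store just that layer per speaker rather than a full copy of the network.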
Compression techniques have been used in speech recognition. Such techniques include non-negative compression (where negative values are set to zero, so that only non-negative values remain) and quantization. Quantization is a process of converting a range of input data values into a smaller set of output values that approximates the original data, such as by defining multiple value ranges and converting all values within each range to the same discrete value (e.g., all values between 0.5 and 1.5 are converted to the value 1, and so on).
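The range-based quantization described above can be sketched as a uniform quantizer. This is one simple instance, assuming equal-width ranges of size `step` centred on multiples of `step`; the function names are illustrative, and real systems may instead use non-uniform ranges or a learned codebook. Each value is stored as a small integer index, and the index is later mapped back to a representative value.

```python
import math
from typing import List

def quantize(values: List[float], step: float = 1.0) -> List[int]:
    """Map each value to the index of the nearest multiple of `step`.

    With step=1.0 every value in [0.5, 1.5) gets index 1, every value in
    [1.5, 2.5) gets index 2, and so on (half-way points round up).
    """
    return [math.floor(v / step + 0.5) for v in values]

def dequantize(indices: List[int], step: float = 1.0) -> List[float]:
    """Replace each index with its representative (approximate) value."""
    return [i * step for i in indices]

weights = [0.5, 0.9, 1.49, 1.5, -0.7, 3.2]
codes = quantize(weights)     # [1, 1, 1, 2, -1, 3]
approx = dequantize(codes)    # [1.0, 1.0, 1.0, 2.0, -1.0, 3.0]
```

The reconstruction error for any value is at most half the step size, so the step controls the trade-off between how few distinct values must be stored and how closely they approximate the original data.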