The present disclosure relates to acoustic modeling, and more particularly to speaker adaptation of deep neural networks (DNNs).
Given the popularity of DNNs for acoustic modeling, speaker adaptation of DNNs is an active area of research. However, transform-based approaches such as Maximum-Likelihood Linear Regression (MLLR), which work well for Gaussian mixture models, do not port straightforwardly to DNNs. Gaussian means or variances can be transformed together if they belong to the same acoustic class (phones, hidden Markov model (HMM) states, or clustered versions thereof); by contrast, it can be difficult to find comparable structure in the weights of a neural network.