A trained deep neural network (DNN) is a powerful discriminative modeling tool that can be used for a variety of purposes. For example, a DNN can be combined with a hidden Markov model (HMM) to characterize context-dependent (CD) phones as the pronunciation units of speech. The resulting hybrid CD-DNN-HMM takes advantage of the temporally localized discriminative modeling power of a DNN and the sequential modeling power of an HMM. A CD-DNN-HMM can be used in speech recognition systems, handwriting recognition systems, and human activity recognition/detection systems, among many others.
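As a rough illustration of how the two models divide the work in such a hybrid, the sketch below assumes the common arrangement in which the DNN emits per-frame state posteriors, these are divided by state priors to give scaled likelihoods, and the HMM then finds the best state sequence with Viterbi decoding. The function names, the two-state model, and all numeric values are hypothetical and chosen only for readability; they are not taken from the text above.

```python
import numpy as np

def scaled_log_likelihoods(posteriors, priors):
    """Turn DNN posteriors p(s | x_t) into scaled log-likelihoods.

    By Bayes' rule, p(x_t | s) is proportional to p(s | x_t) / p(s),
    so the per-frame constant p(x_t) can be ignored during decoding.
    """
    return np.log(posteriors) - np.log(priors)

def viterbi(log_obs, log_trans, log_init):
    """Return the most likely HMM state path for a (T, S) score matrix."""
    T, S = log_obs.shape
    score = log_init + log_obs[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        # cand[i, j] is the score of arriving in state j from state i.
        cand = score[:, None] + log_trans
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(S)] + log_obs[t]
    # Trace the best path backwards from the best final state.
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Hypothetical DNN outputs for four frames and two states (assumed values).
posteriors = np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]])
priors = np.array([0.6, 0.4])                 # assumed state priors
trans = np.array([[0.7, 0.3], [0.3, 0.7]])    # assumed HMM transitions
init = np.array([0.5, 0.5])                   # assumed initial distribution

log_obs = scaled_log_likelihoods(posteriors, priors)
path = viterbi(log_obs, np.log(trans), np.log(init))
print(path)  # [0, 0, 1, 1]
```

In this toy setup the DNN scores each frame locally, while the HMM transition probabilities discourage implausible jumps between states across frames.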
One of the key procedures in building such CD-DNN-HMMs is the training of the DNN. DNNs are computationally demanding to train because of the large number of parameters involved and because much of the computation is shared across states and so cannot be performed on demand. Only recently has training DNNs become feasible, owing to easy access to high-speed general-purpose graphics processing units (GPGPUs) and the development of effective DNN layer weight initialization techniques.
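To give a sense of scale for the parameter count, the following sketch tallies the weights and biases of a fully connected network. The layer sizes (a 429-dimensional input, seven hidden layers of 2048 units, and 9304 output states) are illustrative assumptions, not values stated above, and the helper name is hypothetical.

```python
def dense_param_count(layer_sizes):
    """Count weights plus biases of a fully connected feed-forward network."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )

# Assumed topology: input, seven hidden layers, output over tied states.
layer_sizes = [429] + [2048] * 7 + [9304]
total = dense_param_count(layer_sizes)
print(f"{total:,} trainable parameters")  # 45,122,648 trainable parameters
```

Under these assumptions every training frame touches roughly 45 million parameters in both the forward and the backward pass, which is the kind of dense matrix arithmetic that GPGPUs accelerate well.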