A deep neural network (DNN) is known to be a powerful discriminative modeling tool, and can be used for a variety of purposes. For example, a DNN can be combined with a hidden Markov model (HMM) to characterize context-dependent (CD) phones as the pronunciation units of speech. The resulting hybrid CD-DNN-HMM takes advantage of the temporally localized discriminative modeling power of a DNN and the sequential modeling power of an HMM. A CD-DNN-HMM can be used in speech recognition systems, handwriting recognition systems, and human activity recognition/detection systems including gesture recognition systems, among many others.
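As an illustration only, the following minimal Python sketch shows one common way such a hybrid can be wired together; it is not necessarily the arrangement used in any particular system. The per-frame state posteriors stand in for the outputs of a DNN and are made-up numbers, as are the state priors and transition probabilities. The function names `scaled_log_likelihoods` and `viterbi` are hypothetical. The sketch assumes that posteriors are divided by state priors to obtain scaled likelihoods, which the HMM then decodes with the Viterbi algorithm.

```python
import math

def scaled_log_likelihoods(posteriors, priors):
    """Convert per-frame state posteriors p(s|x) into log p(s|x) - log p(s)."""
    return [[math.log(p) - math.log(q) for p, q in zip(frame, priors)]
            for frame in posteriors]

def viterbi(log_obs, log_trans, log_init):
    """Return the most likely HMM state sequence for the given frame scores."""
    n_states = len(log_init)
    score = [log_init[s] + log_obs[0][s] for s in range(n_states)]
    backptrs = []
    for frame in log_obs[1:]:
        prev = score
        ptr, score = [], []
        for s in range(n_states):
            # Best predecessor state for landing in state s at this frame.
            best = max(range(n_states), key=lambda r: prev[r] + log_trans[r][s])
            ptr.append(best)
            score.append(prev[best] + log_trans[best][s] + frame[s])
        backptrs.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]

# Hypothetical values: four frames, two HMM states.
posteriors = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]  # stand-in DNN outputs
priors = [0.5, 0.5]                                            # assumed state priors
log_trans = [[math.log(0.7), math.log(0.3)],
             [math.log(0.3), math.log(0.7)]]                   # assumed transitions
log_init = [math.log(0.5), math.log(0.5)]

scores = scaled_log_likelihoods(posteriors, priors)
path = viterbi(scores, log_trans, log_init)
print(path)
```

Here the DNN contributes only the frame-level scores, while the transition structure of the HMM ties those scores into a single state sequence, mirroring the division of labor described above.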
One of the key procedures in building such CD-DNN-HMMs is the training of the DNN. This training typically begins with initialization of the weights, a step known as a “pretraining” procedure.