1. Technical Field
Embodiments relate to large vocabulary continuous speech recognition (LVCSR) technology and, more particularly, to an apparatus and method for improving the performance of LVCSR technology based on a context-dependent deep neural network hidden Markov model (CD-DNN-HMM) algorithm.
2. Description of the Related Art
To implement a large vocabulary continuous speech recognition (LVCSR) system based on a context-dependent deep neural network hidden Markov model (CD-DNN-HMM) algorithm, hidden Markov model (HMM)-state level information, that is, information indicating which HMM state each frame of the speech signal corresponds to, may be necessary. Generally, a speech recognition system based on a Gaussian mixture model HMM (GMM-HMM) algorithm may be used to obtain the HMM-state level information. Thus, the performance of CD-DNN-HMM algorithm based LVCSR may depend heavily on the performance of GMM-HMM algorithm based speech recognition.
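For illustration only, the following is a minimal sketch (not the method of the embodiments) of one common way a GMM-HMM system yields HMM-state level information: Viterbi forced alignment of feature frames to a left-to-right chain of HMM states, each state scored by a diagonal-covariance Gaussian mixture. The function names `log_gaussian`, `log_gmm`, and `force_align`, the single fixed self-loop probability `p_stay`, and the toy one-dimensional features are all assumptions made for this example.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical GMM representation: (weights, means, variances), diagonal covariances.
Gmm = Tuple[Sequence[float], Sequence[Sequence[float]], Sequence[Sequence[float]]]


def log_gaussian(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log-density of a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def log_gmm(x: Sequence[float], gmm: Gmm) -> float:
    """Log-likelihood of frame x under a GMM, via log-sum-exp over components."""
    weights, means, variances = gmm
    comps = [math.log(w) + log_gaussian(x, m, v) for w, m, v in zip(weights, means, variances)]
    top = max(comps)
    return top + math.log(sum(math.exp(c - top) for c in comps))


def force_align(frames: Sequence[Sequence[float]], state_gmms: List[Gmm],
                p_stay: float = 0.5) -> List[int]:
    """Viterbi forced alignment of frames to a left-to-right chain of HMM states.

    Returns one state index per frame. The path starts in state 0, ends in the
    last state, and at each frame either stays in its state or advances by one.
    """
    T, S = len(frames), len(state_gmms)
    if T < S:
        raise ValueError("need at least one frame per HMM state")
    log_stay, log_next = math.log(p_stay), math.log(1.0 - p_stay)
    neg_inf = float("-inf")
    score = [[neg_inf] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    score[0][0] = log_gmm(frames[0], state_gmms[0])
    for t in range(1, T):
        for s in range(S):
            stay = score[t - 1][s] + log_stay
            move = score[t - 1][s - 1] + log_next if s > 0 else neg_inf
            best, prev = (stay, s) if stay >= move else (move, s - 1)
            if best == neg_inf:
                continue  # state s is unreachable at frame t
            score[t][s] = best + log_gmm(frames[t], state_gmms[s])
            back[t][s] = prev
    # Backtrack from the final state at the last frame.
    path = [S - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


if __name__ == "__main__":
    # Toy 1-D "features" and three single-Gaussian states with means 0, 5, 10.
    states = [([1.0], [[mu]], [[1.0]]) for mu in (0.0, 5.0, 10.0)]
    frames = [[0.1], [-0.2], [5.1], [4.8], [5.2], [9.9], [10.1]]
    print(force_align(frames, states))  # [0, 0, 1, 1, 1, 2, 2]
```

Each element of the returned list is the HMM-state index assigned to one frame. In a CD-DNN-HMM system, frame-level state labels of this kind serve as training targets for the DNN, so any inaccuracy in the GMM-HMM alignment carries over into the DNN.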
However, general GMM-HMM algorithm based speech recognition technology may not guarantee accuracy in obtaining the HMM-state level information associated with an input speech signal, which may limit the performance of CD-DNN-HMM algorithm based LVCSR.
Thus, to provide more stable and accurate CD-DNN-HMM algorithm based LVCSR results, there is a need for a method of increasing the accuracy of the HMM-state level information obtained using GMM-HMM algorithm based speech recognition technology.