1. Field of the Invention
The present invention relates to an apparatus and method for recognizing speech, and more particularly, to a speech recognition apparatus and method based on a deep-neural-network (DNN) sound model.
2. Discussion of Related Art
A context-dependent (CD) deep-neural-network (DNN)-hidden-Markov-model (HMM) technique, which combines a DNN with an HMM, has been actively applied to sound models for speech recognition, replacing the existing CD-Gaussian-mixture-model-HMM (CD-GMM-HMM) technique (hereinafter referred to as ‘GMM-HMM’).
A DNN-HMM technique according to the related art is performed as follows.
First, a state of an HMM corresponding to an output node or a target of a DNN structure is determined through a process of learning an HMM, and state-level alignment information of training speech data is extracted.
A process of learning a DNN may be a process of receiving information regarding the states of the HMM determined on the basis of a result of learning the HMM, together with the state-level alignment information of the training speech data, and obtaining the features and model parameters that are most discriminative in terms of pattern recognition.
In this case, the state-level alignment information may be refined by including it in the process of learning a DNN, that is, by repeating the learning after state-level realignment. However, in a DNN learning technique according to the related art, the state-level alignment information is determined beforehand, and thus the output nodes of the DNN structure cannot be changed.
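By way of illustration only, the related-art flow described above may be sketched as follows. The sketch assumes that a previously learned HMM has already supplied one state index per frame of training speech data, and it uses a small one-hidden-layer network and synthetic two-dimensional frames in place of a real DNN and real acoustic features. The names train_dnn_on_alignment and state_posteriors, the layer size, and the learning rate are hypothetical and do not appear in the related art.

```python
import numpy as np

def train_dnn_on_alignment(frames, state_ids, num_states,
                           hidden=16, epochs=300, lr=0.2, seed=0):
    """Train a one-hidden-layer softmax network on frame-level HMM state labels.

    The number of output nodes equals num_states, which is fixed by the
    previously learned HMM (assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    dim = frames.shape[1]
    w1 = rng.normal(0.0, 0.1, (dim, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.1, (hidden, num_states))
    b2 = np.zeros(num_states)
    targets = np.eye(num_states)[state_ids]  # one-hot targets from the alignment
    for _ in range(epochs):
        h = np.tanh(frames @ w1 + b1)
        logits = h @ w2 + b2
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        # Gradient of the mean cross-entropy with respect to the logits.
        grad_logits = (probs - targets) / len(frames)
        grad_h = (grad_logits @ w2.T) * (1.0 - h ** 2)
        w2 -= lr * (h.T @ grad_logits)
        b2 -= lr * grad_logits.sum(axis=0)
        w1 -= lr * (frames.T @ grad_h)
        b1 -= lr * grad_h.sum(axis=0)
    return w1, b1, w2, b2

def state_posteriors(params, frames):
    """Return per-frame posterior probabilities over the HMM states."""
    w1, b1, w2, b2 = params
    logits = np.tanh(frames @ w1 + b1) @ w2 + b2
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

# Synthetic stand-in for training speech data: 3 HMM states, 50 frames each.
data_rng = np.random.default_rng(1)
num_states = 3
means = np.array([[-2.0, 0.0], [2.0, 0.0], [0.0, 3.0]])
state_ids = np.repeat(np.arange(num_states), 50)  # fixed state-level alignment
frames = means[state_ids] + 0.3 * data_rng.normal(size=(150, 2))

params = train_dnn_on_alignment(frames, state_ids, num_states)
post = state_posteriors(params, frames)
accuracy = float((post.argmax(axis=1) == state_ids).mean())
```

In this sketch the width of the output layer is tied to num_states, so a change in the set of HMM states would require a different output layer and renewed learning, which corresponds to the limitation noted above.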
Meanwhile, the states of an HMM for large-vocabulary speech recognition are generally determined according to a decision tree-based method. However, it is inefficient to use a single decision tree to determine the states for large-scale training speech data having different acoustic-statistical characteristics (e.g., data for a sound model that recognizes English speech of speakers whose native languages differ, such as Chinese, Korean, and English).
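For reference, one widely used form of decision tree-based state determination selects, at each node, the phonetic question that yields the largest gain in log-likelihood when the pooled data are modeled by a single Gaussian on each side of the split. The following is a minimal sketch of a single such split, assuming one-dimensional statistics (count, mean, variance) per left-context phone. The function names, the question set, and the numerical values are hypothetical, and the sketch is not asserted to be the specific method of any particular related art.

```python
import math

def pooled_loglik(stats):
    """Log-likelihood of pooled data under one maximum-likelihood Gaussian.

    stats is a list of (count, mean, variance) tuples, one per tied unit.
    """
    n = sum(c for c, _, _ in stats)
    mean = sum(c * m for c, m, _ in stats) / n
    var = sum(c * (v + m * m) for c, m, v in stats) / n - mean * mean
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)

def best_question(units, questions):
    """Return (question_name, gain) for the split with the largest gain.

    units maps a left-context phone to its (count, mean, variance);
    questions maps a question name to the set of phones answering "yes".
    """
    parent = pooled_loglik(list(units.values()))
    best = None
    for name, phones in questions.items():
        yes = [s for p, s in units.items() if p in phones]
        no = [s for p, s in units.items() if p not in phones]
        if not yes or not no:
            continue  # the question does not split this node
        gain = pooled_loglik(yes) + pooled_loglik(no) - parent
        if best is None or gain > best[1]:
            best = (name, gain)
    return best

# Hypothetical statistics for one HMM state position, keyed by left context.
units = {
    "m": (100, 1.0, 0.5),
    "n": (80, 1.2, 0.5),
    "a": (120, -1.0, 0.4),
    "i": (90, -1.1, 0.6),
}
questions = {
    "left_is_nasal": {"m", "n"},
    "left_is_front_vowel": {"i"},
}
chosen = best_question(units, questions)
```

Because every split is scored on statistics pooled over all of the training speech data, a single tree of this kind cannot keep separate state inventories for groups of data whose acoustic-statistical characteristics differ.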
A DNN-HMM sound model employing DNN structure-based machine learning has very high discrimination performance and thus has recently been used in the field of pattern recognition, including speech recognition, in place of the existing GMM-HMM-based sound model.
However, a DNN-HMM-based learning technique according to the related art learns a structure having the characteristics and parameters that most appropriately discriminate predetermined states, and thus is not applicable to speech recognition applications targeting speakers of multiple native languages whose speech has different acoustic-statistical characteristics.
In this regard, Korean laid-open patent publication No. 10-2006-0133610 entitled “Heart Sound Classification Method Using Hidden-Markov-Model” discloses a heart sound classification method of modeling an HMM using heart sound data and recognizing the modeled HMM.