1. Field of the Invention
The present invention relates to an apparatus and method for speech recognition using a plurality of confidence score estimation algorithms and, more particularly, to an apparatus and method for speech recognition using a plurality of confidence score estimation algorithms, in which a score based on a likelihood ratio and a score based on a Gaussian distribution trace are used to set a standard of judgment as to confidence of the result of speech recognition, and an input speech is recognized in accordance with the set standard of judgment to determine the confidence of the result of speech recognition.
2. Description of Related Art
Speech recognition is a series of processes that extract phonemic and linguistic information from acoustic information included in speech, recognize the extracted information, and respond to the recognized information. Speech recognition is achieved via speech recognition algorithms. Examples of speech recognition algorithms include a dynamic time warping algorithm, a neural network algorithm, and a hidden Markov model algorithm.
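By way of illustration only, the dynamic time warping algorithm named above can be sketched as follows. This is a minimal textbook form, not the method of any particular system. It assumes each frame is a single number and uses the absolute difference as the local distance, whereas practical recognizers compare multi-dimensional feature vectors. The function name dtw_distance is hypothetical.

```python
import math


def dtw_distance(a, b):
    """Return the dynamic time warping distance between two sequences.

    Assumption for illustration: frames are scalars and the local
    distance is the absolute difference between two frames.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the best cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b frame is reused
                cost[i][j - 1],      # b advances, a frame is reused
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

Because the alignment may reuse frames, two utterances of the same pattern spoken at different rates can still match with a low cumulative cost.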
With the recent increase in speech recognition research, speech control is increasingly used in industry. A speech recognition system that controls home electronics, such as a home network system or a home automation system, includes a voice user interface (VUI). To effect speech control via such a speech recognition system, it is necessary to detect, from natural language spoken in the home environment, a keyword required to control electronic home appliances. Keyword detection performance improves when the confidence estimation of the keyword is accurate.
Human speech does not conform perfectly to a specified format or a specified phoneme. Rather, it is based on natural language and can vary. For this reason, it is important to detect the keyword from recognized speech.
Japanese Patent Unexamined Publication No. 7-056594 discloses a speech recognition system that extracts a multi-dimensional discrete feature vector of input speech using a feature extractor, converts the input speech into a phoneme identification score using the extracted vector, and compares a reference pattern of each word to be recognized with a previously stored reference pattern, using a dictionary having a stored phonemic label and a dynamic programming technique, to obtain a maximum matching score. However, in this speech recognition system, a more exact confidence score cannot be obtained because the score fails to reflect variation along the temporal axis of each pattern.
Accordingly, there is a need for a method of calculating a confidence score that reflects variation along the temporal axis of an input speech signal.