1. Field of the Invention
The present invention relates to a speech processing apparatus and method. In particular, embodiments of the present invention are applicable to speech recognition.
2. Description of Related Art
Speech recognition is a process by which an unknown speech utterance is identified. Several different types of speech recognition system are currently available, and they can be categorised in several ways. For example, some systems are speaker dependent, whereas others are speaker independent. Some systems operate with a large vocabulary (>10,000 words) while others operate only with a limited vocabulary (<1000 words). Some systems can only recognise isolated words, whereas others can recognise phrases comprising a series of connected words.
In a limited vocabulary system, speech recognition is performed by comparing features of an unknown utterance with features of known words which are stored in a database. The features of the known words are determined during a training session in which one or more samples of the known words are used to generate reference patterns therefor. The reference patterns may be acoustic templates of the modelled speech or statistical models, such as Hidden Markov Models.
To recognise the unknown utterance, the speech recognition apparatus extracts a pattern (or features) from the utterance and compares it against each reference pattern stored in the database. A scoring technique is used to provide a measure of how well each reference pattern, or each combination of reference patterns, matches the pattern extracted from the input utterance. The unknown utterance is then recognised as the word(s) associated with the reference pattern(s) which most closely match the unknown utterance.
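By way of illustration only, the comparison and scoring step described above can be sketched as a nearest-template search. The sketch assumes that each reference pattern and each utterance has already been reduced to a fixed-length feature vector and that the score is a Euclidean distance. The names `REFERENCE_PATTERNS`, `distance` and `recognise`, and all of the feature values, are hypothetical and are not taken from any particular system.

```python
import math

# Hypothetical reference patterns: one fixed-length feature vector per known
# word, as might be produced during a training session. Values are invented.
REFERENCE_PATTERNS = {
    "yes": [1.0, 0.2, 0.1],
    "no": [0.1, 0.9, 0.3],
    "stop": [0.4, 0.4, 1.0],
}


def distance(a, b):
    """Euclidean distance between two feature vectors (lower is a closer match)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognise(features, references=REFERENCE_PATTERNS):
    """Score the utterance against every reference pattern.

    Returns the word whose reference pattern is closest, together with the
    full table of scores.
    """
    scores = {word: distance(features, ref) for word, ref in references.items()}
    best_word = min(scores, key=scores.get)
    return best_word, scores


word, scores = recognise([0.9, 0.3, 0.2])
print(word)
```

With these invented values the utterance is recognised as "yes". A practical system would typically compare sequences of feature vectors rather than a single vector, for example by time alignment against templates or by scoring against statistical models.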
In limited vocabulary speech recognition systems, any detected utterance is usually matched to the closest corresponding word model within the system. A problem with such systems arises because out-of-vocabulary words and environmental noise can be accidentally matched to a word within the system's vocabulary.
One method of detecting accidental matches used by prior art systems is to provide a language model which enables the likelihood that detected words would follow each other to be determined. Where words are detected that are unlikely to follow each other, the language model can then identify that at least one of the detected words will probably have been incorrectly identified.
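A minimal sketch of such a language model check is given below, assuming a simple table of word-pair (bigram) probabilities. The table `BIGRAM_PROB`, the floor value `UNSEEN_PROB`, the threshold and the function name `flag_unlikely_pairs` are all hypothetical and chosen only for illustration.

```python
# Hypothetical bigram probabilities P(next word | previous word).
# The values are illustrative only.
BIGRAM_PROB = {
    ("turn", "left"): 0.4,
    ("turn", "right"): 0.4,
    ("go", "forward"): 0.5,
}

# Assumed floor probability for word pairs that are absent from the table.
UNSEEN_PROB = 0.001


def flag_unlikely_pairs(words, threshold=0.01):
    """Return the adjacent word pairs whose probability falls below threshold.

    A flagged pair suggests that at least one of its two words may have been
    incorrectly identified; it does not say which one.
    """
    flagged = []
    for prev, nxt in zip(words, words[1:]):
        p = BIGRAM_PROB.get((prev, nxt), UNSEEN_PROB)
        if p < threshold:
            flagged.append((prev, nxt, p))
    return flagged


print(flag_unlikely_pairs(["turn", "left"]))
print(flag_unlikely_pairs(["turn", "forward"]))
```

In this sketch the sequence "turn left" passes unflagged, whereas "turn forward" is flagged because the pair does not appear in the table and therefore receives the assumed floor probability.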
An alternative method of detecting accidental recognition is to generate a measure of how well a detected utterance matches the closest word model, as is disclosed in, for example, U.S. Pat. No. 5559925, U.S. Pat. No. 5613037, U.S. Pat. No. 5710864, U.S. Pat. No. 5737489 and U.S. Pat. No. 5842163. This measure, or confidence score, is then used to help the system recognise accidental matches. However, the correlation between the confidence scores generated by prior art systems and the likelihood that an utterance has been mismatched can be unsatisfactory.
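The general idea of a confidence score can be sketched as follows. This is one simple possibility, assumed purely for illustration, and is not the method of any of the patents cited above: the confidence is taken to be one minus the ratio of the best match distance to the second-best match distance, so that a match which is only marginally better than its nearest rival receives a low score. The function names `confidence` and `accept` and the threshold value are hypothetical.

```python
def confidence(scores):
    """Compute a simple confidence value for the closest match.

    scores maps each candidate word to a non-negative distance (lower is a
    closer match) and must contain at least two candidates. Returns the
    closest word and a confidence between 0 and 1.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1])
    (best_word, best), (_, second) = ranked[0], ranked[1]
    if second == 0:
        # The two closest candidates both match exactly, so the result is ambiguous.
        return best_word, 0.0
    return best_word, 1.0 - best / second


def accept(scores, min_confidence=0.5):
    """Return the closest word, or None if the match is treated as accidental."""
    word, conf = confidence(scores)
    return word if conf >= min_confidence else None


print(accept({"yes": 0.2, "no": 1.0}))
print(accept({"yes": 0.9, "no": 1.0}))
```

In this sketch the first call accepts "yes", since its distance is much smaller than that of the runner-up, while the second call returns None because the two candidates are almost equally close.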
There is therefore a need for an apparatus and method which can generate a better measure of the likelihood that an utterance has been mismatched. Furthermore, there is a need for a speech recognition system in which a generated score indicating the likelihood that an utterance has been mismatched can be combined with other means of detecting mismatched utterances, such as that provided by language models, so that the reliability of speech recognition systems can be improved.