The present invention relates to a phoneme similarity calculating apparatus, for use in a speech recognition apparatus or a speech learning apparatus, which calculates a similarity of a feature parameter of input speech to each phoneme in each frame of a predetermined time. Herein, "similarity" includes both a similarity in a narrow sense, which increases as the feature parameter more closely resembles the corresponding phoneme, and a distance, which decreases as the feature parameter more closely resembles the corresponding phoneme.
In the recognition of continuous speech, the object to be recognized is input in units of words or sentences, so it is important to determine the phoneme structure of the input speech. A phoneme similarity calculation is necessary for this purpose.
Since the sound produced as the human voice changes continuously, a feature parameter extracted from a speech signal also changes continuously over time.
In a conventional phoneme recognition method, reference phoneme patterns or statistical identification functions are prepared for each phoneme class, and the input speech is matched against these patterns or functions in each frame to calculate a phoneme similarity. Note that a phoneme class is a classification unit of phonemes, such as k or f.
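The frame-by-frame matching described above can be illustrated by the following minimal sketch. It assumes a distance-type similarity (Euclidean distance) and one reference pattern per phoneme class. The two-dimensional feature values, the class labels, and the function names `frame_distances` and `best_class_per_frame` are hypothetical and are chosen only for illustration; they are not taken from any particular conventional apparatus.

```python
import math

# Hypothetical 2-dimensional reference patterns, one per phoneme class.
# Real feature parameters would have many more dimensions.
REFERENCE_PATTERNS = {
    "a": (1.0, 0.0),
    "i": (0.0, 1.0),
    "k": (-1.0, -1.0),
}


def frame_distances(frame, references=REFERENCE_PATTERNS):
    """Return the distance from one frame's feature vector to every class.

    A smaller distance means the frame resembles that phoneme class more closely.
    """
    return {label: math.dist(frame, ref) for label, ref in references.items()}


def best_class_per_frame(frames, references=REFERENCE_PATTERNS):
    """Label each frame independently with its nearest phoneme class."""
    labels = []
    for frame in frames:
        distances = frame_distances(frame, references)
        labels.append(min(distances, key=distances.get))
    return labels


# Three frames, each lying close to one reference pattern.
frames = [(0.9, 0.1), (0.1, 0.8), (-0.9, -1.1)]
print(best_class_per_frame(frames))
```

Each frame is scored on its own, with no regard to the neighboring frames or to whether the phoneme class is steady or unsteady.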
Phonemes include steady phonemes, such as vowels and nasal sounds, which do not change significantly, and unsteady phonemes, such as plosives, which change abruptly. In the above-mentioned matching, however, a phoneme similarity is calculated in each frame of the input speech regardless of whether the phoneme is steady or unsteady. Therefore, in a so-called "glide" segment between adjacent phonemes in the input speech, the similarity to another phoneme class may be accidentally increased, and a phoneme may be erroneously recognized.
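The glide problem can be reproduced with a toy example. The sketch below assumes three hypothetical reference patterns in a two-dimensional feature space and models the glide from "a" to "i" as a straight-line interpolation between them. Both the values and the linear model are illustrative assumptions, not measured speech data.

```python
import math

# Hypothetical reference patterns. "e" happens to lie near the straight
# path between "a" and "i" in this toy feature space.
REFERENCES = {"a": (1.0, 0.0), "e": (0.0, 0.2), "i": (-1.0, 0.0)}


def nearest_class(frame, references=REFERENCES):
    """Return the phoneme class whose reference pattern is closest to the frame."""
    return min(references, key=lambda label: math.dist(frame, references[label]))


def glide(start, end, n_frames):
    """Linearly interpolate n_frames feature vectors from start to end, inclusive."""
    return [
        tuple(s + (e - s) * t / (n_frames - 1) for s, e in zip(start, end))
        for t in range(n_frames)
    ]


# The speaker utters "a" followed by "i"; no "e" is ever spoken.
frames = glide(REFERENCES["a"], REFERENCES["i"], 5)
labels = [nearest_class(frame) for frame in frames]
print(labels)  # ['a', 'a', 'e', 'i', 'i']
```

The middle frame of the glide is nearest to the reference pattern for "e", so a frame-by-frame matcher reports a phoneme class that was not uttered.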
Thus, since the conventional phoneme similarity calculating apparatus calculates a similarity using a reference pattern regardless of the steadiness or unsteadiness of the phoneme class, an incorrect similarity is often calculated.