1. Field of the Invention
The present invention relates to a speech recognition apparatus which can recognize a speech input with high accuracy.
2. Description of the Related Art
A word recognition apparatus which recognizes a word consisting of n syllables or a word consisting of n characters is known, as disclosed in, e.g., Japanese Patent Disclosure (Kokai) No. 59-197974.
The recognition apparatus performs recognition processing on syllables Ai (i=1, 2, . . . , n) of a speech input consisting of n syllables, and obtains similarities (or differences) Sk,i between each syllable Ai and syllables Bk to be recognized (k=1, 2, . . . , m), where k indicates a syllable name and i indicates a syllable position. The syllables to be recognized are all the syllables against which syllables Ai are compared; in Japanese, for example, there are 101 single-syllable categories. The apparatus then stores each similarity Sk,i at the storage position in a similarity memory defined by syllable Bk and its syllable position i. For each dictionary word registered in a dictionary memory, the apparatus reads out the similarities stored at the positions defined by syllable codes Ci (i=1, 2, . . . , n) of the word and the syllable positions i of those codes in the word. Thereafter, coincidences between syllables Ai of the speech input and syllables Ci of the dictionary words are computed from the similarities read from the similarity memory, and dictionary words having high coincidences are obtained as recognition candidates for the speech input.
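The lookup described above can be illustrated with a minimal sketch. It is not taken from the cited disclosure: the function names, the romanized syllable labels, the integer similarity values, and in particular the choice of a plain sum as the coincidence measure are all assumptions made for illustration.

```python
# Illustrative sketch only; names, data, and the summing rule are assumptions.

def build_similarity_memory(similarities):
    """Store S(k, i) at the position addressed by (syllable name k, position i).

    similarities[i] maps each candidate syllable name k to its similarity
    with the i-th syllable of the speech input.
    """
    return {(k, i): s
            for i, row in enumerate(similarities)
            for k, s in row.items()}


def coincidence(memory, word_codes):
    """Read the similarity addressed by (C_i, i) for each syllable code of a
    dictionary word and sum them (summing is an assumed scoring rule)."""
    return sum(memory.get((code, i), 0) for i, code in enumerate(word_codes))


def recognize(similarities, dictionary, top=3):
    """Return dictionary words ordered from highest to lowest coincidence."""
    memory = build_similarity_memory(similarities)
    n = len(similarities)
    scored = [(coincidence(memory, codes), word)
              for word, codes in dictionary.items()
              if len(codes) == n]  # only words with n syllables are compared
    scored.sort(reverse=True)
    return [word for _, word in scored[:top]]


# Hypothetical three-syllable input and dictionary.
similarities = [
    {"sa": 90, "ta": 40},  # position 0
    {"ku": 80, "ka": 50},  # position 1
    {"ra": 70, "na": 30},  # position 2
]
dictionary = {
    "sakura": ["sa", "ku", "ra"],
    "sakana": ["sa", "ka", "na"],
    "takara": ["ta", "ka", "ra"],
}
candidates = recognize(similarities, dictionary)
```

With these made-up values, "sakura" scores 240, "sakana" 170, and "takara" 160, so "sakura" is the first candidate. Because every lookup is addressed by syllable position, the scheme depends on the input and dictionary syllables being aligned one-to-one.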
With this speech input recognition method, a speech input can be easily and appropriately recognized at high speed, and can be input as data.
However, as spoken input becomes more natural, some of syllables Ai (i=1, 2, . . . , n) of the speech input may be omitted, or one syllable may be extracted as a plurality of syllables. For example, if syllable A3 of the speech input is omitted, the recognition series is:
A1, A2, A4, A5, . . . , An
If syllable A2 of the speech input is extracted as two syllables A2' and A2", the recognition series is:
A1, A2', A2", A4, A5, . . . , An
If such cases occur, the syllable positions after C3 are shifted relative to the syllable code strings Ci (i=1, 2, 3, . . . , n) of the dictionary words. This shift introduces errors into the coincidence computations, so correct coincidences cannot be obtained. As a result, the speech input cannot be accurately recognized.
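The effect of such a position shift can be seen in a small sketch. The example word, the romanized syllables, and the match-counting score are hypothetical stand-ins chosen for illustration; they are not the coincidence measure of the related art.

```python
# Illustrative sketch only; the word and the scoring rule are assumptions.

def positional_score(recognized, word_codes):
    """Count positions at which the recognized syllable equals the dictionary
    syllable. Comparison is strictly position by position."""
    return sum(1 for a, c in zip(recognized, word_codes) if a == c)


word = ["yo", "ko", "ha", "ma"]            # dictionary syllable codes C1..C4

full = ["yo", "ko", "ha", "ma"]            # every syllable extracted correctly
omitted = ["yo", "ko", "ma"]               # third syllable omitted
split = ["yo", "ko", "o", "ha", "ma"]      # second syllable extracted as two

score_full = positional_score(full, word)        # 4
score_omitted = positional_score(omitted, word)  # 2
score_split = positional_score(split, word)      # 2
```

In both degraded cases, every syllable after the point of omission or splitting is compared against the wrong dictionary position, so the score drops even though most syllables were individually recognized correctly.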