1. Field of the Invention
This invention generally relates to a speech-recognition method and apparatus and, more particularly, to such a method and apparatus capable of recognizing particular phonemes in a voice signal regardless of the speaker.
2. Description of the Prior Art
Known speech-recognition apparatus can recognize phonemes uttered by a particular speaker. In using that type of apparatus the speaker utters a list of all words to be recognized and acoustic parameters of the words are detected by various circuit elements, such as a band-pass filter bank, and stored in a memory. Then, when that speaker later uses the same words in normal speech, their acoustic parameters are detected, compared with the previously stored acoustic parameters and, when the acoustic parameters of both coincide, the apparatus "recognizes" the later-spoken words. To cope with a situation in which the speaker might talk faster or slower at different times (for example, the speaker might talk slower when listing the words than in normal speech) a time series of the acoustic parameters can be extracted at regular intervals, for example every 5 to 20 msec, and used in recognizing the words.
The foregoing type of apparatus must register and store in advance all acoustic parameters of all words to be recognized, and thus requires enormous storage capacity and must perform a great many mathematical calculations. The "time matching" function, for example, requires myriad mathematical calculations and taxes the abilities of most data processors. If the time bases are not sufficiently matched, recognition might be faulty.
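The computational cost of time matching can be illustrated with a short sketch. The source does not name a specific algorithm; the example below assumes dynamic time warping, a widely used alignment technique, and the function name `dtw_distance` and the one-dimensional feature frames are hypothetical. It shows that comparing one stored template of n frames with one utterance of m frames already requires on the order of n times m distance computations, and that work is repeated for every word in the vocabulary.

```python
import math

def dtw_distance(a, b):
    """Accumulated alignment cost between two sequences of feature frames.

    Assumption: dynamic time warping stands in for the "time matching"
    step; each frame is a tuple of acoustic parameters.
    """
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # local frame distance
            # allow a frame of either sequence to be stretched, or both to advance
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]

# Hypothetical frames: the same word spoken slowly and quickly.
slow = [(0.0,), (0.0,), (1.0,), (1.0,), (2.0,), (2.0,)]
fast = [(0.0,), (1.0,), (2.0,)]
other = [(2.0,), (1.0,), (0.0,)]

same_word_cost = dtw_distance(slow, fast)
different_word_cost = dtw_distance(slow, other)
```

In this sketch the slow and fast utterances align at zero cost despite their different lengths, while the unrelated sequence does not. The nested loops make the quadratic cost per comparison explicit.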
Another voice-recognition method has been proposed which is capable of recognizing individual phonemes, for example, the sounds A, I, U, E, O, K, S, T, etc., and the syllables KA, KI, KU, etc.
A principal drawback of the last-mentioned method is that, while phonemes such as vowels and the like with quasi-stationary portions can be easily recognized, phonemes with short phonemic characteristics, such as plosives (K, T, P and so on), are extremely difficult to recognize as phonemes using acoustic parameters.
To overcome that difficulty, a refinement of the method has been proposed that involves storing the phonemes that are discretely uttered. The phonemes that are diffusively uttered are recognized by matching their time bases using "time matching" techniques similar to those described above, whereby the phonemes with short phonemic characteristics, such as the aforesaid plosives (K, T, P and so on), can be more readily recognized. However, that method also has limited utility because of the large number of mathematical calculations required to match time bases. Furthermore, when that method is used to recognize phonemes uttered by anyone, rather than just by a particular speaker, the properties of the acoustic parameters are so scattered due to individual differences in speech that the recognition of phonemes is virtually impossible merely by matching the time bases as described above.
Accordingly, still other methods have been proposed. One such method stores a plurality of acoustic parameters that could represent a word and then recognizes phonemes on the basis of approximate matches of those acoustic parameters. Another method converts a whole word to parameters of fixed dimensions and then evaluates or discriminates among them using a discriminatory function. But those methods, like the others mentioned earlier, require large amounts of storage capacity and great numbers of mathematical calculations, which considerably reduces the number of words that can be recognized.
One property of voice signals is the existence in them of transitions--the points at which one phoneme changes to another and at which a silence becomes a phoneme or vice versa. Methods of detecting those transitions are known, but no known prior art method or apparatus has been proposed for effectively and efficiently using the transitions for speech recognition.