The present invention relates to a speech recognition system and, more particularly, to an apparatus for satisfactorily recognizing speech, which is characterized by diverse and fuzzy features.
The speech recognition apparatus of the prior art adopts a system in which standard patterns are prepared for all the categories of speech to be recognized, an input pattern is compared with each of the standard patterns, and the category of the standard pattern having the highest similarity to the input pattern is taken as the recognized result (IEEE Trans. on ASSP, Vol. ASSP-23, No. 1 (1975), pp. 67 to 72).
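The matching scheme described above can be illustrated by a minimal sketch. The sketch assumes fixed-length feature vectors and uses a negative Euclidean distance as the similarity; both are assumptions made for illustration and are not taken from the cited reference. The category names and vector values are hypothetical.

```python
import math

def similarity(a, b):
    """Similarity between two feature vectors.

    Assumption: negative Euclidean distance, so a larger value means
    a closer match. The cited prior art may use a different measure.
    """
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognize(input_pattern, standard_patterns):
    """Return the category whose standard pattern is most similar to the input."""
    return max(
        standard_patterns,
        key=lambda category: similarity(input_pattern, standard_patterns[category]),
    )

# Hypothetical standard patterns, one per category to be recognized.
standard_patterns = {
    "a": [1.0, 0.0],
    "i": [0.0, 1.0],
    "u": [0.5, 0.5],
}

result = recognize([0.9, 0.2], standard_patterns)
print(result)
```

The sketch returns only the winning category. None of the intermediate quantities names a feature of speech, which is the difficulty discussed in the following paragraph.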
In this system, the recognizing operations are based upon reference to the standard patterns. Since, however, the features of speech are only implicitly incorporated into the standard patterns, a person cannot judge whether the intermediate procedures of the recognition are proper. Thus, improvements in the performance of the recognition apparatus are obtained only by trial and error, so that knowledge cannot be accumulated to improve the performance systematically.
In order to solve this problem, the present applicant has proposed, in U.S. Pat. application Ser. No. 129,994 (filed on Dec. 8, 1987, now abandoned), a system which comprises: means for holding, for each of the features intrinsic to the individual phonemes, the name and procedure of a processing for examining whether or not that feature exists in the time series of the feature pattern; and a table which describes, for each pair among all the categories of the speech to be recognized, both the names of the processings for discriminating that pair and the manner of interpreting the processed results, so that the recognition processing may be accomplished by pair discrimination according to the descriptions of said table. According to this method, the cause of a mistaken recognition can be determined by examining the pair discrimination result that was mistaken. By improving the processing procedures concerned, the performance can be improved without adversely affecting the remaining pair discrimination results.
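A minimal sketch of such table-driven pair discrimination is given below. The feature names, thresholds, phoneme categories, and table contents are hypothetical. The use of a simple vote count to combine the pair results is likewise an assumption, since the manner of combining them is not stated here.

```python
from itertools import combinations

# Named processings: each examines whether one feature exists in the pattern.
# Names and thresholds are hypothetical.
PROCEDURES = {
    "has_burst": lambda pattern: pattern["burst_energy"] > 0.5,
    "is_voiced": lambda pattern: pattern["voicing"] > 0.5,
}

# Pair table: (category_a, category_b) ->
#   (processing name, category if feature present, category if absent).
PAIR_TABLE = {
    ("b", "m"): ("has_burst", "b", "m"),
    ("b", "p"): ("is_voiced", "b", "p"),
    ("m", "p"): ("is_voiced", "m", "p"),
}

def recognize(pattern, categories):
    """Run every pair discrimination and return (best category, per-pair trace)."""
    votes = {category: 0 for category in categories}
    trace = {}
    for pair in combinations(sorted(categories), 2):
        name, if_present, if_absent = PAIR_TABLE[pair]
        winner = if_present if PROCEDURES[name](pattern) else if_absent
        votes[winner] += 1          # assumption: majority vote over pairs
        trace[pair] = winner        # kept so a mistaken pair can be inspected
    return max(votes, key=votes.get), trace

best, trace = recognize({"burst_energy": 0.8, "voicing": 0.9}, ["b", "p", "m"])
print(best, trace)
```

Because each entry of the table is evaluated independently, the returned trace shows which pair discrimination produced a wrong winner, and the corresponding processing can be revised without touching the other entries.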
However, this system fails to give adequate consideration to the fuzziness intrinsic to speech (e.g., the uncertainty in the operation of the sound-making organs, the fuzziness in the speaking attitude of a speaker, or the fuzziness due to structural deformation in intonation) or to technical restrictions such as insufficient analytical resolution. Moreover, in order to constitute the most proper pair discrimination for every pair of phonemic categories, the characteristics of each pair have to be analyzed one by one to determine the processing method, so that a long time is required to develop the processing.
Meanwhile, there has been proposed a speech recognition system which incorporates fuzzy logic so that it is suited to fuzzy information, as is disclosed in D, Vol. J70-D, No. 10, pp. 1890 to 1901 (1987. 10) and R. De Mori et al., "Use of Fuzzy Algorithms for Phonetic and Phonemic Labeling of Continuous Speech", IEEE Trans. on PAMI, Vol. PAMI-2, No. 2, pp. 136-148 (1980). Since this system is based upon a tree search, however, the individual logical discriminations are not independent of one another. The remaining problem is therefore that performance improvements are difficult to accumulate.
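For reference, the general idea of treating a feature as a graded rather than a yes-or-no quantity can be sketched as follows. This is a generic illustration of fuzzy membership grades combined with the standard minimum, maximum, and complement operators; it is not the algorithm of the cited systems, and the feature names and numerical ranges are hypothetical.

```python
def ramp(x, lo, hi):
    """Piecewise-linear membership grade: 0 at or below lo, 1 at or above hi."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)

def fuzzy_and(*grades):
    """Conjunction of membership grades (minimum operator)."""
    return min(grades)

def fuzzy_or(*grades):
    """Disjunction of membership grades (maximum operator)."""
    return max(grades)

def fuzzy_not(grade):
    """Complement of a membership grade."""
    return 1.0 - grade

# Hypothetical measurements mapped to grades instead of hard decisions.
voiced = ramp(0.5, 0.0, 1.0)
burst = ramp(0.25, 0.0, 1.0)

voiced_and_burst = fuzzy_and(voiced, burst)
voiced_or_burst = fuzzy_or(voiced, burst)
no_burst = fuzzy_not(burst)
print(voiced_and_burst, voiced_or_burst, no_burst)
```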