The present invention relates to improvements in speech recognizer reference patterns.
As a method of realizing speech recognition in which the vocabulary presented for recognition can be readily altered, a method using context-dependent phone reference patterns has been widely used. In this method, a reference pattern of a given word presented for recognition is produced by connecting the context-dependent phone reference patterns that correspond to the phone expression of the word. A context-dependent phone is designated by a set of three elements, i.e., a preceding phone, the subject phone and a succeeding phone. Its reference pattern is produced by segmenting a large number of pieces of speech data collected for training into phone units, selecting the segments whose subject phone, preceding phone and succeeding phone all agree, and averaging the selected segments. Such a method is described in, for instance, Kai-Fu Lee, IEEE Transactions on Acoustics, Speech, and Signal Processing, 1990, Vol. 38, No. 4, pp. 599-609. In this method, the speech data base used for producing the context-dependent phone reference patterns is provided separately from the speech recognizer, and is used only when the reference patterns are produced.
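The connection step described above can be illustrated by the following minimal sketch. It is not taken from the cited reference. The function name `word_reference_pattern`, the dictionary layout and the boundary symbol "#" used at the word edges are all assumptions introduced for illustration; the text above does not state how phones at the beginning and end of a word are handled.

```python
def word_reference_pattern(phones, triphone_patterns, boundary="#"):
    """Connect context-dependent phone reference patterns into a word pattern.

    phones            : phone expression of the word, e.g. ["W", "X", "Y", "Z"]
    triphone_patterns : dict mapping (preceding, subject, succeeding) to a
                        trained reference pattern (hypothetical layout)
    boundary          : assumed symbol for the context at the word edges
    """
    # Pad the phone expression so that the first and last phones also have
    # a preceding and a succeeding context (an assumption of this sketch).
    padded = [boundary] + list(phones) + [boundary]
    word_pattern = []
    for i in range(1, len(padded) - 1):
        key = (padded[i - 1], padded[i], padded[i + 1])
        if key not in triphone_patterns:
            # The context was never observed in the training data.
            raise KeyError(f"no reference pattern for context {key}")
        word_pattern.append(triphone_patterns[key])
    return word_pattern
```

Because the word pattern is assembled from a table keyed by phone context, a new word can be added to the vocabulary by supplying only its phone expression, provided that every context it contains was observed when the table was trained.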
FIG. 5 shows a case in which context-dependent phone reference patterns are produced from speech data corresponding to a phone train "WXYZ" in the speech data base. Referring to FIG. 5, "X (W, Y)" represents the context-dependent phone reference pattern of the phone X with the preceding phone W and the succeeding phone Y. When identical context-dependent phones appear in different parts of the speech data, their average is used as the reference pattern.
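The averaging described above may be sketched as follows. This is an illustrative simplification only: each segmented phone is assumed to be summarized by a single fixed-length feature vector, whereas an actual reference pattern would normally be a time series or a statistical model. The function name `train_triphone_patterns` and the data layout are hypothetical.

```python
from collections import defaultdict

def train_triphone_patterns(utterances):
    """Average segmented phones that share the same (preceding, subject,
    succeeding) context.

    utterances : list of utterances, each a list of (phone, feature_vector)
                 pairs in time order, one pair per segmented phone
                 (simplifying assumption: one vector per phone segment)
    Returns a dict mapping (preceding, subject, succeeding) to a mean vector.
    """
    sums = {}
    counts = defaultdict(int)
    for segments in utterances:
        # Only interior phones have both a preceding and a succeeding phone.
        for i in range(1, len(segments) - 1):
            key = (segments[i - 1][0], segments[i][0], segments[i + 1][0])
            vec = segments[i][1]
            if key not in sums:
                sums[key] = [0.0] * len(vec)
            sums[key] = [s + v for s, v in zip(sums[key], vec)]
            counts[key] += 1
    return {k: [s / counts[k] for s in total] for k, total in sums.items()}

# Phone train "WXYZ" plus a second utterance that also contains X between W and Y.
u1 = [("W", [0.0, 0.0]), ("X", [1.0, 2.0]), ("Y", [3.0, 4.0]), ("Z", [5.0, 6.0])]
u2 = [("W", [0.0, 0.0]), ("X", [3.0, 4.0]), ("Y", [0.0, 0.0])]
patterns = train_triphone_patterns([u1, u2])
print(patterns[("W", "X", "Y")])  # X (W, Y): mean of the two X segments, [2.0, 3.0]
print(patterns[("X", "Y", "Z")])  # Y (X, Z): observed once, [3.0, 4.0]
```

In this sketch the phone train "WXYZ" yields the two patterns X (W, Y) and Y (X, Z); W and Z yield none because each lacks a context on one side within the utterance.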
In the prior art method, including the case shown in FIG. 5, in which a phone reference pattern is produced by taking the preceding and succeeding phones into consideration, even if the speech data base contains speech data whose context agrees with that of a phone in a word presented for recognition over a wider range, for example inclusive of the two preceding and the two succeeding phones, this wider agreement is not utilized at all for recognition. In other words, in the prior art method a reference pattern is produced on the basis of phone contexts that are fixed at the time of training. In addition, the phone context considered is often limited to one preceding phone and one succeeding phone in order to avoid an explosive increase in the number of phone combinations. For these reasons, the collected speech data bases are not effectively utilized, and it has been impossible to improve the accuracy of recognition.
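The explosive increase mentioned above can be made concrete with a short calculation. The inventory size of 40 phones is an assumed figure chosen only for illustration, and the result is an upper bound, since not every phone combination actually occurs in a language.

```python
def num_context_units(num_phones, left, right):
    """Upper bound on the number of distinct context-dependent phones when
    `left` preceding and `right` succeeding phones are taken into account."""
    return num_phones ** (left + right + 1)

# Assumed inventory of 40 phones (hypothetical figure).
print(num_context_units(40, 1, 1))  # one phone on each side:  64000
print(num_context_units(40, 2, 2))  # two phones on each side: 102400000
```

Widening the context from one phone to two phones on each side multiplies the upper bound by the square of the inventory size, which is why a context fixed at training time is usually kept narrow.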