1. Field of the Invention
The present invention relates to a speech recognition system including a first word recognizing section based on matching in units of words and a second word recognizing section based on matching in units of word constituent elements.
2. Description of the Related Art
The following two methods are conventionally known as standard methods of word recognition. The first method is based on matching in units of words, i.e., word-based matching. In this method, input speech is extracted in units of words, and word recognition is performed by pattern matching of each entire word. The second method is based on matching in units of word constituent elements. In this method, input speech is segmented into word constituent elements such as phonemes or syllables, phoneme or syllable recognition is performed by pattern matching in units of the word constituent elements, and the input word is then recognized from the resulting series of word constituent element candidates.
In the former method, since matching of entire words is performed, input speech need not be decomposed into smaller recognition units as in the latter method. Hence, the former method is simple. In addition, since matching of entire words is based on the dynamic time-frequency spectral information of an utterance, the recognition accuracy is generally high. On the other hand, as the number of words to be recognized increases, registering reference patterns for all the words becomes difficult, and hence the recognizable vocabulary size is undesirably limited. Furthermore, since learning of the reference patterns requires a large number of word speech patterns, the vocabulary cannot be easily changed or expanded.
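The word-based matching described above can be illustrated with a minimal sketch. It assumes that each word is stored as one reference pattern (a sequence of feature vectors) and that patterns are compared by dynamic time warping; the text does not name a specific matching technique, so the DTW matcher, the function names, and the toy two-dimensional features are all hypothetical.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors.

    DTW is an assumed matcher; the text only says "pattern matching".
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # local frame-to-frame distance
            cost[i][j] = d + min(cost[i - 1][j],      # stretch the reference frame
                                 cost[i][j - 1],      # stretch the input frame
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def recognize_word(pattern, reference_patterns):
    """Return the word whose reference pattern is closest to the input pattern."""
    return min(reference_patterns,
               key=lambda word: dtw_distance(pattern, reference_patterns[word]))


# Hypothetical toy vocabulary: one reference pattern per word.
reference_patterns = {
    "yes": [(0.0, 1.0), (1.0, 1.0), (2.0, 0.0)],
    "no": [(5.0, 5.0), (4.0, 4.0)],
}
spoken = [(0.0, 1.0), (0.0, 1.0), (1.0, 1.0), (2.0, 0.0)]  # "yes" with a held first frame
print(recognize_word(spoken, reference_patterns))
```

Note that the dictionary needs one entry per word, which is the registration burden described above: enlarging the vocabulary means collecting and storing a new reference pattern for every added word.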
In contrast, in the latter method, since the number of phonemes or syllables is much smaller than the number of words, the number of types of reference patterns to be prepared is at most a hundred. In addition, the vocabulary can be changed by a simple method such as entering a character string. In this method, however, recognition processing consists of several steps, e.g., segmentation, labelling (phoneme or syllable recognition), and word recognition, and hence the processing is complex. In addition, errors in segmentation, in conversion of a pattern into a phoneme or syllable series, or in estimation of a word from the phoneme or syllable series degrade the recognition rate.
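The final step of the latter method, estimating a word from a recognized phoneme or syllable series, can be sketched as follows. The sketch assumes that segmentation and labelling have already produced a label series, that each vocabulary word is entered as a character string whose letters stand in for phoneme labels (a simplification), and that the closest word is chosen by edit distance. The function names and the lexicon are hypothetical and are not taken from any particular system.

```python
def edit_distance(a, b):
    """Levenshtein distance between two label sequences (an assumed measure)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # label deleted
                            curr[j - 1] + 1,           # label inserted
                            prev[j - 1] + (x != y)))   # label substituted or matched
        prev = curr
    return prev[-1]


def recognize_from_labels(labels, lexicon):
    """Return the lexicon word whose label series is closest to the input series."""
    return min(lexicon, key=lambda word: edit_distance(labels, lexicon[word]))


# Hypothetical lexicon built from character strings; one letter per label.
lexicon = {word: list(word) for word in ("tokyo", "kyoto")}
lexicon["osaka"] = list("osaka")  # vocabulary change: just enter a new string

labels = list("tokio")  # label series containing one labelling error
print(recognize_from_labels(labels, lexicon))
```

Adding a word requires only a new string, not new speech data, which reflects the ease of vocabulary change noted above. The single mislabelled element in the example also shows how errors in the earlier steps propagate into word estimation.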
In addition to the above-described two methods, a method of performing word recognition by using both pattern matching of entire words and a network of a series of labels attached to the respective frames of a speech pattern has been proposed (Proc. Seventh ICPR pp. 1232-1235, 1984). However, the above-described difficulties in registering reference patterns and expanding the vocabulary remain unsolved in this method.
As described above, in the conventional speech recognition systems, if the word recognition method based on matching in units of words is used, registration of reference patterns and expansion of the vocabulary are laborious. If the word recognition method based on matching in units of word constituent elements is used, the amount of processing is greatly increased, and recognition errors tend to occur.