1. Field of the Invention
This invention relates to a method of speech recognition.
2. Description of the Prior Art
Japanese published unexamined patent application 62-111293 discloses a method of speech recognition which is designed to maintain accurate recognition of speech with background noise.
According to the prior art method of speech recognition disclosed in Japanese application 62-111293, an input signal interval is set equal to a sufficiently long period which contains the speech to be recognized, noise preceding the speech, and noise following the speech. A temporal reference point is provided in the input signal interval. Proposed speech intervals are provided which start from the reference point and whose lengths differ successively by one frame. The shortest proposed speech interval has N1 frames. The longest proposed speech interval has N2 frames. The total number of the proposed speech intervals is therefore equal to N2 - N1 + 1. The input signal in each proposed speech interval is collated with standard patterns of recognized objects after the proposed speech interval is expanded or contracted to a fixed time length. This collation provides the similarities or distances related to the respective recognized objects. Such collation is reiterated while the reference point is moved from the start point to the end point of the input signal interval. Consequently, the similarities related to the respective recognized objects are determined for all the proposed speech intervals and all the different reference points. The recognized object related to the maximum of the similarities is output as a recognition result.
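The search described above can be sketched as follows. This is a minimal illustration only, not the implementation of the cited application: the function names, the linear time normalization, and the use of negative squared Euclidean distance as the similarity are assumptions made for the example, and `n_min` and `n_max` stand for the shortest and longest proposed interval lengths in frames.

```python
def resample(frames, start, length, target_len):
    """Linearly expand or contract frames[start:start+length] to target_len frames."""
    if target_len == 1:
        return [frames[start]]
    return [
        frames[start + round(j * (length - 1) / (target_len - 1))]
        for j in range(target_len)
    ]


def similarity(candidate, pattern):
    """Assumed similarity measure: negative squared Euclidean distance."""
    return -sum(
        (a - b) ** 2
        for frame, ref in zip(candidate, pattern)
        for a, b in zip(frame, ref)
    )


def spot_word(frames, templates, n_min, n_max):
    """Try every reference point and every proposed interval length.

    Returns (best similarity, label, start frame, interval length).
    """
    best = (float("-inf"), None, None, None)
    for start in range(len(frames)):              # move the reference point
        for length in range(n_min, n_max + 1):    # n_max - n_min + 1 proposed intervals
            if start + length > len(frames):
                break
            for label, pattern in templates.items():
                candidate = resample(frames, start, length, len(pattern))
                sim = similarity(candidate, pattern)
                if sim > best[0]:
                    best = (sim, label, start, length)
    return best


# Hypothetical one-dimensional feature frames: noise, a word, then noise.
frames = [[0.0], [0.0], [0.0], [1.0], [2.0], [3.0], [2.0], [0.0], [0.0]]
templates = {
    "a": [[1.0], [2.0], [3.0], [2.0]],
    "b": [[5.0], [5.0], [5.0], [5.0]],
}
result = spot_word(frames, templates, n_min=3, n_max=5)
```

In this toy input the word occupies frames 3 to 6, so the interval starting at frame 3 with a length of 4 frames matches pattern "a" exactly and is selected without any separate speech-interval detection step.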
The prior art method of Japanese application 62-111293 dispenses with a step of detecting the interval of speech, and uses a word spotting technique which is effective in processing speech generated in a noisy environment. Specifically, a signal representing such speech is cut out of, or spotted from, a sufficiently long period which contains the speech, noise preceding the speech, and noise following the speech. Speech recognition is performed on the basis of the cut-out signal.
In the prior art method of Japanese application 62-111293, a feature-parameter temporal sequence is obtained by analyzing an input signal which contains components representing speech, noise preceding the speech, and noise following the speech. The feature-parameter temporal sequence is collated with standard patterns of recognized objects for all partial input signal intervals. This collation provides the similarities related to the respective recognized objects. The partial input signal interval corresponding to the highest similarity is cut out or spotted for each of the standard patterns. The recognized object related to the maximum of the similarities is output as a recognition result.
A description will now be given of processing input speech representing "juuichi" (a Japanese word written in Roman characters which means the numeral 11). In the prior art method of Japanese application 62-111293, a correct partial input signal interval can usually be cut out during the collation between the input speech "juuichi" and a standard pattern corresponding to "juuichi", but it is sometimes difficult to cut out a correct partial input signal interval during the collation between the input speech "juuichi" and a standard pattern "ichi" (another Japanese word written in Roman characters which means the numeral 1). This difficulty seems to be caused by the fact that "ichi" is a part of "juuichi". As a result, in some cases, the calculated similarity related to the standard pattern "ichi" is higher than the similarity related to the standard pattern "juuichi", and the input speech is recognized incorrectly. Experiments using computer simulation revealed that 27.5% of the recognition results for input speech "juuichi" generated by 80 different speakers were wrong, the speech being misjudged as "ichi".
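The failure mode can be illustrated with a toy example. The feature values below are hypothetical and are not taken from the simulation mentioned above, and time normalization is omitted for brevity. The short pattern is the tail of the long pattern, so it can match a sub-interval of the utterance exactly while the long pattern is penalized for a small deviation in the first part of the word.

```python
def best_similarity(signal, pattern):
    """Highest similarity (negative squared distance) of pattern over all
    same-length intervals of signal. Time normalization is omitted here."""
    n = len(pattern)
    return max(
        -sum((signal[s + i] - pattern[i]) ** 2 for i in range(n))
        for s in range(len(signal) - n + 1)
    )


# Hypothetical 1-D features: the short pattern is the tail of the long one.
long_pattern = [1.0, 1.0, 2.0, 3.0]   # stands in for "juuichi"
short_pattern = [2.0, 3.0]            # stands in for "ichi"

# An utterance of the long word, surrounded by noise (0.0), whose first
# two frames deviate from the long pattern.
signal = [0.0, 0.0, 1.5, 0.5, 2.0, 3.0, 0.0, 0.0]

scores = {
    "juuichi": best_similarity(signal, long_pattern),
    "ichi": best_similarity(signal, short_pattern),
}
recognized = max(scores, key=scores.get)
```

Here the long pattern scores -0.5 on its best interval, whereas the short pattern reaches the maximum possible similarity on the last two frames of the word, so the maximum-similarity rule selects "ichi" even though "juuichi" was spoken.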