1. Field of the Invention
The present invention relates to a method and an apparatus for accurately recognizing speech, and more particularly to a speech recognition method and apparatus having excellent recognition accuracy even in noisy environments.
2. Description of the Related Art
Much research on speech recognition has been carried out in order to improve system functions and to enter signals effectively into information equipment and communication equipment. Pattern matching is known as a common method of performing speech recognition.
A prior method of speech recognition will be described below with reference to FIG. 1.
An input speech signal (S1) is converted by frequency analysis (S2) into a time series pattern of feature vectors (hereinafter referred to as a speech pattern). The speech pattern is obtained by sampling, at every time interval T (8 milliseconds, for example; hereinafter referred to as a speech frame), the intra-band frequency components extracted through a group of P bandpass filters having different center frequencies. The speech power of each speech frame is also evaluated in the frequency analysis (S2). The speech pattern obtained through the frequency analysis (S2) is stored successively in an input memory (S5) by the subsequent speech pattern storage processing (S3). Meanwhile, a voiced interval, i.e., the start point and the end point of the speech, is determined in speech interval detection processing (S4) on the basis of the speech power evaluated in the frequency analysis (S2). Algorithms for determining a voiced interval from the speech power include, for example, a simple algorithm that takes as the start point of the speech the time at which the speech power rises above a certain threshold and as the end point the time at which the speech power falls below the threshold, as well as other general algorithms. An appropriate one of these is employed for detection of the voiced interval. The speech pattern within the voiced interval determined through the speech interval detection processing (S4) is read from the input memory (S5), while a reference pattern is read from a reference pattern memory (S6). Then, in similarity evaluation processing (S7), the similarity between the speech pattern and the reference pattern is estimated by a dynamic programming matching method, a linear matching method, or the like.
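The simple threshold algorithm mentioned above for the speech interval detection processing (S4) can be sketched as follows. This is only an illustrative sketch: the function name `detect_voiced_interval`, the list-of-floats representation of per-frame speech power, and the example threshold are assumptions and do not appear in the prior method as described.

```python
def detect_voiced_interval(frame_power, threshold):
    """Return (start, end) frame indices of the voiced interval, or None.

    start: first frame whose power rises above the threshold.
    end:   first later frame whose power falls below the threshold,
           or the last frame if the power never drops again.
    """
    start = None
    for i, power in enumerate(frame_power):
        if start is None:
            if power > threshold:
                start = i          # speech power rose above the threshold
        elif power < threshold:
            return start, i        # speech power fell below the threshold
    if start is None:
        return None                # no frame exceeded the threshold
    return start, len(frame_power) - 1


# Hypothetical per-frame speech power values (one value per speech frame).
power = [0.1, 0.2, 1.5, 2.0, 1.8, 0.3, 0.1]
print(detect_voiced_interval(power, threshold=1.0))
```

Because a single fixed threshold is used, any noise frame whose power exceeds it would be taken as the start of speech, which is one reason more general algorithms are also known.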
The reference pattern described here is a time series pattern of vectors obtained by subjecting a word to be recognized (hereinafter referred to as a category) to the same speech analysis as the speech pattern, and is stored in advance in the reference pattern memory (S6). In the subsequent judgement processing (S8), the similarities evaluated for the respective reference patterns in the similarity evaluation processing (S7) are compared, and the name given to the reference pattern showing the maximum similarity is determined as the recognition result (S9). The prior speech recognition method described above thus estimates, by means of the similarity described above, the difference between the speech pattern indicative of the spectrum of the speech signal and each reference pattern previously obtained by the same spectral analysis, and adopts the name of the reference pattern showing the maximum similarity as the recognition result. Accordingly, when the input speech and a reference pattern represent the same word, the similarity between them is high, whereas when they represent different words the similarity is low. If, however, the spectrum of the speech pattern is distorted by factors other than the speech, for example external noise, the similarity between the speech pattern and a reference pattern is reduced even when both represent the same word, and hence a correct recognition result cannot be obtained. Furthermore, such a prior recognition method requires much time for the arithmetic operations and a large memory capacity, and is thus likely to result in a large-sized device for its implementation.
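The similarity evaluation processing (S7) and judgement processing (S8) described above can be illustrated with the following minimal sketch of dynamic programming matching. Several assumptions are made here that are not stated in the source: Euclidean distance between frame vectors is used as the local measure, an accumulated distance is used in place of a similarity (so the minimum distance corresponds to the maximum similarity), patterns are assumed to be non-empty, and the names `frame_distance`, `dtw_distance`, `recognize`, and the two example categories are hypothetical.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature vectors (assumed local measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(pattern, reference):
    """Accumulated distance along the best time alignment of two patterns."""
    n, m = len(pattern), len(reference)
    inf = float("inf")
    # cost[i][j]: best accumulated distance aligning the first i frames of
    # the pattern with the first j frames of the reference.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(pattern[i - 1], reference[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # pattern frame repeated
                                 cost[i][j - 1],      # reference frame repeated
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def recognize(pattern, references):
    """Return the category name whose reference pattern is closest."""
    return min(references, key=lambda name: dtw_distance(pattern, references[name]))


# Hypothetical two-channel reference patterns for two categories.
references = {
    "yes": [[1.0, 0.0], [1.0, 0.5], [0.0, 1.0]],
    "no": [[0.0, 1.0], [0.5, 1.0], [1.0, 0.0]],
}
# Input pattern: the "yes" pattern spoken more slowly (first frame repeated).
spoken = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.5], [0.0, 1.0]]
print(recognize(spoken, references))
```

In this sketch each comparison fills a table of size (n + 1) by (m + 1) for patterns of n and m frames, and the comparison is repeated for every stored category, which gives a sense of the computation time and memory capacity that such matching requires.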