1. Field of the Invention
The present invention relates to a speech perception apparatus which recognizes and discriminates speech patterns.
2. Description of the Prior Art
Conventionally, in this kind of apparatus, reference speech patterns of a person are input in advance. The input speech signal is subjected to a predetermined conversion, several feature vectors are then extracted, and combinations of the feature vectors thus obtained are registered as reference feature patterns in a registration file.
The feature vector will now be described. One quantity representing the voice is, for example, a time sequence of the frequency spectrum of the speech waveform. A specific feature of the speech extracted from this time sequence is the feature quantity. Since the feature is ordinarily represented by a plurality of elements, it is expressed as a vector. This vector is the feature vector. For example, a feature may be expressed by the ratio between the energies of the speech in the high and low frequency bands, which energies vary with the elapse of time. Further, the data serving as a reference for a voice of, e.g., "A" uttered by a person, once converted to feature vectors, is taken to be a reference feature pattern of this person.
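By way of illustration only, the band-energy-ratio feature mentioned above may be sketched as follows. The frame length, the split frequency between the low and high bands, and the function names band_energy_ratio and feature_vector are assumptions introduced for this sketch and do not appear in the conventional apparatus described here; a naive discrete Fourier transform is used merely to keep the sketch self-contained.

```python
import cmath
import math

def band_energy_ratio(frame, sample_rate, split_hz):
    """Return high-band energy divided by low-band energy for one frame."""
    n = len(frame)
    low = 0.0
    high = 0.0
    # Naive DFT over bins 1 .. n/2 - 1 (the DC bin is skipped).
    # Bin k corresponds to a frequency of k * sample_rate / n Hz.
    for k in range(1, n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        energy = abs(coeff) ** 2
        if k * sample_rate / n < split_hz:
            low += energy
        else:
            high += energy
    # A small constant avoids division by zero for a frame with no low-band energy.
    return high / (low + 1e-12)

def feature_vector(signal, sample_rate, frame_len, split_hz):
    """Return one ratio per non-overlapping frame, i.e. a time sequence of features."""
    starts = range(0, len(signal) - frame_len + 1, frame_len)
    return [band_energy_ratio(signal[s:s + frame_len], sample_rate, split_hz)
            for s in starts]
```

Each element of the returned list corresponds to one time frame, so the list as a whole represents how the ratio varies with the elapse of time.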
In addition, an input signal of a character, figure or symbol from a keyboard or the like is associated with the combination of the feature vectors derived and is registered as a reference feature pattern in the registration file, if necessary.
Next, when the apparatus recognizes ordinary speech, a feature vector is extracted by a conversion similar to that mentioned above and is compared with the reference feature patterns in the registration file to calculate the similarity. The apparatus then selects, from among the reference feature patterns, the one which is most analogous to the input speech, thereby recognizing the speech. In the calculation of similarity, for instance, the distance between the feature vector and the reference vector is obtained, and the similarity is assumed to be higher as the distance becomes smaller. On the other hand, in the case where there are a plurality of reference feature patterns for one speech and they form a distribution, the distance from the center of the distribution may be obtained, or the speech may be examined to see whether or not it lies within the distribution.
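By way of illustration only, the distance-based similarity calculation described above may be sketched as follows. The Euclidean distance, the use of the mean as the center of the distribution, and the names distance, centroid and recognize are assumptions introduced for this sketch; the registration file is modeled, hypothetically, as a dictionary mapping each registered label to its list of reference feature vectors.

```python
import math

def distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(patterns):
    """Center of a distribution of reference vectors (element-wise mean)."""
    n = len(patterns)
    return [sum(column) / n for column in zip(*patterns)]

def recognize(feature, registry):
    """Return (label, distance) of the registered speech closest to `feature`.

    `registry` maps a label to a list of reference feature vectors; a smaller
    distance to the center of that list is treated as a higher similarity.
    """
    best_label = None
    best_dist = math.inf
    for label, patterns in registry.items():
        d = distance(feature, centroid(patterns))
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label, best_dist
```

When only one reference vector is registered for a speech, the center coincides with that vector, so the same function covers both cases described above.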
Therefore, the person who utters believes that the uniformity or reproducibility of his own utterance is adequate as long as, for instance, the display character into which the result of perception was coded represents the uttered sound; he learns for the first time that his utterance has deviated from the reference pattern already registered only when the display character differs from the utterance. However, human speech also depends largely on various conditions of the speaker (condition of the throat and the like). Therefore, for example, when the apparatus erroneously recognizes the speech, this often means that the speech could not be recognized because it had gradually deviated from the registered speech, or that the speech had changed so greatly that it was erroneously recognized. Further, since time has already elapsed by the time the speech is erroneously recognized, it is difficult for the person who utters to remember how the speech felt at the time of registration. It is therefore often necessary to register the speech again.