This invention relates to apparatus for automatically recognizing the contents of speech.
A conventional speech recognition apparatus is described in a paper entitled "Speech Recognition System," which resulted from a large-scale project study on pattern information processing systems. The apparatus described there is a practical one for recognizing ten or more spoken words transmitted over a telephone line.
A functional block diagram of a computer employed in a conventional manner for speech recognition is shown in FIG. 1. Speech analysis is effected by a speech recognizing portion 1, responsive to a speech input signal. Analyzer 1 has an adaptive equalizer comprising a first-order filter and uses self-correlation analysis; the 20th-order analysis parameters are compressed to 7th-order feature vectors through K-L (Karhunen-Loeve) conversion. Next, the speech period is detected by a speech period detection portion 2 using speech amplitude information. Time compression is then effected by a time compressing portion 3 in order to normalize variations in speech speed. To this end, an extracting procedure capable of extracting the instants that represent the phoneme structure of a word is prepared in advance in a word table 4. With reference to the information given for each word, extracting points are determined so that stable extracting points are obtained. Fifteen extracting points are used for every word.
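The speech period detection and time compression steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the conventional apparatus itself: a single fixed amplitude threshold stands in for the detection rules of portion 2, and evenly spaced frames stand in for the word-table-driven extracting procedure of portion 3, neither of which is specified in detail here. The function names `detect_speech_period` and `compress_time` are hypothetical.

```python
from typing import List, Sequence, Tuple


def detect_speech_period(amplitudes: Sequence[float],
                         threshold: float) -> Tuple[int, int]:
    """Return (start, end) frame indices of the speech period, end exclusive.

    Assumption: the period runs from the first to the last frame whose
    amplitude reaches the threshold.
    """
    active = [i for i, a in enumerate(amplitudes) if a >= threshold]
    if not active:
        raise ValueError("no frame reaches the amplitude threshold")
    return active[0], active[-1] + 1


def compress_time(frames: Sequence[Sequence[float]],
                  num_points: int = 15) -> List[List[float]]:
    """Select num_points feature vectors from the detected speech period.

    Assumption: the points are evenly spaced in time. Utterances of any
    length are mapped to the same number of points, which normalizes
    variations in speech speed.
    """
    if not frames:
        raise ValueError("frames must not be empty")
    if num_points < 1:
        raise ValueError("num_points must be at least 1")
    last = len(frames) - 1
    if num_points == 1:
        return [list(frames[last // 2])]
    return [list(frames[round(k * last / (num_points - 1))])
            for k in range(num_points)]


# Usage: eight frames of amplitude, thirty 7th-order feature vectors.
start, end = detect_speech_period(
    [0.0, 0.1, 0.9, 1.2, 1.0, 0.8, 0.05, 0.0], threshold=0.5)
features = [[float(i)] * 7 for i in range(30)]
pattern = compress_time(features, num_points=15)
```

With fifteen points of 7th-order vectors, each word is represented by a fixed-size pattern regardless of how quickly it was spoken.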
Then a projection-image-on-convex portion 5 projects an image of the time-compressed patterns. To improve separation between categories, i.e. types of words, the pattern vectors obtained from the time compression portion 3 are projected on a convex surface. Distinction functions are then computed for the different categories by a distinction function computing portion 8. The distinction function data are derived in portion 7 beforehand: after the operation of the projection-image-on-convex portion 5, a segmentary linear distinction function value computing portion 6 computes a segmentary linear distinction function using a number of speech sound samples. This distinction function value computation is effected through multi-stage processing, i.e. a method in which the computation of distinction functions is repeated over several stages, each using a small number of samples, until a distinction function is obtained which finally separates the samples accurately. This method is adopted because the number of iterations required in linear programming increases markedly when the matrix becomes very large. Assuming that a segmentary distinction function obtained at each stage is expressed by Cijl(X), a distinction function with respect to class l of a pattern vector X is given by: ##EQU1## wherein i indicates a piece of a distinction function; and j indicates a stage.
The result of the distinction function computing portion 8 is fed to portion 9 where a word determination is made and an output derived.
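The flow from portion 8 to portion 9 can be illustrated with a simplified sketch. It assumes a single linear distinction function per category rather than the segmentary, multi-stage form described above, and it assumes that the word with the largest distinction function value is selected; the text states only that a word determination is made. The names `linear_score` and `determine_word`, and the weights in the usage example, are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def linear_score(weights: Sequence[float], bias: float,
                 x: Sequence[float]) -> float:
    """Value of one linear distinction function for pattern vector x."""
    if len(weights) != len(x):
        raise ValueError("weights and pattern vector differ in length")
    return sum(w * v for w, v in zip(weights, x)) + bias


def determine_word(models: Dict[str, Tuple[Sequence[float], float]],
                   x: Sequence[float]) -> str:
    """Return the category whose distinction function value is largest.

    Assumption: one (weights, bias) pair per word, and a largest-value
    decision rule.
    """
    if not models:
        raise ValueError("at least one category is required")
    scores = {word: linear_score(w, b, x) for word, (w, b) in models.items()}
    return max(scores, key=scores.get)


# Usage with two illustrative categories and a 2-dimensional pattern vector.
models = {"one": ([1.0, 0.0], 0.0), "two": ([0.0, 1.0], -0.5)}
result = determine_word(models, [0.2, 1.5])
```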
The above-described conventional method suffers from the following problems.
(1) Since the number of extracting points through time compression is common to all the categories, extracting points which do not contribute to the determination are included, resulting in deterioration of sharpness of the determination. In addition, the conventional method is affected by time base expansion and compression to a considerable degree.
(2) A large amount of computation is necessary for deriving the projection image on the convex surface and the distinction functions.
(3) Since the feature vectors are obtained through self-correlation analysis without using a statistical distance measure, the distinction functions may be mismatched due to slight phoneme differences between speakers.
(4) Misrecognition is apt to occur due to errors in speech period detection caused by noise or telephone line variations.
(5) Since speech period detection results from a combination of rules, the combinations of rules become complex when accuracy is to be improved.