The invention relates to encoding speech with a series of electrical signals, such as for computerized speech recognition.
In a speech recognition computer system, the values of multiple features of an utterance are measured during each of a series of successive time intervals to produce a series of feature vector electrical signals forming a coded representation of the utterance. The series of feature vector signals is modelled as being generated by hidden Markov models of each of a number of words in a vocabulary in order to estimate, for each word, the probability that the utterance of the word would have produced the series of observed feature vector signals.
In one technique (U.S. Pat. No. 5,182,773, issued to Bahl et al. and entitled "Speaker-Independent Label Coding Apparatus"), the feature vector signal for a selected time interval is obtained by first forming a spliced vector consisting of the values of all measured features during a number of successive time intervals centered on the selected time interval. The spliced vector is then projected down to a smaller number of dimensions to produce the feature vector signal.
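The splicing and projection steps described above may be illustrated by the following minimal sketch, which is offered only as an illustration and not as the implementation of the cited patent. The function names splice and project are hypothetical, and repeating the first or last frame at the boundaries of the utterance is an assumption that the text does not address.

```python
from typing import List, Sequence


def splice(frames: Sequence[Sequence[float]], center: int, half_width: int) -> List[float]:
    """Concatenate the feature values of 2*half_width + 1 frames centered on `center`."""
    last = len(frames) - 1
    spliced: List[float] = []
    for t in range(center - half_width, center + half_width + 1):
        # Assumption: at the edges of the utterance, reuse the nearest existing frame.
        nearest = min(max(t, 0), last)
        spliced.extend(frames[nearest])
    return spliced


def project(spliced: Sequence[float], projection_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Project the spliced vector onto each projection vector (one dot product per output dimension)."""
    for row in projection_vectors:
        if len(row) != len(spliced):
            raise ValueError("each projection vector must match the spliced vector length")
    return [sum(p * x for p, x in zip(row, spliced)) for row in projection_vectors]


# Toy example: 2 features per interval, spliced over 3 intervals, projected to 2 dimensions.
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
spliced_vector = splice(frames, center=1, half_width=1)
projection_vectors = [
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0, 1.0],
]
feature_vector = project(spliced_vector, projection_vectors)
```

In this toy example the spliced vector has six dimensions (two features over three intervals), and each of the two output dimensions costs one dot product of length six.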
For a computer system which measures twenty-one features in each time interval, which splices the values of all twenty-one measured features over nine successive time intervals to form a 189-dimension spliced vector (21 × 9 = 189), and which projects the spliced vector down to 50 dimensions, the computer system must provide computer memory to store 50 projection vectors of 189 dimensions each. Moreover, the computer processor must perform 9,450 multiply-and-add operations (50 × 189) to obtain each feature vector.
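The figures in the preceding paragraph follow from simple arithmetic, which may be checked with the short sketch below. The function name projection_cost and its parameter names are hypothetical and are used only for illustration.

```python
def projection_cost(num_features: int, num_intervals: int, output_dims: int):
    """Return (spliced dimension, stored projection values, multiply-and-add operations per feature vector)."""
    spliced_dim = num_features * num_intervals
    # One projection vector of length spliced_dim is stored per output dimension.
    stored_values = output_dims * spliced_dim
    # Each output dimension is one dot product: spliced_dim multiply-and-add operations.
    multiply_adds = output_dims * spliced_dim
    return spliced_dim, stored_values, multiply_adds


# The example from the text: 21 features, 9 intervals, 50 output dimensions.
spliced_dim, stored_values, multiply_adds = projection_cost(21, 9, 50)
print(spliced_dim, stored_values, multiply_adds)  # 189 9450 9450
```

Both the storage and the operation count scale with the product of the number of features, the number of spliced intervals, and the number of output dimensions, which is why reducing any one of these factors reduces the cost.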
Producing feature vector signals in this way consumes significant resources. While the computer memory required and the computer processor operations consumed can be decreased by measuring fewer features, or by splicing the features over fewer successive time intervals, it has been found that these modifications cause the quality of speech coding to fall, and the accuracy of the speech recognition based on such speech coding to decrease.