The invention relates to a method of determining a sequence of words of a predetermined vocabulary in a speech signal, comprising the steps of: sampling the speech signal at recurrent instants so as to produce a sequence of test signals; signal-wise comparing the test signals with different sequences of reference signals while generating scores, each sequence of reference signals representing a word of the vocabulary, the comparison always commencing anew with subsequent test signals, starting from the beginning of the different sequences of reference signals; deriving, for each word end reached, a word result which comprises at least a reference to the word beginning of the ended word, a word score, and a reference to the ended word; and deriving at least one sequence of words from the word results. The invention also relates to a device for carrying out the method.
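By way of illustration only, the word result and the derivation of a word sequence from the word results may be sketched as follows. The names `WordResult` and `best_word_sequence` are hypothetical, as is the explicit `end` field (the word result is derived when the word end is reached, so the end instant is assumed to be known). The sketch further assumes that lower scores are better and that a valid sequence covers the test signals without gaps; it is not the method of the invention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordResult:
    begin: int    # index of the first test signal of the ended word
    end: int      # index of the test signal at which the word end was reached (assumed known)
    score: float  # word score, lower is better (assumption)
    word: str     # reference to the ended word


def best_word_sequence(results, n_signals):
    """Return (total_score, words) for the cheapest gap-free chain of word
    results covering test signals 0 .. n_signals - 1, or None if none exists."""
    # best[t] holds the cheapest chain whose last word ends at test signal t - 1.
    best = {0: (0.0, [])}
    # Processing in order of word end guarantees best[r.begin] is final when used.
    for r in sorted(results, key=lambda r: r.end):
        if r.begin not in best:
            continue  # no chain of words reaches this word beginning
        prev_score, prev_words = best[r.begin]
        candidate = (prev_score + r.score, prev_words + [r.word])
        nxt = r.end + 1
        if nxt not in best or candidate[0] < best[nxt][0]:
            best[nxt] = candidate
    return best.get(n_signals)
```

Each word result is linked to its predecessor solely through its reference to the word beginning, so the sequence can be derived after the comparisons have produced all word results.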
A method of this kind is known from EP 0 285 211 A2. Therein, branched word strings are formed, the number of branches being limited, notably in the case of a large vocabulary and long sentences, in that the scores are regularly compared with a threshold value and sequences in which the score exceeds the threshold value are not continued. The number of branches is reduced further by using a language model for the comparison, i.e. for deciding with which sequences the comparisons are to be continued after ended words. The comparisons, always commencing anew, are not carried out independently of one another; like the comparisons within the sequences of reference signals, they represent parts of sentence hypotheses which can also be recombined, so that from comparisons commenced at different, usually closely spaced instants or test signals, only one remains and leads to the word end. From the branches remaining at the end of the speech signal, the branch having the best score at the end is traced back, and the words of this branch are output from the beginning.
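The threshold comparison and the tracing back of the best branch described for the known method may be sketched as follows. This is a minimal illustration under stated assumptions and does not reproduce the implementation of EP 0 285 211 A2: the names `Hypothesis`, `prune` and `trace_back` are hypothetical, scores are taken to be accumulated distances (lower is better), and each branch is assumed to store a reference to the branch it continues.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hypothesis:
    word: str                            # word ended on this branch
    score: float                         # accumulated score, lower is better (assumption)
    prev: Optional["Hypothesis"] = None  # branch that this word continues


def prune(hyps, threshold):
    """Keep only the hypotheses whose score does not exceed the threshold."""
    return [h for h in hyps if h.score <= threshold]


def trace_back(final_hyps):
    """Trace back the best-scoring final branch and return its words,
    ordered from the beginning of the speech signal."""
    h = min(final_hyps, key=lambda hyp: hyp.score)
    words = []
    while h is not None:
        words.append(h.word)
        h = h.prev
    return words[::-1]
```

A branch removed by `prune` cannot be recovered later, which is why a threshold that is too tight may terminate the branch of the actually spoken word sequence.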
This method requires a major processing effort, notably for taking the language model into account. Moreover, the use of the threshold value and of the language model may cause the actually spoken word sequence not to be determined from the speech signal, because the relevant branch is terminated prematurely and unduly.
U.S. Pat. No. 4,624,008 discloses a speech recognition device which utilizes a language model in the form of a strictly predetermined syntax. For each word end it is checked whether the ended word is compatible with the preceding words in conformity with the syntax, and subsequently it is determined which words may succeed, the comparison being continued only with these words. Furthermore, it is continuously checked whether a sentence end in conformity with the predetermined syntax has been reached. Thus, continuous use is again made of a language model in order to limit the number of hypotheses arising during the comparisons.
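The three checks described above (compatibility of the ended word, permissible successors, sentence end) may be illustrated with a toy syntax. The sketch is a simplification and not the device of U.S. Pat. No. 4,624,008: the table `SYNTAX`, its vocabulary and the function names are hypothetical, and compatibility is assumed to depend only on the directly preceding word rather than on all preceding words.

```python
# Hypothetical toy syntax: each word maps to the set of words that may follow it.
START, END = "<s>", "</s>"
SYNTAX = {
    START: {"call", "dial"},
    "call": {"home", "office"},
    "dial": {"home"},
    "home": {END},
    "office": {END},
}


def is_compatible(prev_word, word):
    """Check whether the ended word may follow the preceding word."""
    return word in SYNTAX.get(prev_word, set())


def allowed_successors(word):
    """Words with which the comparison is continued after the ended word."""
    return SYNTAX.get(word, set()) - {END}


def sentence_may_end(word):
    """Check whether a sentence end in conformity with the syntax is reached."""
    return END in SYNTAX.get(word, set())
```

With such a table, the comparison after "dial" is continued only with "home", so that no hypothesis for "dial office" can arise.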