The invention relates to a method of recognizing a sequence of words of a given vocabulary from a speech signal, comprising the following steps:
sampling of the speech signal at recurrent instants in order to produce a series of test signals;
executing a signal-by-signal comparison between the test signals and various series of reference signals, thus producing scores, each series of reference signals forming part of a predetermined set of series of reference signals and representing a word of the vocabulary, the set of series of reference signals constituting a tree which has a root and in which each tree branch comprises a number of reference signals and is associated with a speech element, vocabulary words being associated with given branch nodes and branch ends;
deriving, for each terminated word, a word result which comprises an overall score, the comparison producing scores being continued, starting with a new score corresponding to the overall score of the terminated word as the predecessor word, for subsequent test signals with the start of series of reference signals for as long as the score is smaller than a predetermined threshold value, the overall score of the terminated word being derived from the score at the end of this word and a language model value which is associated with a combination of the terminated word and a sequence of predetermined length of terminated predecessor words;
recognizing at least one sequence of words on the basis of the overall scores.
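The tree organization of the reference signals named in the steps above can be illustrated by a minimal sketch. It assumes that each speech element is denoted by a short label and that each word is given as a sequence of such labels; the names `Node`, `build_tree`, `count_branches` and the four-word lexicon are hypothetical and serve only as an illustration. The reference signals within each branch are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Outgoing branches, keyed by the (assumed) label of the speech element.
    children: dict = field(default_factory=dict)
    # Vocabulary words that terminate at this node (branch node or branch end).
    words: list = field(default_factory=list)


def build_tree(lexicon):
    """Merge all words into one tree so that common word beginnings share branches."""
    root = Node()
    for word, elements in lexicon.items():
        node = root
        for element in elements:
            node = node.children.setdefault(element, Node())
        node.words.append(word)
    return root


def count_branches(node):
    """Count the branches of the tree below the given node."""
    return sum(1 + count_branches(child) for child in node.children.values())


# Hypothetical lexicon: word -> sequence of speech-element labels.
LEXICON = {
    "a": ["ah"],
    "at": ["ah", "t"],
    "an": ["ah", "n"],
    "ant": ["ah", "n", "t"],
}

TREE = build_tree(LEXICON)
```

In this sketch the four words together comprise eight speech elements, whereas the tree needs only four branches, because the common beginning "ah" is stored once. The words "a" and "an" are associated with branch nodes, the words "at" and "ant" with branch ends.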
The invention also relates to a device for carrying out this method.
A method of this kind is known from EP 0 533 260 and serves for the recognition of coherently pronounced, i.e. continuous, speech with a large vocabulary. Because the comparisons with the root of the vocabulary tree start anew each time a word end is reached, very many states are active within the tree. This becomes evident when each comparison newly starting at the tree root is represented as a copy of the vocabulary tree. It holds all the more when complex language models are used and separate tree copies are composed for simultaneously terminating words. Each distinct path through each tree then represents an instantaneous hypothesis. In order to reduce the number of active hypotheses, the scores of the hypotheses are compared with a threshold value, which is preferably formed by the optimum score at the current instant increased by a given range of values, and hypotheses whose score exceeds said threshold value are not continued.
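The threshold comparison described above can be sketched as follows. The sketch assumes that scores behave like costs, i.e. that a smaller score denotes a better hypothesis, so that the optimum score is the minimum; the function name `prune` and the parameter `score_range` are hypothetical.

```python
def prune(hypotheses, score_range):
    """Keep only the hypotheses whose score does not exceed the threshold.

    hypotheses:  mapping from a hypothesis identifier to its current score
                 (assumed cost-like: smaller is better).
    score_range: the given range of values added to the optimum score.
    """
    if not hypotheses:
        return {}
    # Threshold = optimum (smallest) score at the current instant + range.
    threshold = min(hypotheses.values()) + score_range
    return {name: score for name, score in hypotheses.items() if score <= threshold}


# Hypothetical scores of three active hypotheses at one instant.
ACTIVE = {"h1": 10.0, "h2": 14.0, "h3": 25.0}
KEPT = prune(ACTIVE, score_range=5.0)
```

With these example values the threshold is 15.0, so the hypotheses "h1" and "h2" are continued and "h3" is not. Because the threshold follows the optimum score at each instant, it must be recomputed for every test signal.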
For a further reduction of the search space, i.e. the number of active hypotheses, use is also made of a language model which takes into account the probabilities of word sequences or at least the individual probabilities of the words themselves. However, when the vocabulary is composed as a tree, the language model value can be determined only after termination of a word, because the branches near the root are shared by several words and the identity of the word is not known before its end. The continuation of a hypothesis after a word end then takes place with an abruptly increased score, so that the threshold value with which the scores are compared must be chosen sufficiently high. As a result, however, many hypotheses remain active within the individual words or tree copies which exceed the then valid threshold value only after a word end has been reached, owing to the addition of the language model value, and are terminated only at that point.
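The effect of adding the language model value only at the word end can be made concrete by a small numerical sketch. It assumes cost-like scores, a language model value equal to the negative logarithm of a bigram probability, and a predecessor sequence of length one; the probabilities, the scores and the names `lm_value`, `overall_score` and `survivors` are hypothetical.

```python
import math

# Hypothetical bigram probabilities P(word | predecessor).
BIGRAM_PROB = {("the", "cat"): 0.05, ("the", "cut"): 0.0001}


def lm_value(predecessor, word):
    """Language model value, assumed here to be a negative log probability."""
    return -math.log(BIGRAM_PROB[(predecessor, word)])


def overall_score(score_at_word_end, predecessor, word):
    """Overall score = score at the end of the word + language model value."""
    return score_at_word_end + lm_value(predecessor, word)


def survivors(scores, score_range):
    """Names of the hypotheses whose score does not exceed optimum + range."""
    threshold = min(scores.values()) + score_range
    return {name for name, score in scores.items() if score <= threshold}


# Scores of two hypotheses just before their word ends (no language model yet).
WITHIN_WORD = {"cat": 20.0, "cut": 22.0}

# Scores after the word end, with the language model value added.
AT_WORD_END = {
    word: overall_score(score, "the", word) for word, score in WITHIN_WORD.items()
}
```

With a range of 5.0, both hypotheses pass the threshold inside the word, since their scores differ by only 2.0. After the word end the scores are approximately 23.0 and 31.2, so the hypothesis "cut" exceeds the threshold and is terminated, although the comparisons spent on it up to the word end have already been carried out.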