The invention relates to a speech recognition method and more particularly to a method of the type in which sentences that have been put together from words of a given vocabulary are recognized, wherein a limited number of permissible sentences and an N-gram speech model into which the syntax of the permissible sentences is integrated are predetermined.
In the recognition of connected speech, which permits any combination of all words, the error rate increases considerably compared to the recognition of individual words. To counteract this, knowledge of permissible word sequences can be stored, for example, in so-called speech models and used during recognition. As a result, the number of permissible sentences can be limited considerably.
Usually, speech models are defined as N-gram models, with N being referred to as the depth of the model and indicating the number of consecutive words within a word sequence that are considered in the actual evaluation of a word sequence hypothesis. The complexity of the recognition process grows rapidly as N increases; therefore, the particularly simple bigram model with N=2, which considers only combinations of two words, is preferred. The speech models can be further simplified if words which occur in the same context, but which do not necessarily have the same meaning, are combined in word groups (e.g., all weekdays). Instead of individual word transitions, the speech models can then consider the transition from one word group to another.
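The bigram model with word groups described above can be illustrated by a minimal sketch. The word groups, the example sentences and the function names (`to_group`, `build_bigrams`, `is_accepted`) are assumptions chosen purely for illustration and are not taken from the cited prior art; the sketch also uses only permitted/not-permitted transitions rather than probabilities.

```python
# Hypothetical word groups: words occurring in the same context share a group.
WORD_GROUPS = {
    "monday": "WEEKDAY",
    "tuesday": "WEEKDAY",
    "friday": "WEEKDAY",
}

START, END = "<s>", "</s>"  # assumed sentence boundary markers


def to_group(word):
    """Map a word to its word group, or to itself if it belongs to none."""
    return WORD_GROUPS.get(word, word)


def symbols_of(sentence):
    """Sentence as a list of group symbols, framed by boundary markers."""
    return [START] + [to_group(w) for w in sentence.split()] + [END]


def build_bigrams(sentences):
    """Collect every pair of adjacent symbols (N=2) seen in the sentences."""
    allowed = set()
    for sentence in sentences:
        symbols = symbols_of(sentence)
        allowed.update(zip(symbols, symbols[1:]))
    return allowed


def is_accepted(sentence, allowed):
    """A hypothesis is accepted if each adjacent pair is a known bigram."""
    symbols = symbols_of(sentence)
    return all(pair in allowed for pair in zip(symbols, symbols[1:]))


examples = ["call me on monday", "meet me on friday"]
bigrams = build_bigrams(examples)
```

With these assumed examples, `is_accepted("call me on tuesday", bigrams)` holds although "tuesday" never appears in the example sentences, because the model stores the transition from "on" to the group `WEEKDAY` rather than to an individual word. A sequence such as "me call on monday" is rejected, since no sentence starts with "me".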
In Informatik Forsch. Entw. [Informatics Research Development] (1992) 7, p. 83-97, fundamental problems of the automatic recognition of continuous speech are discussed in detail, and approaches to solving them are described from the point of view of statistical decision theory. The focus is on the stochastic modelling of knowledge sources for acoustics and linguistics, e.g., in the form of phoneme models, a pronunciation dictionary and a speech model.
From "The HARPY Speech Understanding System" in Readings in Speech Recognition, 1990, Morgan Kaufmann Publishers Inc., a speech recognition system is known which has a greatly limited number of permissible sentences. The syntactic and semantic constraints determining permissibility can be formulated in grammar equations and can be represented as a graph. A few simplifications are introduced in order to get from the grammar definition, which is complete but involves great processing complexity, to a compact speech model with reasonable processing complexity.
Such simplifications, however, are sometimes possible only if it is accepted that word sequences which are nonpermissible under the original grammar definition appear as permissible again in the speech model. Finally, in the HARPY system, the words are replaced by their phonetic definitions and, in this manner, a phonetic model for a complete sentence recognizer is created.
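The effect that a simplified model readmits nonpermissible word sequences can be made concrete with a small sketch. It does not reproduce the simplifications of the HARPY system; as an assumption for illustration, it reduces a list of permissible sentences to a bigram model, and the sentences and names (`PERMISSIBLE`, `bigrams_of`, `accepted_by_bigram_model`) are hypothetical.

```python
# Hypothetical complete definition: exactly these sentences are permissible.
PERMISSIBLE = {
    "show the next flight",
    "cancel the last booking",
}


def bigrams_of(sentence):
    """Set of adjacent word pairs of a sentence, including boundary markers."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    return set(zip(words, words[1:]))


# Simplified model: only the word pairs occurring in permissible sentences.
ALLOWED = set().union(*(bigrams_of(s) for s in PERMISSIBLE))


def accepted_by_bigram_model(sentence):
    """True if every adjacent word pair of the sentence is in the model."""
    return bigrams_of(sentence) <= ALLOWED


# Not in PERMISSIBLE, yet built only from pairs that occur in PERMISSIBLE.
candidate = "show the last booking"
```

Here `accepted_by_bigram_model(candidate)` holds although `candidate` is not contained in `PERMISSIBLE`: both example sentences pass through the word "the", and a model of depth 2 cannot remember which word preceded it.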