1. Field of the Invention
The present invention relates to a method and a device for voice recognition.
2. Description of the Related Art
A voice recognition system is taught in the reference A. Hauenstein, “Optimierung von Algorithmen und Entwurf eines Prozessors für die automatische Spracherkennung”. This reference also contains a basic introduction to the components of a voice recognition system, as well as important techniques that are common in voice recognition.
In a known voice recognition system, a degree of accuracy (that is, a measure of the quality of the recognition) is predetermined. The user must make do with this fixed setting, even when a lower degree of accuracy would suffice for his application and would give him a higher operating speed in return.
The principle of pruning a search space is known from the reference A. Hauenstein, “Optimierung von Algorithmen und Entwurf eines Prozessors für die automatische Spracherkennung” (see chapter 3.3.3, page 40). Pruning is a matter of “trimming” the search space, that is, a method for reducing the number of search paths in the search space by cutting off the least promising search paths. To this end, a search path with minimal costs (the optimal search path) is first established. Then, all search paths (branches of the search tree) whose costs exceed this minimum plus a prescribed evaluation quantity, referred to as the pruning threshold, are cut off. For a detailed explanation of pruning, see the reference A. Hauenstein, “Optimierung von Algorithmen und Entwurf eines Prozessors für die automatische Spracherkennung”, p. 40ff, particularly FIG. 16 on page 41. When a fixed pruning threshold is used, it is not known in advance how many search paths will remain in the search tree. If one wishes to maintain the number of remaining search paths at a predetermined level, the pruning threshold is dynamically adapted.
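The threshold pruning described above can be illustrated with a minimal sketch. It is not taken from the cited reference: the function names `beam_prune` and `adapt_threshold`, the dictionary representation of search paths, and the fixed step size of the adaptation rule are assumptions made purely for illustration.

```python
def beam_prune(path_costs, pruning_threshold):
    """Keep only the search paths whose cost is at most
    (minimal cost + pruning_threshold); all others are cut off."""
    if not path_costs:
        return {}
    best_cost = min(path_costs.values())      # cost of the optimal search path
    limit = best_cost + pruning_threshold
    return {path: cost for path, cost in path_costs.items() if cost <= limit}


def adapt_threshold(threshold, survivors, target, step=1.0, minimum=0.0):
    """Hypothetical adaptation rule: narrow the beam when too many paths
    survive, widen it when too few survive."""
    if survivors > target:
        return max(minimum, threshold - step)
    if survivors < target:
        return threshold + step
    return threshold


costs = {"a": 10.0, "b": 12.5, "c": 19.0, "d": 11.0}
kept = beam_prune(costs, pruning_threshold=3.0)        # "c" is cut off
next_threshold = adapt_threshold(3.0, survivors=len(kept), target=2)
```

In this sketch the threshold is adjusted once per time window, so the number of surviving paths only approaches the target over several windows rather than being enforced exactly.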
A histogram pruning is taught in the references V. Steinbiss et al., “Improvements in Beam Search” and M. Niemöller et al., “A PC-based Real-Time Large Vocabulary Continuous Speech Recognizer for German”. Here, a predetermined number of “best” search paths are used—that is, search paths with a high probability of occurrence—in that frequencies of the search paths are evaluated in the form of a histogram. The pruning threshold is dynamically modified.
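As a rough illustration of histogram pruning, the following sketch bins the path costs into a histogram and derives the cut-off from the cumulative bin counts, so that no sorting of the paths is needed. The function name `histogram_prune`, the use of costs (low cost standing for high probability), and the bin count are assumptions and are not taken from the cited references.

```python
def histogram_prune(path_costs, max_paths, num_bins=16):
    """Keep roughly the `max_paths` cheapest paths using a cost histogram."""
    if len(path_costs) <= max_paths:
        return dict(path_costs)
    best = min(path_costs.values())
    worst = max(path_costs.values())
    width = (worst - best) / num_bins
    if width == 0:
        # All costs are identical, so the histogram cannot separate them.
        return dict(list(path_costs.items())[:max_paths])

    def bin_of(cost):
        return min(int((cost - best) / width), num_bins - 1)

    counts = [0] * num_bins
    for cost in path_costs.values():
        counts[bin_of(cost)] += 1

    # Accumulate bins from the cheapest end until the limit would be exceeded.
    kept_bins, total = 0, 0
    for count in counts:
        if total + count > max_paths:
            break
        total += count
        kept_bins += 1
    kept_bins = max(kept_bins, 1)  # the best bin is always kept

    return {p: c for p, c in path_costs.items() if bin_of(c) < kept_bins}


costs = {f"p{i}": float(i) for i in range(10)}
kept = histogram_prune(costs, max_paths=4, num_bins=10)
```

Because whole bins are kept or dropped, the result can contain fewer than `max_paths` paths, and slightly more if the cheapest bin alone is overfull; a finer histogram reduces this effect.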
An acoustic look-ahead in the search tree (term of art: fast look ahead) is taught in the references A. Hauenstein, “Optimierung von Algorithmen und Entwurf eines Prozessors für die automatische Spracherkennung” and S. Ortmanns et al., “Look-Ahead Techniques for Fast Beam Search”.
The idea pursued in the acoustic look-ahead (also referred to as fast preselection) is based on the characteristic of a language that all words are composed of a limited inventory of linguistic subunits (e.g. phonemes, half-syllables). An acoustic weighting is performed for these linguistic subunits “in advance”. Only those combinations of linguistic subunits are tracked whose acoustic weights are below a predetermined threshold. The advantage in terms of weighting outlay is that, for a small number of linguistic subunits, a measure of the agreement between the speech signal that is to be recognized and a target quantity is computed in advance and used as a basis for deciding whether a large part of the search tree should be excluded from consideration. Simply put, this means that more search paths in the search tree are saved than are added by the prediction. This gain grows larger the higher the ratio of new word beginnings to the number of linguistic subunits becomes. This ratio increases with the number of linguistic subunits or, respectively, words that are to be recognized (lexicon size).
A further advantage of the method of acoustic look-ahead consists in the regularity of the algorithms for computing the corresponding scores. Since there are no branchings in the search space due to word ends, syntactic nodes, and so on, the schema for calculating the scores is regular. For exactly this reason, this type of method is also suitable for implementation in hardware.
The prediction of the scores (term of art: fast match scores) is possible because the actual search lags behind the currently extracted scores of the speech signal by a fixed number of time windows. The scores of further linguistic subunits are predicted from the current scores (see the reference A. Hauenstein, “Optimierung von Algorithmen und Entwurf eines Prozessors für die automatische Spracherkennung”, p. 65, FIG. 33).
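The acoustic look-ahead can be sketched as follows, under strongly simplifying assumptions: each time window is reduced to a single number, each linguistic subunit is represented by a short hypothetical template, and the weight is a mean absolute difference. None of these choices, nor the names `lookahead_scores` and `allowed_word_starts`, come from the cited references.

```python
def lookahead_scores(frames, templates):
    """Weight every subunit against the time windows lying ahead of the
    actual search (lower weight = better agreement).
    Assumes `frames` and every template are non-empty."""
    scores = {}
    for unit, template in templates.items():
        n = min(len(frames), len(template))
        scores[unit] = sum(abs(f - t) for f, t in zip(frames, template)) / n
    return scores


def allowed_word_starts(lexicon, scores, threshold):
    """Track only words whose first subunit has a weight below the threshold."""
    allowed_units = {unit for unit, s in scores.items() if s < threshold}
    return [word for word, units in lexicon.items() if units[0] in allowed_units]


templates = {"a": [1.0, 1.0], "b": [5.0, 5.0], "k": [1.5, 0.5]}
lexicon = {"ab": ["a", "b"], "ba": ["b", "a"], "ka": ["k", "a"]}
ahead = [1.0, 1.0]  # windows already extracted but not yet reached by the search
starts = allowed_word_starts(lexicon, lookahead_scores(ahead, templates), 1.0)
```

The weighting loop runs once per subunit rather than once per word, which is why the saving grows with the size of the lexicon; the loop also contains no data-dependent branching, in line with the regularity noted above.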
This type of look-ahead is also carried out in language models (see the reference S. Ortmanns et al., “Look-Ahead Techniques for Fast Beam Search”).
The principle of language model look-ahead is to take the probabilities existing in the language model into account in the search process as soon as possible, and also in the associated pruning. This is achieved by a factorization of the probabilities in the language model. A detailed description with a formal notation is contained in the reference S. Ortmanns et al., “Look-Ahead Techniques for Fast Beam Search”.
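One common way to realize such a factorization, sketched here under the assumption of a unigram language model and a prefix-tree lexicon, is to store at each tree node the best probability of any word still reachable from that node and to apply the ratio between a node and its parent when the search enters the node. The names `build_lookahead` and `arc_factor` are hypothetical, and the formal notation of the cited reference is not reproduced.

```python
def build_lookahead(lexicon, unigram):
    """Map every subunit prefix to the highest probability of any word
    reachable below it. Assumes all probabilities are greater than zero."""
    best = {}
    for word, units in lexicon.items():
        for i in range(len(units) + 1):
            prefix = tuple(units[:i])
            best[prefix] = max(best.get(prefix, 0.0), unigram[word])
    return best


def arc_factor(best, prefix):
    """Probability factor applied when the search enters `prefix`."""
    if not prefix:
        return best[()]                 # root: best probability overall
    return best[prefix] / best[prefix[:-1]]


lexicon = {"tea": ["t", "i"], "ten": ["t", "e", "n"], "an": ["a", "n"]}
unigram = {"tea": 0.2, "ten": 0.5, "an": 0.3}
best = build_lookahead(lexicon, unigram)

# Multiplying the factors along the path of "tea" recovers its probability.
product = 1.0
for i in range(len(lexicon["tea"]) + 1):
    product *= arc_factor(best, tuple(lexicon["tea"][:i]))
```

In this sketch the product of the factors equals the word probability only when no other word shares the complete subunit sequence or extends it; the point of the factorization is that part of the probability is already available for pruning at the first subunit instead of only at the word end.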
Finally, a threshold for selecting the distance parameters to be computed is taught in the reference E. Bocchieri, “Vector Quantization for the Efficient Computation of Continuous Density Likelihoods”. Such selection methods are generally multi-step. First, a rough calculation is performed using a part of the distances. In the next step, those distances are determined which are close to the best distance of the first computational step with respect to an interval score. This interval score can be varied via a threshold, whereby the computing outlay for determining the distance parameters is varied.
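A multi-step selection of this kind might look as follows. The sketch assumes that the rough first step computes distances only to a few cluster centroids and that squared Euclidean distance is used; the function name `two_step_distances` and the cluster layout are illustrative assumptions and do not reproduce the method of the cited reference.

```python
def sq_dist(x, y):
    """Squared Euclidean distance between two equally long vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def two_step_distances(x, clusters, threshold):
    """clusters: non-empty list of (centroid, {name: prototype}) pairs.
    Step 1: rough distances to the centroids only.
    Step 2: exact distances only for clusters whose rough distance lies
            within `threshold` of the best rough distance."""
    rough = [sq_dist(x, centroid) for centroid, _ in clusters]
    best = min(rough)
    exact = {}
    for d, (_, members) in zip(rough, clusters):
        if d - best <= threshold:
            for name, prototype in members.items():
                exact[name] = sq_dist(x, prototype)
    return exact


clusters = [
    ((0.0, 0.0), {"g1": (0.0, 1.0), "g2": (1.0, 0.0)}),
    ((10.0, 10.0), {"g3": (9.0, 10.0)}),
    ((2.0, 2.0), {"g4": (2.0, 3.0)}),
]
selected = two_step_distances((0.5, 0.5), clusters, threshold=5.0)
```

A larger threshold admits more clusters into the second step and thus raises the computing outlay, while a threshold of zero restricts the exact computation to the single best cluster.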