The present invention relates to speech recognition systems and, more particularly, to an improvement in Viterbi decoding over a recognition network which enables the identification of alternate paths through the network.
It is conventional to implement speech recognition by applying Viterbi decoding to an acoustic recognition network made up of interconnected arcs, each of which models a respective preselected speech segment, e.g. a phoneme or similar speech sub-unit. At each intersection between arcs, commonly referred to as a "node", a data structure is maintained which preserves the identity of the best path reaching that node up to the current time.
Speech to be recognized is digitally converted to a succession of frames which characterize the speech at respective instants in time, each frame typically comprising a multidimensional vector characterizing the spectral content of the acoustic signal at that instant. Each successive frame is matched against each possible arc, and a cost metric corresponding to the likelihood of the match is evaluated and accumulated along each path. When the end of the network, e.g. a silence, is reached, a trace back procedure identifies the path through the network which produced the best score.
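The conventional frame-synchronous decoding just described can be sketched in Python as follows. This is an illustration of ordinary Viterbi decoding, not the method of the invention, and it rests on simplifying assumptions: each arc is modeled by a single hypothetical template vector that may absorb one or more consecutive frames, the cost metric is squared Euclidean distance standing in for a negative log-likelihood, and the names `Arc`, `frame_cost`, `viterbi_decode` and `START` are hypothetical. The `node_best` dictionary plays the role of the per-node data structure: at every frame it keeps only the best path reaching each node.

```python
import math
from dataclasses import dataclass

START = -1  # hypothetical sentinel "arc" standing for entry at the start node


@dataclass(frozen=True)
class Arc:
    src: int         # node the arc leaves
    dst: int         # node the arc enters
    label: str       # speech segment the arc models, e.g. a phoneme
    template: tuple  # hypothetical single-vector model of that segment


def frame_cost(frame, template):
    # Squared Euclidean distance, standing in for a -log-likelihood match cost.
    return sum((f - m) ** 2 for f, m in zip(frame, template))


def viterbi_decode(arcs, frames, start_node, end_node):
    inf = math.inf
    cost = [inf] * len(arcs)                # best cost of a path occupying each arc
    node_best = {start_node: (0.0, START)}  # node -> (cost, arc) of best path reaching it
    back = []                               # back[t][j]: arc occupied at frame t-1
    for frame in frames:
        new_cost, ptr = [inf] * len(arcs), [None] * len(arcs)
        for j, a in enumerate(arcs):
            enter, via = node_best.get(a.src, (inf, None))
            if cost[j] < inf and cost[j] <= enter:  # remain in arc j
                new_cost[j], ptr[j] = cost[j], j
            elif enter < inf:                       # enter arc j from its source node
                new_cost[j], ptr[j] = enter, via
            new_cost[j] += frame_cost(frame, a.template)
        # Update each node's record, keeping only the best path reaching it.
        node_best = {}
        for j, a in enumerate(arcs):
            if new_cost[j] < node_best.get(a.dst, (inf, None))[0]:
                node_best[a.dst] = (new_cost[j], j)
        cost = new_cost
        back.append(ptr)
    if end_node not in node_best:
        return None, inf
    total, j = node_best[end_node]
    # Trace back from the end node, recording the arc occupied at each frame.
    path = []
    for ptr in reversed(back):
        path.append(j)
        j = ptr[j]
    path.reverse()
    # Collapse consecutive frames spent in the same arc into a single label.
    labels = [arcs[i].label for k, i in enumerate(path) if k == 0 or path[k - 1] != i]
    return labels, total


if __name__ == "__main__":
    # Hypothetical network: node 0 -> node 1 via "k" or "g", node 1 -> node 2 via "ae" or "eh".
    arcs = [Arc(0, 1, "k", (0.0,)), Arc(0, 1, "g", (1.0,)),
            Arc(1, 2, "ae", (5.0,)), Arc(1, 2, "eh", (4.0,))]
    frames = [(0.1,), (0.0,), (4.9,), (5.2,)]  # toy one-dimensional "spectral" frames
    labels, total = viterbi_decode(arcs, frames, start_node=0, end_node=2)
    print(labels, round(total, 2))  # prints ['k', 'ae'] 0.06
```

Only one `(cost, arc)` entry survives per node per frame, so the trace back can recover the single best path but nothing about the runners-up, such as the nearly-as-good path through "eh" in the toy example.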
The Viterbi decoding technique exploits the fact that any sub-path of the optimal path through the network is itself optimal, so only the best path into each node need be retained. This makes Viterbi decoding highly efficient, but it discards much potentially useful information about alternative recognition hypotheses. In practical speech recognition systems, however, particularly those attempting to recognize continuous speech, it is highly desirable to identify not only the best match of an unknown input pattern to a set of preselected speech patterns but also good alternative hypotheses, any of which may represent the correct solution. Providing multiple recognition hypotheses allows semantic and syntactic information to be applied at a later stage and makes it easier for the user to correct misrecognitions. If the acoustic recognizer provides such alternate hypotheses, contextual information regarding the surrounding words can be applied to choose much more reliably among the alternatives, substantially improving the overall recognition procedure.
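The optimality principle, and the loss of alternatives it entails, can be written as a recursion. The following is a generic formulation assumed here for illustration, not one taken from the invention: let $x_t$ be the frame at time $t$ ($t = 1, \dots, T$), $d(x_t, j)$ the cost of matching that frame against arc $j$, $s(j)$ and $e(j)$ the source and destination nodes of arc $j$, $D_t(j)$ the best accumulated cost of any path occupying arc $j$ at frame $t$, and $B_t(n)$ the best accumulated cost of any path reaching node $n$ at frame $t$. With $n_0$ the start node,

$$
D_0(j) = \infty, \qquad
B_0(n) = \begin{cases} 0 & n = n_0 \\ \infty & \text{otherwise} \end{cases}
$$

$$
D_t(j) = d(x_t, j) + \min\bigl( D_{t-1}(j),\; B_{t-1}(s(j)) \bigr), \qquad
B_t(n) = \min_{i \,:\, e(i) = n} D_t(i)
$$

and the score of the best complete path is $B_T(n_{\text{end}})$. Because each node passes forward only the single minimum $B_{t-1}(n)$, every competing path into that node at that frame is dropped; this is what keeps the search efficient, and also what discards the alternate hypotheses.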
Among the several objects of the present invention may be noted the provision of an improved speech recognition system; the provision of such a speech recognition system which employs an acoustic recognition network and which identifies multiple possible paths through the network in response to input speech to be recognized; the provision of such a system in which the output is itself a limited network definition; the provision of such a system which facilitates the utilization of contextual information; the provision of such a system which does not require the evaluation of all possible paths through the network; the provision of such a system which is computationally efficient; the provision of such a system which is highly accurate; and the provision of such a system which is highly reliable and is of relatively simple and inexpensive implementation. Other objects and features are in part apparent and in part pointed out hereinafter.