1. The Field of the Invention
The present invention relates to identifying a ranked list of unique hypotheses and more specifically to determining the N-best strings of an automaton. More particularly, the present invention relates to systems and methods for determining the N-best distinct strings of a weighted automaton through partial determinization of the weighted automaton.
2. The Relevant Technology
A speech recognition system is an example of a computing system that converts real world non-digital type input data into computer-usable digital data and that converts computer digital data into real-world output data. A speech recognition system receives speech from a variety of different sources, such as over the telephone, through a microphone or through another type of transducer. The transducer converts the speech into analog signals, which are then converted to a digital form by the speech recognition system. From the digital speech data, the speech recognition system generates a hypothesis of the words or sentences that were contained in the speech. Unfortunately, speech recognition systems are not always successful at recognizing speech.
In order to improve the likelihood of correctly recognizing the speech, speech recognition systems generally try to make a best guess or hypothesis using statistical probabilities. In fact, speech recognition systems often generate more than one hypothesis of the words that make up the speech. In one example, the various hypotheses are arranged in a lattice structure such as a weighted automaton, where the weight of the various transitions in the automaton correlate to probabilities.
The weighted automaton or graph of a particular utterance thus represents the alternative hypotheses that are considered by the speech recognition system. A particular hypothesis or string in the weighted automaton can be formed by concatenating the labels of selected transitions that form a complete path in the weighted automaton (i.e. a path from an initial state to one of the final states). The path in the automaton with the lowest weight (or cost) typically corresponds to the best hypothesis. By identifying more than one path, some speech recognition systems are more likely to identify the correct string that corresponds to the received speech. Thus, it is often desirable to determine not just the string labeling a path of the automaton with the lowest total cost, but it is desirable to identify the N-best distinct strings of the automaton.
The advantage of considering more than one hypothesis (or string) is that the correct hypothesis or string has a better chance of being discovered. After the N-best hypotheses or paths have been identified, current speech recognition systems reference an information source, such as a language model or a precise grammar, to re-rank the N-best hypotheses. Alternatively, speech recognition systems may employ a re-scoring methodology that uses a simple acoustic and a grammar model to produce an N-best list and then to reevaluate the alternative hypotheses using a more sophisticated model.
One of the primary problems encountered in evaluating the N-best hypotheses is that the same string is often present multiple times. In other words, the automaton or word lattice often contains several paths that are labeled with the same sequence. When the N-best paths of a particular automaton are identified, the same label sequence may be present multiple times. For this reason, current N-best path methods first determine the k (k>>N) shortest paths. After the k shortest paths have been identified, the system is required to perform the difficult and computationally expensive task of removing redundant paths in order to identify the N-best distinct strings. Thus, a large number of hypotheses are generated and compared in order to identify and discard the redundant hypotheses. Further, if k is chosen too small, the N-best distinct strings will not be identified, but rather some number less than N.
The problem of identifying the N-best distinct strings, when compared to identifying the N-best paths of a weighted automaton, is not limited to speech recognition. Other systems that use statistical hypothesizing or lattice structures that contain multiple paths, such as speech synthesis, computational biology, optical character recognition, machine translation, and text parsing tasks, also struggle with the problem of identifying the N-best distinct strings.