1. Field of Invention
This invention is related to systems and methods for analyzing and manipulating weighted or unweighted finite state automata, such as those usable in continuous speech automatic speech recognition systems and methods.
2. Description of Related Art
Flexible and robust automated speech recognition systems have long been sought. Automatic speech recognition can be viewed as a processing pipeline or cascade. In each step of the processing cascade, an output string of data from an upstream processing element is input into a current processing element. The processing element of each step uses a directed graph, such as a finite state automaton or a finite state machine, to convert the input data string into an output data string. At each processing element, each portion of the input data string generates one or more possible paths, or hypotheses, through that processing element. The data portions can represent acoustic information, phonemes, words, text strings or the like, depending on the processing element.
In automatic speech recognition, the term "lattice" denotes an acyclic directed and labeled graph, which is usually weighted. In each lattice, there is typically a designated start, or initial, node and one or more final nodes. Each possible path through the lattice from the initial node to a final node induces a hypothesis based on the arc labels extending between each pair of nodes in the path. For example, in a word lattice, the arc labels are words and the various paths between the initial node and the final node form word strings, such as sentences.
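The way a lattice induces hypotheses can be illustrated with a minimal sketch. The toy lattice, its words, and the names LATTICE, INITIAL, FINALS and hypotheses are hypothetical and chosen only for illustration; weights are omitted for brevity.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical toy word lattice: node -> list of (word label, next node).
LATTICE: Dict[int, List[Tuple[str, int]]] = {
    0: [("flights", 1), ("lights", 1)],
    1: [("to", 2), ("two", 2)],
    2: [("boston", 3)],
    3: [],
}
INITIAL = 0
FINALS = {3}


def hypotheses(node: int = INITIAL, prefix: Tuple[str, ...] = ()) -> Iterator[str]:
    """Yield the word string induced by each path from `node` to a final node."""
    if node in FINALS:
        yield " ".join(prefix)
    # The lattice is acyclic, so this depth-first walk always terminates.
    for word, nxt in LATTICE[node]:
        yield from hypotheses(nxt, prefix + (word,))


all_hyps = list(hypotheses())
```

Each element of all_hyps corresponds to one path from the initial node to the final node, read off from the arc labels along that path.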
Speech recognition systems have progressed from simple, isolated word tasks that recognize only a few words, to dictation systems that are capable of recognizing continuous speech, to systems for directory information retrieval. Continuous speech recognition systems often have active vocabularies of over 500,000 words. Directory information retrieval systems often need vocabularies having millions of words.
To support these larger applications, conventional speech recognition systems use weighted finite state transducers to represent the valid set of word strings, such as sentences, that can be accurately recognized. The weights of the weighted finite state transducers are typically determined from a statistical model. This statistical model is based on statistically analyzing a large corpus of text data.
In practice, conventional speech recognition systems use an acoustic weighted finite state transducer to convert spoken utterances into sequences of phonemes and at least a grammar weighted finite state transducer to convert the sequences of phonemes into recognized word strings, such as sentences. The weights of at least the grammar weighted finite state transducer are combined with the weights produced by the acoustic weighted finite state transducer to determine the probability of each recognition hypothesis for a given utterance. The combined weights are then used to prune out the less-likely hypotheses during a Viterbi beam search or the like. Accordingly, it is essential to accurately determine the weights on the acoustic and grammar finite state transducers if the speech recognition system is to viably handle the large-vocabulary speech recognition tasks outlined above.
If the large-vocabulary speech recognition task to be performed by the speech recognition system does not have an available training corpus, at least the grammar weighted finite state transducer might be left unweighted. This occurs because, as outlined above, the weights on the weighted finite state transducers are determined statistically from the training corpus. However, it should be appreciated that, while an unweighted finite state transducer can be used, the speed and accuracy of the speech recognition system may be considerably reduced.
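The role of the combined weights in pruning can be sketched as follows. This is a simplified illustration, not a Viterbi beam search: it prunes complete hypotheses rather than partial ones, and it assumes that the weights are negative log probabilities that combine by addition. The hypotheses, the cost values, and the names hyps and prune are hypothetical.

```python
# Hypothetical costs, assumed to be negative log probabilities (lower is better).
hyps = {
    "call john smith": {"acoustic": 12.0, "grammar": 3.0},
    "call jon smyth": {"acoustic": 11.5, "grammar": 6.0},
    "tall john smith": {"acoustic": 16.0, "grammar": 7.5},
}


def prune(hyps, beam):
    """Keep the hypotheses whose combined cost is within `beam` of the best one."""
    # Combine the acoustic and grammar weights of each hypothesis.
    total = {h: c["acoustic"] + c["grammar"] for h, c in hyps.items()}
    best = min(total.values())
    return {h: t for h, t in total.items() if t <= best + beam}


survivors = prune(hyps, beam=5.0)
```

If the grammar costs were all equal, as with an unweighted grammar, only the acoustic costs would separate the hypotheses, which suggests why pruning becomes less effective in that case.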
Classical shortest-paths problems in a weighted directed graph arise in various contexts. The problems divide into two related categories: single-source shortest-path problems and all-pairs shortest-path problems. Solving the single-source shortest-path problem in a weighted directed graph comprises determining the shortest path from a fixed source node "s" to all other nodes of the weighted directed graph. The all-pairs shortest-path problem is more general than the single-source shortest-path problem, and comprises finding the shortest path or paths between all pairs of nodes of the weighted directed graph.
In the classical shortest-path problem, the weights on the transitions between the nodes of the weighted directed graph represent distances, costs, or any other real-valued quantity that can be added along a path and that one wishes to minimize. These classical shortest-path problems can be generalized to use other types of transition weights and to use other mathematical operations. In particular, the weights and operations can be any type of weight and any type of operation that can be defined using semirings.
Semirings define an algebraic structure, as set forth in "Finite-State Transducers in Language and Speech Processing", Mehryar Mohri, Computational Linguistics, 23:2, 1997 and in "Semirings, Automata, Languages", W. Kuich et al., Monographs in Theoretical Computer Science, Vol. 5, Springer-Verlag, Berlin, 1986, each incorporated herein by reference in its entirety. As defined in Kuich, semirings combine a "multiplication" operation, symbolized as "⊗", and an "addition" operation, symbolized using "⊕".
Classically, the transition weights are real numbers and the specific operations used to determine the shortest path include the addition and minimum operations. In particular, the transition weights are added along a path using the addition operation as the ⊗ operation. Once all the path weights are determined by addition, the minimum operation is applied as the ⊕ operation to select the path having the minimum weight.
More generally, the transition weights of the directed graph are elements of an arbitrary set K, which may be the set of real numbers, a set of strings, a set of regular expressions, subsets of another set, or any other quantity that can be multiplied along a path using the "⊗" operation, and that can be "summed" using the "⊕" operation. That is, the weight of a path is obtained by "multiplying" the transition weights along that path using the "⊗" operator. Then, the shortest distance from a source node "s" to an end, or final, node "f" is the "sum" of the weights of all paths from the source node "s" to the end node "f" using the "⊕" operator.
Within the generalized definition of the shortest distances set forth above, the systems and methods according to this invention determine the shortest distances between a source node "s" and an end node "f" of a weighted directed graph, such as a weighted finite state automaton.
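The classical case can be sketched with a simple relaxation loop in which weights are added along a path and the minimum is taken across competing paths. The sketch below uses a Bellman-Ford style relaxation as one well-known way to do this; the graph, its weights, and the names EDGES, NODES and single_source are hypothetical, and the graph is assumed to contain no negative-weight cycles.

```python
import math

# Hypothetical weighted directed graph: (source node, destination node, weight).
EDGES = [
    ("s", "a", 2.0),
    ("s", "b", 5.0),
    ("a", "b", 1.0),
    ("a", "f", 6.0),
    ("b", "f", 2.0),
]
NODES = {"s", "a", "b", "f"}


def single_source(edges, nodes, source):
    """Shortest distance from `source` to every node using addition and minimum."""
    dist = {n: math.inf for n in nodes}
    dist[source] = 0.0
    # Relaxing every edge len(nodes) - 1 times suffices without negative cycles.
    for _ in range(len(nodes) - 1):
        for u, v, w in edges:
            # Add the weight along the path, then keep the minimum across paths.
            dist[v] = min(dist[v], dist[u] + w)
    return dist


dist = single_source(EDGES, NODES, "s")
```

Here the path s, a, b, f has total weight 2 + 1 + 2 = 5, which is smaller than the weights of the paths s, a, f (8) and s, b, f (7), so the shortest distance from "s" to "f" is 5.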
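The generalized definition can be made concrete with a minimal sketch that computes the distance exactly as defined: the semiring product of the transition weights along each path, combined across all paths with the semiring sum. The Semiring class, the two example semirings, the toy graph, and the function name distance are assumptions made for illustration. Enumerating every path is practical only for small acyclic graphs.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]   # the "addition" operation
    times: Callable[[Any, Any], Any]  # the "multiplication" operation
    zero: Any                         # identity element of plus
    one: Any                          # identity element of times


# Classical case: add along a path, take the minimum across paths.
TROPICAL = Semiring(min, lambda a, b: a + b, math.inf, 0.0)
# Probabilities: multiply along a path, add across paths.
PROBABILITY = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)

# Hypothetical acyclic graph: node -> list of (next node, weight).
GRAPH = {
    "s": [("a", 0.5), ("b", 0.5)],
    "a": [("f", 0.5)],
    "b": [("f", 0.25)],
    "f": [],
}


def distance(graph, sr, node, final, prefix=None):
    """Semiring sum, over all paths node -> final, of the product of the weights."""
    prefix = sr.one if prefix is None else prefix
    # A path ending here contributes the product accumulated so far.
    total = prefix if node == final else sr.zero
    for nxt, w in graph[node]:
        total = sr.plus(total, distance(graph, sr, nxt, final, sr.times(prefix, w)))
    return total


tropical_distance = distance(GRAPH, TROPICAL, "s", "f")
probability_distance = distance(GRAPH, PROBABILITY, "s", "f")
```

The same function yields the weight of the cheapest path under TROPICAL and the total probability of all paths under PROBABILITY; only the semiring passed in changes.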
As indicated above, unweighted finite state automata may be used in conventional speech recognition systems. However, such unweighted finite state automata generally considerably reduce the speed and accuracy of the speech recognition system. Unfortunately, developing a suitable training corpus for a speech recognition task that accurately reflects the a priori probabilities of different word and/or phoneme combinations is time consuming and expensive, if it is even possible.
For example, given the huge numbers of given names and surnames used in the United States, and the potential variations in spelling and pronunciation, a training corpus for a directory information retrieval speech recognition system would be prohibitively expensive and time consuming to compile.
Additionally, it is highly unlikely that any such training corpus could adequately reflect the various probabilities for the word and/or phoneme combinations. This occurs because the directory information speech recognition task is as likely to have to recognize speech corresponding to any one residential entry in the directory information database as to any other residential entry. Similarly, because the speech recognition task is likely to have only the given name, surname, and city, and possibly the address, the directory information speech recognition task is likely to have insufficient context information.
Accordingly, such very-large-vocabulary speech recognition systems often must be used in an unweighted state.
This invention provides systems and methods for assigning weights to the transitions of unweighted directed graphs.
This invention further provides systems and methods for assigning weights to the transitions of unweighted directed graphs where the weighting information is derived solely from the unweighted directed graph itself.
The systems and methods of this invention accurately determine the transition weights for acyclic speech recognition systems, thus providing the pruning information necessary for beam search algorithms.
This invention separately provides systems and methods for pushing weights through an arbitrarily weighted directed graph.
This invention further provides systems and methods for generalizing classical shortest-paths algorithms to other algebras.
This invention separately provides systems and methods that are able to determine the single-source shortest distances for an arbitrarily weighted directed graph.
This invention separately provides systems and methods that are able to approximately determine the single-source shortest distances for a weighted directed graph.
This invention additionally provides systems and methods for determining the single-source shortest distances in a weighted directed acyclic graph.
This invention separately provides systems and methods having reduced complexity for determining the single-source shortest distances.
This invention separately provides systems and methods for determining the all-pairs shortest distances for an arbitrarily weighted directed graph.
This invention separately provides systems and methods that are able to reweight a weighted directed graph based on the determined single-source shortest distances for that weighted directed graph.
The systems and methods according to this invention for determining the single-source and all-pairs shortest distances are generic, in that they work with any semiring covered by the generic framework of the systems and methods of this invention. These systems and methods are also generic in that they determine the single-source shortest distances regardless of the queue discipline chosen to implement a particular exemplary embodiment. In particular, the classical algorithm of Ford et al. is a special case of the generic systems and methods of this invention.
In particular, the systems and methods according to this invention are usable with any right semiring. Accordingly, the classical algorithm described in Lawler is also a special case of the general systems and methods of this invention.
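The idea of a shortest-distance procedure that is parameterized by both the semiring and the queue discipline can be sketched as follows. This section does not set out the procedure itself, so the sketch is one possible queue-based relaxation scheme under stated assumptions, not the claimed method. The function name shortest_distance, the toy graphs, and the choice of a first-in, first-out queue are hypothetical. The sketch assumes the updates eventually stop changing the distances, as they do for acyclic graphs and for the minimum-and-addition case with non-negative weights.

```python
import math
from collections import deque


def shortest_distance(graph, source, plus, times, zero, one):
    """Queue-based single-source shortest distances over a semiring (sketch)."""
    d = {q: zero for q in graph}  # current distance estimate for each node
    r = {q: zero for q in graph}  # weight added to d[q] since q was last dequeued
    d[source] = r[source] = one
    queue = deque([source])       # FIFO discipline; other disciplines could be used
    queued = {source}
    while queue:
        q = queue.popleft()
        queued.discard(q)
        pending, r[q] = r[q], zero
        for nxt, w in graph[q]:
            update = times(pending, w)
            new_d = plus(d[nxt], update)
            if new_d != d[nxt]:   # relax only when the estimate changes
                d[nxt] = new_d
                r[nxt] = plus(r[nxt], update)
                if nxt not in queued:
                    queue.append(nxt)
                    queued.add(nxt)
    return d


# Hypothetical cyclic graph with non-negative costs (note the cycle a -> b -> a).
COSTS = {
    "s": [("a", 2.0), ("b", 5.0)],
    "a": [("b", 1.0), ("f", 6.0)],
    "b": [("f", 2.0), ("a", 1.0)],
    "f": [],
}
# Hypothetical acyclic graph with probabilities.
PROBS = {
    "s": [("a", 0.5), ("b", 0.5)],
    "a": [("f", 0.5)],
    "b": [("f", 0.25)],
    "f": [],
}

min_plus = shortest_distance(COSTS, "s", min, lambda a, b: a + b, math.inf, 0.0)
sum_times = shortest_distance(
    PROBS, "s", lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0
)
```

The per-node accumulator r carries only the weight not yet propagated, so a node that is dequeued before all of its predecessors have been processed still ends with the correct total when the sum operation is not idempotent, as in the probability example.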
The systems and methods according to this invention also reweight the directed graph based on the determined shortest distances, so that the weights are, for example, front weighted. Accordingly, searches through the directed graph that are based on the total weights of the paths taken will be more efficient. The systems and methods according to this invention further arbitrarily weight an unweighted directed graph so that the shortest distance and reweighting systems and methods can be applied to that directed graph.
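Reweighting toward the front of the graph can be sketched for the classical minimum-and-addition case. The sketch assumes an acyclic graph with a single final node that every node can reach, and it uses the common reweighting rule in which each transition weight is increased by the shortest distance from its destination to the final node and decreased by the shortest distance from its origin to the final node. The graph and the names to_final and push are hypothetical, and this rule is an assumption for illustration rather than a statement of the claimed method.

```python
import math

# Hypothetical acyclic graph: node -> list of (next node, weight).
GRAPH = {
    "s": [("a", 2.0), ("b", 5.0)],
    "a": [("f", 6.0)],
    "b": [("f", 2.0)],
    "f": [],
}
FINAL = "f"


def to_final(graph, final):
    """Shortest distance from each node to `final` under (min, +); acyclic only."""
    memo = {final: 0.0}

    def go(q):
        if q not in memo:
            memo[q] = min((w + go(n) for n, w in graph[q]), default=math.inf)
        return memo[q]

    for q in graph:
        go(q)
    return memo


def push(graph, final):
    """Move weight toward the initial node; returns (reweighted graph, distances)."""
    d = to_final(graph, final)
    # Assumes every node reaches `final`, so d[q] is finite for all q.
    pushed = {
        q: [(n, w + d[n] - d[q]) for n, w in arcs] for q, arcs in graph.items()
    }
    return pushed, d


pushed, d = push(GRAPH, FINAL)
```

After reweighting, every path weight is reduced by the same offset d["s"], so the ranking of paths is unchanged, while the difference between the two paths (1 versus 0) is visible on the first transition rather than the last. A search that prunes on accumulated weight can therefore discard the worse path earlier.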
These and other features and advantages of this invention are described in or are apparent from the following detailed description of the exemplary embodiments of the automatic speech recognition systems and methods according to this invention.