Finite state automata (FSAs) are used in various fields, such as natural language processing and speech recognition. For example, an FSA can be used to search text for a specific character string or for character strings matching a specific pattern. An FSA may also be referred to as a finite automaton (FA) or a finite state machine (FSM).
Extended models of the FSA also exist. One example is the finite state transducer (FST), which has output symbols in addition to input symbols and, when given an input symbol, outputs the output symbol corresponding to it. An FST may be used, for example, as a lexicon in speech recognition; such a lexicon is constructed as an FST that, when given a pronunciation, outputs the word corresponding to that pronunciation. Other extensions are the weighted finite state automaton (WFSA), which carries weights representing quantities such as probabilities and distances in addition to input symbols, and the weighted finite state transducer (WFST), a model that combines the FST and the WFSA. WFSAs and WFSTs may be used to express models needed for speech recognition, such as a lexicon or a language model, and may also be used in statistical machine translation.
To use an FSA, or one of the extended models described above, in an actual application, the necessary information must be converted into an FSA or an extended model. This conversion can sometimes be made easier by using transitions that move to the next state without consuming an input symbol. Such a transition is called an ε-transition. An ε-transition is labeled with an empty symbol sequence (also referred to as an empty symbol, an empty character string, or an empty input), denoted ε, instead of an input symbol. The set of states that can be reached from a given state only through zero or more ε-transitions is called the ε-closure of that state. The process of removing ε-transitions is called ε-removal. ε-removal is performed, for example, to eliminate unnecessary transitions and thereby reduce processing time. To reduce both memory usage and processing time, it is desirable for ε-removal to yield as few transitions as possible.
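The definition of the ε-closure can be made concrete with a short sketch. The Python below assumes a hypothetical adjacency-dict representation (state mapped to a list of (label, next_state) pairs) and uses `None` as a stand-in for ε; neither convention comes from the text above. A breadth-first search that follows only ε-labeled transitions collects the closure, and the starting state is always included because zero ε-transitions are allowed.

```python
from collections import deque

EPS = None  # hypothetical stand-in for the empty symbol ε

def epsilon_closure(transitions, state):
    """Return the set of states reachable from `state` via zero or more ε-transitions.

    `transitions` maps a state to a list of (label, next_state) pairs;
    a label equal to EPS denotes an ε-transition.
    """
    closure = {state}  # zero ε-transitions: the state itself
    queue = deque([state])
    while queue:
        p = queue.popleft()
        for label, r in transitions.get(p, []):
            if label is EPS and r not in closure:
                closure.add(r)
                queue.append(r)
    return closure

# 0 -ε-> 1 -ε-> 2 -a-> 3, plus an ε-cycle 2 -ε-> 0
fsa = {
    0: [(EPS, 1)],
    1: [(EPS, 2)],
    2: [("a", 3), (EPS, 0)],
}
print(sorted(epsilon_closure(fsa, 0)))  # [0, 1, 2]
print(sorted(epsilon_closure(fsa, 3)))  # [3]
```

The `closure` set doubles as the visited set, so ε-cycles such as 2 to 0 do not cause an infinite loop.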
U.S. Pat. No. 7,027,988 proposes a technique for removing ε-transitions from a WFSA, an FST, or a WFST. In related-art methods, including that of U.S. Pat. No. 7,027,988, ε-removal is realized by obtaining the ε-closure of a state q and making each outgoing non-ε transition from a state in that ε-closure an outgoing transition from q. When weights or output symbols are assigned to transitions, the weight or output symbol of each outgoing transition from q newly generated by ε-removal is computed taking into account the weights or output symbols on the ε-path from q and on the original transition.
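As a rough illustration of this closure-based approach, the following Python sketch performs forward ε-removal on a weighted automaton. It assumes the tropical semiring (weights added along a path, minimum taken across alternative paths) with non-negative weights, and a hypothetical representation in which `arcs` maps each state to (label, weight, next_state) triples and `final` maps final states to final weights; output symbols are omitted. It is not the implementation of U.S. Pat. No. 7,027,988. Each state q computes the shortest ε-path weight to every state in its ε-closure, then inherits those states' non-ε transitions with that ε-path weight added.

```python
import math

EPS = None  # hypothetical stand-in for the empty symbol ε

def epsilon_distances(arcs, q):
    """Shortest ε-path weight from q to each state in its ε-closure.

    Assumes the tropical semiring (min, +) with non-negative weights,
    so this worklist relaxation terminates.
    """
    dist = {q: 0.0}  # q reaches itself through zero ε-transitions
    work = [q]
    while work:
        p = work.pop()
        for label, w, r in arcs.get(p, []):
            if label is EPS and dist[p] + w < dist.get(r, math.inf):
                dist[r] = dist[p] + w
                work.append(r)
    return dist

def remove_epsilon(arcs, final):
    """Forward ε-removal: each state q takes over the non-ε arcs of its ε-closure.

    arcs:  state -> list of (label, weight, next_state)   (hypothetical format)
    final: state -> final weight; absent states are non-final
    Returns (new_arcs, new_final) with no ε-arcs.
    """
    new_arcs, new_final = {}, {}
    for q in set(arcs) | set(final):
        best = {}  # (label, next_state) -> lowest combined weight
        for p, d in epsilon_distances(arcs, q).items():
            for label, w, r in arcs.get(p, []):
                if label is not EPS:
                    # new arc weight = ε-path weight from q to p, plus the arc weight
                    best[(label, r)] = min(best.get((label, r), math.inf), d + w)
            if p in final:
                new_final[q] = min(new_final.get(q, math.inf), d + final[p])
        new_arcs[q] = [(label, w, r) for (label, r), w in best.items()]
    return new_arcs, new_final

# 0 -a/1.0-> 1 -ε/0.5-> 2 -b/2.0-> 3 (final, weight 0.0)
arcs = {0: [("a", 1.0, 1)], 1: [(EPS, 0.5, 2)], 2: [("b", 2.0, 3)]}
new_arcs, new_final = remove_epsilon(arcs, {3: 0.0})
print(new_arcs[1])  # [('b', 2.5, 3)]
```

Here state 1 inherits the `b` transition of state 2 with weight 0.5 + 2.0. State 2 is no longer reachable after removal; this sketch does not prune such states.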
In the related-art method of ε-removal, however, the result depends on the processing direction. Specifically, if the related-art method is applied with the directions of all transitions of the automaton reversed, the result is not always the same as that obtained without reversal, because the states included in each ε-closure differ. To make the number of transitions as small as possible with the related-art method, one could perform ε-removal in both directions and select the result with fewer transitions. This, however, increases processing time, because ε-removal must be performed twice. It is therefore desirable to perform ε-removal in a way that does not depend on the direction of the ε-transitions.
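The direction dependence can be reproduced on a small hypothetical unweighted automaton. The sketch below is Python and assumes an adjacency-dict representation with `None` standing in for ε, a single initial state 0, and a single final state 3. It runs forward ε-removal on the automaton and on its reversal, then counts the transitions leaving states that remain reachable from the respective initial state. Final-state bookkeeping is omitted because it does not affect the count.

```python
from collections import deque

EPS = None  # hypothetical stand-in for the empty symbol ε

def closure(arcs, q):
    """States reachable from q via zero or more ε-transitions."""
    seen, queue = {q}, deque([q])
    while queue:
        p = queue.popleft()
        for label, r in arcs.get(p, []):
            if label is EPS and r not in seen:
                seen.add(r)
                queue.append(r)
    return seen

def remove_epsilon(arcs, states):
    """Forward ε-removal (unweighted): q inherits the non-ε arcs of its ε-closure."""
    return {q: sorted({(a, r) for p in closure(arcs, q)
                       for a, r in arcs.get(p, []) if a is not EPS})
            for q in states}

def reverse(arcs):
    """Reverse the direction of every transition."""
    rev = {}
    for p, out in arcs.items():
        for a, r in out:
            rev.setdefault(r, []).append((a, p))
    return rev

def count_accessible(arcs, start):
    """Count arcs leaving states reachable from `start`; unreachable states are dropped."""
    seen, queue = {start}, deque([start])
    while queue:
        p = queue.popleft()
        for _, r in arcs.get(p, []):
            if r not in seen:
                seen.add(r)
                queue.append(r)
    return sum(len(arcs.get(p, [])) for p in seen)

# Hypothetical automaton: initial state 0, final state 3.
arcs = {
    0: [("a", 1), ("b", 1), ("c", 1), ("e", 2)],
    1: [(EPS, 2)],
    2: [("d", 3)],
}
states = range(4)
forward = count_accessible(remove_epsilon(arcs, states), start=0)
# Reversed automaton: the old final state 3 becomes the initial state.
backward = count_accessible(remove_epsilon(reverse(arcs), states), start=3)
print(forward, backward)  # 6 5
```

Forward removal copies the `d` transition onto state 1, but state 2 stays reachable through `e`, so nothing is pruned and 6 transitions remain. In the reversed automaton, state 2 inherits the three transitions of state 1, which then becomes unreachable and is dropped, leaving 5. Finding the smaller result this way requires running ε-removal in both directions.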