A finite state automaton (FSA) is used in various fields such as natural language processing and speech recognition. For example, the applications of an FSA include searching a text for specific character strings or for patterns of character strings. A finite state automaton is sometimes also called a finite automaton (FA) or a finite state machine (FSM).
There are various types of FSAs. For example, the types of FSAs include a finite state acceptor, a finite state transducer (FST), a weighted finite state acceptor (WFSA), and a weighted finite state transducer (WFST). A finite state acceptor outputs only whether or not an input symbol sequence is accepted. A finite state transducer outputs a symbol sequence according to the input symbol sequence. A weighted finite state acceptor outputs a weight according to the input symbol sequence. A weighted finite state transducer outputs both a weight and a symbol sequence according to the input symbol sequence.
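The difference between the four types can be illustrated with a minimal sketch. The arc table, the toy symbols, and the function name `run` are assumptions made only for illustration. The sketch also assumes a deterministic machine and assumes that weights are distances that are added along the path, which is only one of several possible weight conventions.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical arc table: (state, input symbol) -> (next state, output symbol, weight).
Arcs = Dict[Tuple[int, str], Tuple[int, str, float]]


def run(arcs: Arcs, start: int, finals: Set[int],
        symbols: List[str]) -> Optional[Tuple[List[str], float]]:
    """Follow the arcs for the input symbols.

    Returns (output symbols, total weight) if the input is accepted,
    or None if it is rejected.
    """
    state = start
    outputs: List[str] = []
    weight = 0.0  # assumption: weights are distances, so they are summed
    for sym in symbols:
        arc = arcs.get((state, sym))
        if arc is None:
            return None  # no transition for this symbol: reject
        state, out, w = arc
        outputs.append(out)
        weight += w
    return (outputs, weight) if state in finals else None


# Toy machine that accepts only the input sequence "a b".
TOY_ARCS: Arcs = {(0, "a"): (1, "x", 0.5), (1, "b"): (2, "y", 1.5)}
TOY_FINALS = {2}

result = run(TOY_ARCS, 0, TOY_FINALS, ["a", "b"])
if result is not None:
    outputs, weight = result
    print("acceptor:", True)            # accept or reject only
    print("FST:", outputs)              # output symbol sequence
    print("WFSA:", weight)              # weight only
    print("WFST:", (outputs, weight))   # weight and output symbol sequence
```

Each of the four types corresponds to reading a different part of the same result: an acceptor reports only whether a result exists, an FST reports the outputs, a WFSA reports the weight, and a WFST reports both.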
Herein, the weight can be in the form of a probability or a distance. In the following description, a finite state acceptor is simply referred to as an acceptor. Moreover, a finite state acceptor by itself is sometimes referred to as a finite state automaton (FSA). In the following explanation, however, “finite state automaton” is used as a collective term for an acceptor, an FST, a WFSA, and a WFST.
An FST is used, for example, as a word dictionary in speech recognition. Such a word dictionary is configured as an FST that outputs a word in response to an input pronunciation. A WFSA or a WFST is used in speech recognition to express a dictionary or a model that the recognition requires, such as a language model. Alternatively, a WFSA or a WFST is used in statistical machine translation.
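A word dictionary of this kind can be sketched as follows. The lexicon, the phoneme symbols, and the function names are hypothetical. The sketch assumes that no pronunciation is a prefix of another and that there are no homophones, so that the resulting FST stays deterministic and each word is output on the last arc of its pronunciation.

```python
from typing import Dict, List, Optional, Set, Tuple

# (state, phoneme) -> (next state, output word, or "" for no output)
LexArcs = Dict[Tuple[int, str], Tuple[int, str]]


def build_lexicon_fst(lexicon: Dict[str, List[str]]) -> Tuple[LexArcs, Set[int]]:
    """Build a trie-shaped FST; state 0 is the initial state."""
    arcs: LexArcs = {}
    finals: Set[int] = set()
    next_id = 1
    for word, phones in lexicon.items():
        state = 0
        for i, phone in enumerate(phones):
            if (state, phone) not in arcs:
                arcs[(state, phone)] = (next_id, "")
                next_id += 1
            nxt, _ = arcs[(state, phone)]
            if i == len(phones) - 1:
                # Output the word on the last arc of its pronunciation.
                arcs[(state, phone)] = (nxt, word)
                finals.add(nxt)
            state = nxt
    return arcs, finals


def lookup(arcs: LexArcs, finals: Set[int],
           phones: List[str]) -> Optional[List[str]]:
    """Return the output words for a pronunciation, or None if rejected."""
    state = 0
    words: List[str] = []
    for phone in phones:
        arc = arcs.get((state, phone))
        if arc is None:
            return None
        state, out = arc
        if out:
            words.append(out)
    return words if state in finals else None


# Hypothetical toy lexicon (word -> phoneme sequence).
LEXICON = {
    "cat": ["k", "ae", "t"],
    "cab": ["k", "ae", "b"],
    "dog": ["d", "ao", "g"],
}
LEX_ARCS, LEX_FINALS = build_lexicon_fst(LEXICON)
print(lookup(LEX_ARCS, LEX_FINALS, ["k", "ae", "b"]))
```

The words "cat" and "cab" share the states reached by "k" and "ae", which is the kind of sharing that makes an FST a compact representation of a dictionary.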
As described above, in order to use an FSA in various applications, the necessary information needs to be converted into an FSA. Usually, such conversion is performed by following a simple conversion procedure. However, it is often the case that the FSA obtained by such conversion does not have a configuration suitable for the subsequent processing. Hence, there arises a need to further convert the FSA as necessary. One such conversion method is determinization, as described below.
A deterministic finite state automaton (DFSA) refers to an FSA in which, in any state, the next state with respect to a particular input symbol is uniquely determined. A nondeterministic finite state automaton (NFSA) refers to an FSA that is not a DFSA. That is, an NFSA refers to an FSA in which, in at least one state, a plurality of next states exist with respect to a particular input symbol. Herein, determinization refers to the conversion of an NFSA into a DFSA. During determinization of an acceptor, for example, subset construction is used.
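Subset construction for an acceptor can be sketched as follows. The data layout and the function names are assumptions made for illustration, and the sketch assumes an NFSA without empty (epsilon) transitions. Each state of the resulting DFSA is the set of NFSA states that can be reached by the same input symbol sequence.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Set, Tuple

NfsaArcs = Dict[Tuple[int, str], Set[int]]             # (state, symbol) -> next states
DfsaState = FrozenSet[int]
DfsaArcs = Dict[Tuple[DfsaState, str], DfsaState]      # (subset, symbol) -> one subset


def determinize(nfsa_arcs: NfsaArcs, starts: Set[int],
                finals: Set[int]) -> Tuple[DfsaState, DfsaArcs, Set[DfsaState]]:
    """Subset construction for an acceptor without epsilon transitions."""
    start = frozenset(starts)
    dfsa_arcs: DfsaArcs = {}
    dfsa_finals: Set[DfsaState] = set()
    seen = {start}
    queue = deque([start])
    while queue:
        subset = queue.popleft()
        # A subset is final if it contains at least one final NFSA state.
        if subset & finals:
            dfsa_finals.add(subset)
        # Merge, per input symbol, the next states of all members of the subset.
        targets: Dict[str, Set[int]] = {}
        for (src, sym), dsts in nfsa_arcs.items():
            if src in subset:
                targets.setdefault(sym, set()).update(dsts)
        for sym, dsts in targets.items():
            nxt = frozenset(dsts)
            dfsa_arcs[(subset, sym)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return start, dfsa_arcs, dfsa_finals


def accepts(start: DfsaState, arcs: DfsaArcs, finals: Set[DfsaState],
            symbols: List[str]) -> bool:
    """Run the DFSA; the next state is unique for each (state, symbol) pair."""
    state = start
    for sym in symbols:
        if (state, sym) not in arcs:
            return False
        state = arcs[(state, sym)]
    return state in finals


# Toy NFSA accepting sequences over {a, b} that end with "a b".
# State 0 has two next states for the symbol "a", so it is nondeterministic.
NFSA_ARCS: NfsaArcs = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
D_START, D_ARCS, D_FINALS = determinize(NFSA_ARCS, {0}, {2})
print(accepts(D_START, D_ARCS, D_FINALS, ["a", "a", "b"]))
```

For this toy NFSA, the construction yields the three DFSA states {0}, {0, 1}, and {0, 2}. Note that the sketch allocates the DFSA as a new structure while the NFSA remains in memory.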
In the method implemented in the conventional technology, a DFSA is newly generated while the storage area for the NFSA is retained. Thus, in such a method, in order to perform determinization of an NFSA containing a large number of states and transitions, it becomes necessary to secure a large storage area.