In recent years, as high-speed, large-capacity communications infrastructure, including computers and the Internet, has become widespread, massive amounts of unformatted text information have been created and accumulated. It is therefore becoming increasingly important to make use of text information as unstructured information. In particular, techniques for performing natural language analysis processing on text information, in order to extract and make use of semantic information at various hierarchical levels, have been actively developed.
In conventional natural language analysis processing, an appropriate character string analysis method, such as morphological analysis, is first used to tag each word with its part of speech (i.e., to perform part-of-speech tagging). If the language does not have explicit word boundaries (no spaces between words), as in Japanese, Chinese, or Thai, the text is segmented into words before the part-of-speech tagging. If the language has explicit word boundaries, as in French or German, the part-of-speech tagging is performed without word segmentation of the text. Then, semantic representations at higher semantic levels are extracted from the relationships among multiple words.
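The two-stage flow described above can be sketched as follows. This is a minimal illustration only: the lexicon, the tag names, and the function names `segment`, `tag`, and `analyze` are hypothetical, and greedy longest-match segmentation is substituted here for a real morphological analyzer, which would normally use a far richer dictionary and a statistical model.

```python
# Toy lexicon mapping surface forms to part-of-speech tags (hypothetical data).
LEXICON = {"ab": "noun", "c": "particle", "de": "verb"}


def segment(text, lexicon):
    """Greedy longest-match segmentation for text without word boundaries."""
    words, i = [], 0
    max_len = max(len(w) for w in lexicon)
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + size] in lexicon:
                words.append(text[i:i + size])
                i += size
                break
        else:
            # No lexicon entry starts here: emit a single-character word.
            words.append(text[i])
            i += 1
    return words


def tag(words, lexicon):
    """Attach a part-of-speech tag to each word ("unknown" if not listed)."""
    return [(w, lexicon.get(w, "unknown")) for w in words]


def analyze(text, has_word_boundaries, lexicon=LEXICON):
    """Segment only when the language lacks explicit word boundaries."""
    words = text.split() if has_word_boundaries else segment(text, lexicon)
    return tag(words, lexicon)
```

Under these assumptions, `analyze("abcde", False)` and `analyze("ab c de", True)` yield the same tagged word sequence; the only difference is whether the segmentation step runs before tagging.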
Because the information required generally differs depending on the specific application, semantic representations are extracted from text information by defining and providing editable patterns of the lexicographic representations to be extracted. Specifically, each pattern defines a local syntax over words, that is, which words occur and in what order. Syntactic dependencies are one kind of semantic information that can be extracted using such word string patterns.
The following describes conventional processing of syntactic dependency extraction in Japanese with reference to FIG. 14. For example, a sentence 500 shown in FIG. 14 is segmented by morphological analysis into words tagged, in order, as noun, postpositional particle, noun, postpositional particle, noun, verb, auxiliary verb, comma, . . . , verb, auxiliary verb, verb, auxiliary verb, and period. Then, part-of-speech information and a regular expression, if any, are tagged for each word to obtain morphological analysis data 502.
When dependency parsing is applied to the obtained morphological analysis data 502 using a constraint array pattern 504, defined as the word sequence “(noun)”, “ (postpositional particle)”, “(noun)”, and “ (verb)”, the syntactic dependencies that match the constraint array pattern 504, i.e., “ . . . ” and “ . . . ”, are extracted.
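As a rough illustration of matching a single constraint array pattern against morphological analysis data, the sketch below treats each pattern element as a part-of-speech constraint with an optional surface-form constraint. The token data, the tag names, and the functions `matches` and `find_matches` are hypothetical, and the sketch assumes that the pattern must match a contiguous run of words, which the description above does not state.

```python
def matches(constraint, token):
    """A constraint is (surface or None, pos); a token is (word, pos)."""
    surface, pos = constraint
    word, word_pos = token
    return pos == word_pos and (surface is None or surface == word)


def find_matches(pattern, tokens):
    """Return (start index, matched words) for every contiguous match."""
    hits = []
    for start in range(len(tokens) - len(pattern) + 1):
        window = tokens[start:start + len(pattern)]
        if all(matches(c, t) for c, t in zip(pattern, window)):
            hits.append((start, [word for word, _ in window]))
    return hits


# Hypothetical tagged words standing in for morphological analysis data.
tokens = [
    ("A", "noun"), ("p", "particle"), ("B", "noun"), ("v", "verb"),
    ("x", "auxiliary verb"),
    ("C", "noun"), ("p", "particle"), ("D", "noun"), ("v", "verb"),
]

# Any noun, the particle "p", any noun, the verb "v".
pattern = [(None, "noun"), ("p", "particle"), (None, "noun"), ("v", "verb")]

hits = find_matches(pattern, tokens)
```

With this toy input, `hits` contains two matches, one starting at index 0 and one starting at index 5. Each pattern is scanned independently here, which becomes costly when many patterns must be checked against the same sentence.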
In the prior art, since multiple array patterns are processed concurrently, the array pattern 504 is stored in an ordered tree structure called a trie or prefix tree (hereinafter referred to as “trie”). Pattern matching between the constraint array pattern 504 defining the semantic dependency and the sentence 500 is then performed by an algorithm that is applied to the input sentence as a non-deterministic finite automaton (NFA).
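One way to picture this arrangement is sketched below: several patterns are stored in a trie keyed by part-of-speech tag, and the sentence is scanned once while a set of active trie nodes is maintained, which is how an NFA is commonly simulated. The class `TrieNode`, the functions `build_trie` and `match_all`, the pattern names, and the tags are all hypothetical; the sketch is not taken from the prior art itself and ignores surface-form constraints for brevity.

```python
class TrieNode:
    def __init__(self):
        self.children = {}     # part-of-speech tag -> TrieNode
        self.pattern_ids = []  # patterns that end at this node


def build_trie(patterns):
    """Store all patterns in one trie so shared prefixes share nodes."""
    root = TrieNode()
    for pattern_id, pattern in patterns.items():
        node = root
        for tag in pattern:
            node = node.children.setdefault(tag, TrieNode())
        node.pattern_ids.append(pattern_id)
    return root


def match_all(root, tags):
    """Scan the tags once, tracking every partially matched pattern."""
    active = []  # (start index, trie node) pairs: the current NFA state set
    found = []
    for i, tag in enumerate(tags):
        active.append((i, root))  # a match may begin at any position
        next_active = []
        for start, node in active:
            child = node.children.get(tag)
            if child is None:
                continue  # this partial match dies
            for pattern_id in child.pattern_ids:
                found.append((pattern_id, start, i + 1))
            next_active.append((start, child))
        active = next_active
    return found


patterns = {
    "np_v": ["noun", "particle", "noun", "verb"],
    "n_v": ["noun", "verb"],
}
tags = ["noun", "particle", "noun", "verb", "auxiliary"]
found = match_all(build_trie(patterns), tags)
```

For this input, `found` reports `"np_v"` over positions 0 to 4 and `"n_v"` over positions 2 to 4. The size of `active` at each step is the cost of the non-deterministic simulation: every input word may advance many partial matches at once.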
In pattern matching processing using the trie, the implementation must give sufficient consideration to both the amount of computation and the memory usage. It is known to convert an NFA to an equivalent deterministic finite automaton (DFA) in order to implement the NFA in a sufficiently practical manner. However, since the conversion from NFA to DFA increases the number of states, exponentially in the worst case, and therefore the memory usage, there is a trade-off between computational efficiency and memory usage.
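The trade-off can be made concrete with the standard subset construction, sketched below for an NFA without epsilon transitions. The functions `nfa_to_dfa` and `nth_from_end_nfa` and the two-letter alphabet are illustrative assumptions; the example automaton is a textbook case (strings whose n-th symbol from the end is "a"), chosen because it shows the state growth clearly, and is not one of the dependency patterns discussed above.

```python
from collections import deque


def nfa_to_dfa(nfa, start, alphabet):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    dfa = {}
    queue = deque([frozenset([start])])
    while queue:
        current = queue.popleft()
        if current in dfa:
            continue  # already expanded
        dfa[current] = {}
        for symbol in alphabet:
            target = frozenset(
                nxt
                for state in current
                for nxt in nfa.get(state, {}).get(symbol, ())
            )
            dfa[current][symbol] = target
            if target not in dfa:
                queue.append(target)
    return dfa


def nth_from_end_nfa(n):
    """NFA with n + 1 states accepting strings whose n-th symbol from the end is 'a'."""
    # State 0 loops on every symbol and guesses, on 'a', that the end is n symbols away.
    nfa = {0: {"a": {0, 1}, "b": {0}}}
    # States 1 .. n-1 simply count the remaining symbols; state n has no transitions.
    for k in range(1, n):
        nfa[k] = {"a": {k + 1}, "b": {k + 1}}
    return nfa


nfa = nth_from_end_nfa(3)
dfa = nfa_to_dfa(nfa, 0, "ab")
```

Here the NFA has 4 states while the resulting DFA has 8, and in general this family grows from n + 1 NFA states to 2^n DFA states. The DFA needs only one table lookup per input symbol, but its transition table must be held in memory, which is the trade-off noted above.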