The present invention relates to a character and/or character-string retrieving method, and more particularly to a character and/or character-string retrieving method that can be applied in a key-information extracting device for extracting key information (e.g., date and time) from, e.g., a document or an electronic mail, in a document summarizing device for summarizing document data, and in filing systems of a document processing device, a word processor and a PDA (personal digital assistant).
A method for retrieving a character-string (pattern matching) by using a finite automaton has been studied, and a representative algorithm is described in detail in "Compilers: Principles, Techniques, and Tools", A. V. Aho, R. Sethi and J. D. Ullman, Addison-Wesley Publishers Limited, 1986.
A conventional algorithm will be briefly explained as follows:
A conventional deterministic finite automaton is prepared by the following flow of procedures.
In the flow, a pattern r of a regular expression desired to be retrieved by pattern matching is first prepared and an augmented regular expression (r)# is then formed from the prepared regular expression.
A syntax tree T of the augmented regular expression is prepared (by using a method described in detail in the above-mentioned document by Aho, Sethi and Ullman), and a set of states and a state-transition table are prepared according to the syntax tree. Among the states in the set, any state including the position corresponding to the marker # is considered an accepting state.
The number of states of the prior art deterministic finite automaton is optimized by the following flow of procedures.
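The preparation of the set of states and the state-transition table from the syntax tree T can be sketched as follows. This is a minimal illustration of the position-based construction described in the cited document, not a definitive implementation. The tuple encoding of the tree, the helper names (`leaf`, `analyse`, `build_dfa`, `matches`) and the example pattern (a|b)*abb are assumptions introduced only for illustration, and the character "#" is assumed not to occur in the input.

```python
from itertools import count

def leaf(ch, counter, symbols):
    """Create a leaf for character ch and record its position number."""
    pos = next(counter)
    symbols[pos] = ch
    return ("leaf", pos)

def analyse(node, followpos):
    """Return (nullable, firstpos, lastpos) of node and fill in followpos."""
    kind = node[0]
    if kind == "leaf":
        followpos.setdefault(node[1], set())
        return False, {node[1]}, {node[1]}
    if kind == "star":
        _, first, last = analyse(node[1], followpos)
        for p in last:                      # the repetition loops back
            followpos[p] |= first
        return True, first, last
    n1, f1, l1 = analyse(node[1], followpos)
    n2, f2, l2 = analyse(node[2], followpos)
    if kind == "or":
        return n1 or n2, f1 | f2, l1 | l2
    for p in l1:                            # kind == "cat"
        followpos[p] |= f2
    return n1 and n2, (f1 | f2) if n1 else f1, (l1 | l2) if n2 else l2

def build_dfa(tree, symbols):
    """Build (start state, transition table, accepting states) for (r)#."""
    followpos = {}
    _, first, _ = analyse(tree, followpos)
    end = next(p for p, ch in symbols.items() if ch == "#")
    start = frozenset(first)
    states, table, todo = {start}, {}, [start]
    while todo:
        s = todo.pop()
        for ch in sorted({symbols[p] for p in s} - {"#"}):
            target = frozenset().union(
                *(followpos[p] for p in s if symbols[p] == ch))
            table[(s, ch)] = target
            if target not in states:
                states.add(target)
                todo.append(target)
    # A state containing the position of the marker # is an accepting state.
    accepting = {s for s in states if end in s}
    return start, table, accepting

def matches(dfa, text):
    """Run the automaton over text; a missing transition means failure."""
    start, table, accepting = dfa
    state = start
    for ch in text:
        state = table.get((state, ch))
        if state is None:
            return False
    return state in accepting

# Hand-built syntax tree for the augmented expression ((a|b)*abb)#.
symbols, counter = {}, count(1)
tree = ("star", ("or", leaf("a", counter, symbols), leaf("b", counter, symbols)))
for ch in "abb#":
    tree = ("cat", tree, leaf(ch, counter, symbols))
dfa = build_dfa(tree, symbols)
```

With this tree, `matches(dfa, "aabb")` succeeds and `matches(dfa, "abba")` fails. The result only reports whether matching succeeded, which is the behavior of the prior art automaton.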
An initial partition Π of a set of states is constructed of two groups: accepting states and non-accepting states.
A new partition Π_new is constructed from the partition Π by splitting each group into subgroups. That is, if a state "s" in a group "a" goes to a state in a group "b" on a character and/or character-string input (M), but another state "t" in the group "a" goes to a state outside the group "b" on the same character and/or character-string input (M), the group "a" must be split into two subgroups so that one subgroup contains the state "s" and the other contains the state "t". This process of splitting groups in the current partition is repeated until no more groups can be split. A final partition Π_final is thus obtained.
A deterministic finite automaton (DFA) M' having the optimized number of states is formed from the final partition Π_final.
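The splitting of groups described above can be sketched as follows. This is a minimal illustration that assumes a complete transition table (every state has a transition on every input). The function name `minimize`, its parameters and the five-state example automaton are assumptions introduced only for illustration.

```python
def minimize(states, alphabet, delta, accepting):
    """Refine the partition {accepting, non-accepting} until it is stable.

    delta maps (state, input) to the next state and is assumed to be total.
    Returns the final partition as a list of sets of states.
    """
    accepting = set(accepting)
    partition = [g for g in (accepting, set(states) - accepting) if g]
    while True:
        # Index of the group that each state currently belongs to.
        group_of = {s: i for i, g in enumerate(partition) for s in g}
        new_partition = []
        for group in partition:
            # States stay together only if, on every input, they go to
            # states of the same group.
            buckets = {}
            for s in sorted(group):
                signature = tuple(group_of[delta[(s, ch)]] for ch in alphabet)
                buckets.setdefault(signature, set()).add(s)
            new_partition.extend(buckets.values())
        # Groups are only ever split, so an unchanged count means no change.
        if len(new_partition) == len(partition):
            return new_partition
        partition = new_partition

# Hypothetical five-state automaton for (a|b)*abb; state E is accepting.
delta = {
    ("A", "a"): "B", ("A", "b"): "C",
    ("B", "a"): "B", ("B", "b"): "D",
    ("C", "a"): "B", ("C", "b"): "C",
    ("D", "a"): "B", ("D", "b"): "E",
    ("E", "a"): "B", ("E", "b"): "C",
}
final_partition = minimize("ABCDE", "ab", delta, {"E"})
```

In this example the states A and C cannot be distinguished by any input and end up in the same group, so the automaton M' formed from the final partition has four states instead of five.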
As described above, matching of a character-string by using the prior art finite automaton with only one accepting state was intended only to determine whether pattern matching of that character-string would succeed or not. It was not necessary to determine which pattern matched the character-string. Consequently, when a plurality of different kinds of patterns are to be retrieved, a separate automaton must be prepared and used for each of the patterns.
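The consequence of this limitation can be sketched as follows. The two hand-built automata, the pattern names "date" and "time" and the helper names (`make_dfa`, `accepts`, `classify`) are assumptions introduced only for illustration. Each automaton answers only whether matching succeeded, so the kind of pattern is known only from which automaton was run.

```python
def make_dfa(separator):
    """Hypothetical DFA accepting: digit digit <separator> digit digit."""
    def step(state, ch):
        if state in (0, 1, 3, 4) and ch.isdigit():
            return state + 1
        if state == 2 and ch == separator:
            return 3
        return None  # no transition: matching fails
    return step, 5   # transition function and the single accepting state

def accepts(dfa, text):
    """Return True if the automaton ends in its accepting state."""
    step, accepting = dfa
    state = 0
    for ch in text:
        state = step(state, ch)
        if state is None:
            return False
    return state == accepting

# One separate automaton per kind of pattern.
AUTOMATA = {"date": make_dfa("/"), "time": make_dfa(":")}

def classify(text):
    """Run every automaton in turn until one of them succeeds."""
    for name, dfa in AUTOMATA.items():
        if accepts(dfa, text):
            return name
    return None
```

Here `classify("12/25")` has to run the "date" automaton and `classify("09:30")` has to run both automata before the kind of pattern is known, so the retrieval cost grows with the number of patterns.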