1. Field of the Invention
The present invention relates to a method for transforming sets of input strings into at least one pattern expression that is a string expressing the sets of the input strings. The present invention particularly relates to a method for extracting a transformation pattern as an approximate pattern expression in the process of transforming the sets of input strings to at least one pattern expression, the transformation pattern being for transforming the sets of input strings to the pattern expression.
2. Description of Related Art
When configuring a document parsing tool used in software development, a regular expression is often used to extract a target sub-string from a document. The extraction targets can be a large number of similar strings written according to a rule specific to a project. In existing tools, however, the pattern expressing that rule must be written manually as an appropriate regular expression. This considerably restricts both the automation of sub-string extraction and the accessibility of such tools.
In tools for upstream documents, the regular expression can be utilized for purposes such as: extracting a particular part of a document with the regular expression and utilizing that part; designating, with the regular expression, a file to be subjected to certain processing; sorting the files in a folder with the regular expression and performing appropriate processing on a subset of them; and extracting a common part or a variable part from a text group in order to extract and utilize semantic information.
In addition to the parsing of an upstream document, the regular expression can be used for, for example, lexical analysis in a compiler, keyword searching of text on the web or the like, and the following scenario. Specifically, when different kinds of processing are desired for different files in a folder, a particular file is selected by matching its file name against the regular expression and is then subjected to the processing intended for it.
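The file name matching scenario described above can be sketched as follows. This is a minimal illustration only: the function name `select_files`, the sample file names, and the pattern are assumptions made for the example and do not come from any particular tool.

```python
import re

def select_files(names, pattern):
    """Return the file names that match the regular expression in full."""
    rx = re.compile(pattern)
    # fullmatch requires the whole name to match, not just a prefix.
    return [name for name in names if rx.fullmatch(name)]

# Hypothetical folder listing.
names = ["design_v1.doc", "design_v2.doc", "notes.txt", "test_plan.xls"]

# Select only the versioned design documents for particular processing.
design_docs = select_files(names, r"design_v\d+\.doc")
for name in design_docs:
    print("processing", name)
```

The pattern here is written by hand, which is exactly the manual step that the related-art tools require.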
Japanese Patent Application No. 2004-354787 describes an interactive method executed in an interactive device having a user-interactive feature; the interactive device; an interactive program; and a recording medium in which the interactive program is recorded. Every time a user inputs an information request, the interactive device identifies the content of the information request of the user by using a history of communications with the user, and responds to the user according to the content.
Japanese Patent Application No. 2004-139446 describes a secretary agent system, a secretary agent program and a dialogue planning method which are used in an everyday language computer system configured to process language text based on a semiotic base which is a structured collection of meaning resources in everyday language, the secretary agent system and the secretary agent program assisting interactive exchange of language text between the user and the everyday language computer system.
Japanese Application No. 2004-513458 describes: a method for allowing a user to view and modify a weighting for translation of a source language string; a machine translation system for allowing a user to view and modify a weighting for translation of a source language; and a product having computer-readable program means for allowing a user to view and modify a weighting for translation.
Japanese Application No. 2010-79723 describes an information processing apparatus comprising: a state classifying unit that generates state sets from states included in a deterministic finite state automaton by classifying the states into the state sets according to input symbols associated with outgoing transitions and finality indicating whether a state is a final state, in such a way that states in each set have the same input symbol and the same finality; a calculating unit that calculates an intersection of each state set and a set of transition destination states to which the states in the state set are transitioned, and iterates the calculation of the intersection, until the number of states in the intersection reaches one, by targeting, as a new state set, a set of transition destination states to which states included in the intersection are transitioned with the same symbol; and a state merging unit that merges plural indistinguishable states into one state by tracing the transition arrows in a reverse direction to the transition direction that the calculating unit follows, when the number of states in the intersection reaches one.
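As a rough illustration of the first step only, the classification of states by outgoing input symbols and finality might look like the sketch below. It assumes that "the same input symbol" means the same set of symbols on outgoing transitions; the dictionary representation of the automaton and the name `classify_states` are hypothetical, and the intersection and merging steps of the cited apparatus are not modeled.

```python
from collections import defaultdict

def classify_states(transitions, finals):
    """Group DFA states by (set of outgoing input symbols, finality).

    transitions: dict mapping state -> dict mapping symbol -> next state
    finals: set of final states
    """
    groups = defaultdict(set)
    for state, outgoing in transitions.items():
        # States land in the same set only if both parts of the key agree.
        key = (frozenset(outgoing), state in finals)
        groups[key].add(state)
    return dict(groups)

# Hypothetical four-state automaton; state 3 is the only final state.
transitions = {
    0: {"a": 1, "b": 2},
    1: {"a": 3},
    2: {"a": 3},
    3: {},
}
state_sets = classify_states(transitions, finals={3})
```

In this example, states 1 and 2 fall into the same set because both are non-final and both have a single outgoing transition on `a`.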
Japanese Application No. 2008-305722 describes a search that uses indeterminate strings as search strings. The indeterminate strings can be expressed as regular expressions which include candidate characters selected and united for each input character, and which are each formed by concatenating the candidate characters for the input characters. Thus, a search using a finite state automaton including the regular expressions can be performed instead of the above search using the indeterminate strings. The search using the finite state automaton, however, requires the following processing. Specifically, the finite state automaton needs to be provided with accept states accepting all the characters other than the candidate characters, in addition to the states of the candidate characters for each character. Moreover, each of the states of the candidate characters is associated with a certainty degree. Every time the acceptance of a character in a sub-string causes a transition from one state to the following state, the certainty degrees of the state and the following state are added up.
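The idea of expressing an indeterminate string as a regular expression can be sketched as below, assuming each input position is given as a set of candidate characters. The function name `indeterminate_to_regex` and the sample candidates are hypothetical; the extra accept states and the certainty degrees described above are not modeled.

```python
import re

def indeterminate_to_regex(candidates):
    """Build a regex from per-position candidate characters.

    candidates: list of sets, one set of candidate characters per input position.
    Each position becomes a union (alternation) of its candidates, and the
    positions are concatenated in order.
    """
    parts = []
    for chars in candidates:
        # Sort for a deterministic pattern; escape regex metacharacters.
        alternatives = "|".join(re.escape(c) for c in sorted(chars))
        parts.append("(?:" + alternatives + ")")
    return "".join(parts)

# Hypothetical recognition result: the first character is "c" or "e",
# the second is "a" or "o", and the third is certainly "t".
pattern = indeterminate_to_regex([{"c", "e"}, {"a", "o"}, {"t"}])
print(bool(re.fullmatch(pattern, "cat")))
print(bool(re.fullmatch(pattern, "dog")))
```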
Japanese Application No. 2011-123794 describes a technique for extracting structured information from a natural-language sentence without using a parsing technique.
Japanese Application No. 2002-229981 describes a method for normalizing input strings.
Japanese Application No. 01-180046 describes a knowledge-based system and a method for understanding a natural language.
Japanese Application No. 2005-301780 describes an information processing apparatus performing dialogue processing and an information processing method for the same.
Japanese Application No. 2011-141627 describes a device and a generation method by which configuration data of a predetermined reconfigurable finite state automaton circuit is generated based on any regular expression.