In general, as an example of a text mining apparatus, a structure shown in FIG. 1 is well-known (refer to patent document: Japanese Unexamined Patent Application Publication No. 2001-84250 (fourth and fifth pages and FIG. 3)). Referring to FIG. 1, the conventional text mining apparatus comprises a basic-dictionary storing unit, a document-data storing unit, a field-depending dictionary storing unit, a language feature analyzing device, a language analysis device, a pattern extracting device, and a frequent-pattern display device.
The conventional text mining apparatus shown in FIG. 1 operates schematically as follows. First, the language feature analyzing device generates a field-depending dictionary from a basic dictionary and document data, and the language analysis device generates a structure, such as a syntax tree, from the basic dictionary, the field-depending dictionary, and the document data. The pattern extracting device then extracts a frequent pattern by using the structure. A storing unit for documents matching the frequent pattern stores each document in the document data that matches the frequent pattern, and the frequent pattern is output at the same time.
In general, the following structures generated by the language analysis device are frequently used.
(A1) A clause in a sentence is represented by a node of the structure.
(A2) Information about an attached word is represented by an attribute value of the node.
(A3) Dependency is represented by a directional branch from a node on a modifier to a node on a modifiee.
(A4) Information about a surface case is represented by an attribute value of the directional branch.
Herein, the information about the attached word indicates an attached concept including tense such as present or perfect, modality such as easy or difficult, and negation. The information about the attached word is added to a clause by the attached word.
FIG. 2 shows an example of the syntax structure, in the above form, of the sentence “Kare ha shashu A ga kakaku wo sageta no wo shiranai (He does not know that the price of a type A of vehicle has been down)”. The clauses “kare (He)”, “shashu A (type A of vehicle)”, “kakaku (price)”, “sageru (has been down)”, and “shiru (know)” in the sentence are represented by nodes. The information about the attached word is represented by an attribute value of the node (e.g., the node “shiru (know)” has the attribute value “information about the attached word: negation”). Dependency is represented by a directional branch from the node on the modifier to the node on the modifiee (e.g., “kare (He)”→“shiru (know)”). Information about a surface case is represented by an attribute value of the directional branch (e.g., the directional branch “kare (He)”→“shiru (know)” has the attribute value “surface case: ha”).
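The form (A1) to (A4) can be illustrated with a minimal Python sketch. The class names `Node` and `Branch` and the function `build_fig2_structure` are hypothetical names chosen for illustration. Only the branch “kare”→“shiru” with the surface case “ha” and the negation on “shiru” are stated in the text; the remaining branches, surface cases, and the “perfect” value on “sageru” are assumptions read off the particles of the example sentence.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """(A1) one clause; (A2) attached-word information as an attribute value."""
    word: str
    attached: list = field(default_factory=list)


@dataclass
class Branch:
    """(A3) directional branch from modifier to modifiee; (A4) surface case."""
    modifier: str
    modifiee: str
    surface_case: str


def build_fig2_structure():
    """Structure for 'Kare ha shashu A ga kakaku wo sageta no wo shiranai'."""
    nodes = {
        "kare": Node("kare"),
        "shashu A": Node("shashu A"),
        "kakaku": Node("kakaku"),
        "sageru": Node("sageru", ["perfect"]),   # assumed from "sageta"
        "shiru": Node("shiru", ["negation"]),    # stated in the text
    }
    branches = [
        Branch("kare", "shiru", "ha"),           # stated in the text
        Branch("shashu A", "sageru", "ga"),      # assumed from the sentence
        Branch("kakaku", "sageru", "wo"),        # assumed from the sentence
        Branch("sageru", "shiru", "wo"),         # assumed from the sentence
    ]
    return nodes, branches
```

Keying the nodes by word is a simplification that works only because no word occurs twice in this sentence.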
Further, all the information in the structure can also be expressed by a structure comprising only nodes having labels, without attribute values, and directional branches without attribute values. FIG. 3 shows an example of the syntax structure, in this form, of the sentence “Kare ha shashu A ga kakaku wo sageta no wo shiranai (He does not know that the price of a type A of vehicle has been down)”.
The clauses “kare (He)”, “shashu A (type A of vehicle)”, “kakaku (price)”, “sageru (has been down)”, and “shiru (know)” in the sentence are represented by nodes having labels without attribute values (e.g., the label “surface case: ha” is added to the node “kare (He)”, and the labels “information about the attached word: perfect” and “surface case: wo” are added to the node “sageru (has been down)”), and the directional branch from the node on the modifier to the node on the modifiee does not have an attribute value.
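The conversion from the attribute-value form to the label-only form can be sketched as follows. This is a minimal illustration, not the method of the cited apparatus: the function name `to_label_only`, the dict and tuple representation, and the label strings are assumptions. Each surface case is moved from the directional branch onto the modifier node as a label, consistent with the example in which the label for “ha” is added to the node “kare”.

```python
def to_label_only(nodes, branches):
    """Convert an attribute-value structure into a label-only structure.

    nodes:    dict mapping a word to its list of attached-word information
    branches: list of (modifier, modifiee, surface_case) tuples
    Returns (labels, edges): labels maps each word to a list of labels,
    and edges is a list of bare (modifier, modifiee) pairs.
    """
    # Attached-word information becomes a label on the same node.
    labels = {
        word: [f"information about the attached word: {info}" for info in attached]
        for word, attached in nodes.items()
    }
    edges = []
    for modifier, modifiee, surface_case in branches:
        # The surface case moves from the branch to the modifier node.
        labels[modifier].append(f"surface case: {surface_case}")
        edges.append((modifier, modifiee))
    return labels, edges


# Example sentence; attribute values other than those stated in the text
# are assumptions read off the sentence.
example_nodes = {
    "kare": [], "shashu A": [], "kakaku": [],
    "sageru": ["perfect"], "shiru": ["negation"],
}
example_branches = [
    ("kare", "shiru", "ha"),
    ("shashu A", "sageru", "ga"),
    ("kakaku", "sageru", "wo"),
    ("sageru", "shiru", "wo"),
]
labels, edges = to_label_only(example_nodes, example_branches)
```

After the conversion, no branch carries an attribute value, so any difference between two structures appears either in the node labels or in the connecting configuration.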
The above-mentioned conventional system has the following problems. The problems and their analysis are based on the research and examination results of the present inventors. The contents shown in FIGS. 4A to 4D, 5A, and 5B are presented by the present inventors for the purpose of specifically describing the causes of the problems.
As a first problem, upon detecting a frequent pattern, structures having a similar meaning but different connecting configurations are determined to be entirely different patterns.
The connecting configuration refers to the configuration obtained by considering only the nodes of the structure, the character strings of their words, the connecting relationships of the directional branches, and the directions of those branches, while omitting the attached attribute information.
The first problem arises because the conventional text mining apparatus does not comprise means for determining that structures having different connecting configurations and a similar meaning are an identical structure.
Examples of the difference between the structures having the different connecting configurations and the similar meaning are as follows upon using a sentence structure with the attribute value.
(B1) Difference between directions of the dependency,
(B2) Difference between dependency orders,
(B3) Difference due to replacement with synonyms, and
(B4) Difference between parallel syntax structures and meaning structures.
FIGS. 4A to 4D show examples of the differences between the structures due to the connecting configurations. Upon using the sentence structure without the attribute value, all differences having the similar meaning are expressed by the difference between the connecting configurations.
In the example shown in FIG. 4A, between the connecting configurations of “hayai no ha shashu A (A fast type of vehicle is A)” and “shashu A ha hayai (A type A of vehicle is fast)” having the similar meaning, the modifier and the modifiee are different from each other.
In the example shown in FIG. 4B, between the connecting configurations of “Hayaku yasui shashu A (A fast and cheap type of vehicle is A)” and “Yasuku hayai shashu A (A cheap and fast type of vehicle is A)” having the similar meaning, the node order relationships of “hayai (fast)” and “yasui (cheap)” as the modifiers are different from each other.
In the example shown in FIG. 4C, between the connecting configurations of “shashu A ha hayai (A type A of vehicle is fast)” and “shashu A ha kousoku da (A type A of vehicle has a high velocity)” having the similar meaning, the nodes “hayai (fast)” and “kousoku (high velocity)” as the modifiees are different from each other.
In the example shown in FIG. 4D, a syntax structure and a meaning structure of “shashu A to shashu B ha hayai (A type A of vehicle and a type B of vehicle are fast)” are indicated. Referring to FIG. 4D, there are a connecting configuration in which “shashu A (type A of vehicle)” as the modifier modifies “shashu B (type B of vehicle)” and “shashu B (type B of vehicle)” modifies “hayai (fast)”, and a connecting configuration having directional branches from “shashu A (type A of vehicle)” and “shashu B (type B of vehicle)” as the modifiers to “hayai (fast)” as the modifiee.
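The first problem can be made concrete with a small Python sketch. It assumes, for illustration only, that a connecting configuration is held as a set of directed (modifier, modifiee) pairs and that patterns are compared by exact equality; the function names are hypothetical, and the direction of the branch for “hayai no ha shashu A” is an assumed reading of FIG. 4A as a reversed dependency.

```python
def configuration(branches):
    """Connecting configuration as a set of directed (modifier, modifiee) pairs."""
    return frozenset(branches)


def same_pattern_exact(a, b):
    """Exact comparison: any difference in branches yields different patterns."""
    return a == b


# FIG. 4A: the direction of the dependency is reversed (assumed reading).
fig4a_first = configuration([("hayai", "shashu A")])    # "hayai no ha shashu A"
fig4a_second = configuration([("shashu A", "hayai")])   # "shashu A ha hayai"

# FIG. 4D: syntax structure versus meaning structure of the same sentence.
fig4d_syntax = configuration([("shashu A", "shashu B"), ("shashu B", "hayai")])
fig4d_meaning = configuration([("shashu A", "hayai"), ("shashu B", "hayai")])

print(same_pattern_exact(fig4a_first, fig4a_second))    # False
print(same_pattern_exact(fig4d_syntax, fig4d_meaning))  # False
```

Under exact comparison, both pairs count as distinct patterns although each pair has a similar meaning, so neither contributes to the frequency of the other.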
As a second problem, upon detecting a frequent pattern, structures having a similar meaning but different attribute values are determined to be completely different patterns.
This is because the conventional text mining apparatus does not consider determining that structures having different attribute values are an identical structure.
Examples of the difference between the structures having different attribute values and the similar meaning upon using the sentence structure with the attribute value are the difference between the information about the attached word, the difference between the surface cases etc. FIGS. 5A and 5B show examples of the difference between the structures due to the attribute values.
In the example shown in FIG. 5A, between the structures of “shashu A ha kasoku (a type A of vehicle accelerates)” and “shashu A no kasoku (acceleration of a type A of vehicle)” having the similar meaning, the surface cases of the directional branches differ from each other.
In the example shown in FIG. 5B, between the structures of “shashu A ha hayai (a type A of vehicle is fast)” and “shashu A ha hayakatta (a type A of vehicle was fast)” having the similar meaning, the information about the attached word of the node “hayai (fast)” as the modifiee differs.
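The second problem can be sketched in the same way. In this minimal illustration, the function names, the tuple representation, and the attached-word value "past" for “hayakatta” are assumptions. The pairs of FIGS. 5A and 5B share a connecting configuration once the attribute values are omitted, yet differ under an exact comparison that includes the attribute values.

```python
def exact_key(nodes, branches):
    """Comparison key that includes every attribute value."""
    node_part = frozenset((word, tuple(attached)) for word, attached in nodes.items())
    return node_part, frozenset(branches)


def connecting_configuration(nodes, branches):
    """Comparison key that omits attached-word information and surface cases."""
    edge_part = frozenset((modifier, modifiee) for modifier, modifiee, _case in branches)
    return frozenset(nodes), edge_part


# FIG. 5A: only the surface case of the directional branch differs.
fig5a_first = ({"shashu A": [], "kasoku": []}, [("shashu A", "kasoku", "ha")])
fig5a_second = ({"shashu A": [], "kasoku": []}, [("shashu A", "kasoku", "no")])

# FIG. 5B: only the attached-word information of "hayai" differs
# ("past" is an assumed value for "hayakatta").
fig5b_first = ({"shashu A": [], "hayai": []}, [("shashu A", "hayai", "ha")])
fig5b_second = ({"shashu A": [], "hayai": ["past"]}, [("shashu A", "hayai", "ha")])

for first, second in [(fig5a_first, fig5a_second), (fig5b_first, fig5b_second)]:
    print(exact_key(*first) == exact_key(*second),
          connecting_configuration(*first) == connecting_configuration(*second))
# Each line prints: False True
```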
As a third problem, upon detecting the frequent pattern, a user of the text mining apparatus cannot adjust how similar structures must be in order to be determined as an identical structure.
This is because the conventional text mining apparatus does not consider allowing a user to adjust how similar structures must be in order to be determined as an identical structure upon detecting the frequent pattern.
Accordingly, it is one object of the present invention to provide a text mining apparatus, method, and program in which structures having a similar meaning and different connecting configurations are determined as an identical pattern and a frequent pattern is detected.
It is another object of the present invention to provide a text mining apparatus, method, and program capable of adjusting whether or not structures having a similar meaning and different attribute values are determined as an identical structure, and of detecting a frequent pattern accordingly.
It is still another object of the present invention to provide a text mining apparatus, method, and program that allow a text mining user to adjust how similar structures must be in order to be determined as an identical structure, and that detect a frequent pattern accordingly.