PLT 1 describes a data dividing method in which, when multidimensional data are divided by items having a hierarchical structure, the data are divided into groups suitable for analysis. The data dividing device described in PLT 1 receives a data group and a classification hierarchy of the data group, and outputs a classification hierarchy obtained by deleting uncharacteristic hierarchy levels from the received classification hierarchy, based on the distribution of the received data group. More specifically, determination means adopts a particular class as the dividing target and determines an attribute indicating whether the data group belonging to that class (the dividing target group) is characteristic, by performing a statistical test based on the distribution of that group. Dividing means then divides the dividing target group into child groups, each belonging to a child class, based on the determination result, and adopts each child group as a new dividing target. Integration means then integrates each uncharacteristic child group into its parent group based on the determined attribute. In other words, the integration means deletes the uncharacteristic hierarchy levels and leaves only the characteristic ones. Consequently, by following the output classification hierarchy in order from the parent class, the classification can be traced down to the characteristic child classes.
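The determine/divide/integrate loop of PLT 1 can be pictured with the recursive Python sketch below. PLT 1 does not say which statistical test it uses, so the one-sample z-test of a child group's mean against its parent group (parent mean and population standard deviation treated as known), the 1.96 cutoff, the `ClassNode` structure, and the names `subtree_values`, `is_characteristic`, and `prune` are all hypothetical. For simplicity each data item is reduced to a single number, although PLT 1 handles multidimensional data.

```python
import math
import statistics
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClassNode:
    """One class in the classification hierarchy (hypothetical structure)."""
    name: str
    values: List[float] = field(default_factory=list)  # data assigned directly to this class
    children: List["ClassNode"] = field(default_factory=list)


def subtree_values(node: ClassNode) -> List[float]:
    """All data of a class, including the data of its descendant classes."""
    out = list(node.values)
    for child in node.children:
        out.extend(subtree_values(child))
    return out


def is_characteristic(child_vals: List[float], parent_vals: List[float],
                      z_crit: float = 1.96) -> bool:
    """Assumed stand-in test: z-test of the child group's mean against the parent group."""
    if not child_vals:
        return False
    mu = statistics.mean(parent_vals)
    sigma = statistics.pstdev(parent_vals)
    diff = statistics.mean(child_vals) - mu
    if sigma == 0:
        return diff != 0
    z = diff / (sigma / math.sqrt(len(child_vals)))
    return abs(z) > z_crit


def prune(node: ClassNode) -> ClassNode:
    """Return a copy in which uncharacteristic child classes are integrated into the parent."""
    parent_vals = subtree_values(node)
    values = list(node.values)
    children: List[ClassNode] = []
    for child in node.children:
        pruned = prune(child)  # each child group becomes a new dividing target
        if is_characteristic(subtree_values(child), parent_vals):
            children.append(pruned)  # keep the characteristic hierarchy level
        else:
            values.extend(pruned.values)      # integrate the child's data into the parent
            children.extend(pruned.children)  # its surviving sub-classes move up one level
    return ClassNode(node.name, values, children)


root = ClassNode("root", [10] * 6, [ClassNode("A", [30, 30]), ClassNode("B", [10, 10])])
result = prune(root)
print([c.name for c in result.children])  # ['A']
```

Class B looks like its parent and is folded back into it, while class A survives. When a class is integrated, its already-pruned sub-classes are attached to the parent without being re-tested against that parent; that shortcut is a simplification of this sketch, not something PLT 1 is known to do.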
PLT 2 describes a term dictionary generation method that outputs relationships between terms based on input document data. In the method described in PLT 2, related terms are first selected based on each term and its position information in the document data. Next, a graph is generated in which the terms and their related terms are represented as nodes. Then, for every combination of two nodes in the graph, a cooccurrence statistic is calculated, and a degree of similarity is also calculated from a synonym dictionary and other document data. Finally, the graph is converted according to a conversion rule that uses the cooccurrence statistic and the degree of similarity.
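A minimal Python sketch of the pairwise scoring and graph conversion steps follows. PLT 2 does not specify its cooccurrence statistic or conversion rule here, so the Dice coefficient over documents, the binary dictionary-based similarity (which ignores the "other document data" PLT 2 also uses), the thresholds `cooc_min` and `sim_min`, and the function names `dice` and `convert_graph` are assumptions for illustration only.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple


def dice(docs: List[Set[str]], a: str, b: str) -> float:
    """Assumed cooccurrence statistic: Dice coefficient of document-level occurrence."""
    na = sum(1 for d in docs if a in d)
    nb = sum(1 for d in docs if b in d)
    nab = sum(1 for d in docs if a in d and b in d)
    return 2 * nab / (na + nb) if na + nb else 0.0


def convert_graph(nodes: Set[str], docs: List[Set[str]],
                  synonyms: Set[FrozenSet[str]],
                  cooc_min: float = 0.3, sim_min: float = 0.5) -> Dict[Tuple[str, str], str]:
    """Score every node pair and apply a hypothetical conversion rule to its edge."""
    edges: Dict[Tuple[str, str], str] = {}
    for a, b in combinations(sorted(nodes), 2):
        cooc = dice(docs, a, b)
        sim = 1.0 if frozenset((a, b)) in synonyms else 0.0  # toy dictionary lookup
        if sim >= sim_min:
            edges[(a, b)] = "synonym"   # dictionary similarity takes priority
        elif cooc >= cooc_min:
            edges[(a, b)] = "related"   # frequent cooccurrence only
        # otherwise the pair is left unconnected
    return edges


docs = [{"car", "engine"}, {"car", "engine"}, {"automobile"}, {"banana"}]
synonyms = {frozenset(("car", "automobile"))}
print(convert_graph({"car", "automobile", "engine", "banana"}, docs, synonyms))
# {('automobile', 'car'): 'synonym', ('car', 'engine'): 'related'}
```

Checking dictionary similarity before cooccurrence lets a synonym pair be labeled even when the two spellings never appear in the same document, as with "car" and "automobile" here.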
PLT 3 describes a document organizing device that automatically and accurately classifies large collections of documents accumulated in an information processor according to their features. The document organizing device described in PLT 3 defines a certainty factor conf(H→B) and a support sup(H→B), both based on the cooccurrence frequency of a keyword pair (H, B). The XY plane on which a keyword pair is plotted as the point (X, Y) = (conf(kw→wi), conf(wi→kw)) is then divided into five regions, from which hierarchical relationships, equivalence relationships, and association relationships between the keywords are determined.
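As a rough illustration of how a keyword pair lands on that XY plane, the Python sketch below assumes the standard association-rule definitions: conf(H→B) is the fraction of documents containing H that also contain B, and sup(H→B) is the fraction of all documents containing both. PLT 3's exact formulas and region boundaries are not given here, so the five regions, the thresholds `high` and `low`, and the names `support`, `confidence`, and `classify_pair` are hypothetical.

```python
from typing import List, Set


def support(docs: List[Set[str]], h: str, b: str) -> float:
    """sup(H->B): fraction of all documents containing both keywords (assumed definition)."""
    if not docs:
        return 0.0
    return sum(1 for d in docs if h in d and b in d) / len(docs)


def confidence(docs: List[Set[str]], h: str, b: str) -> float:
    """conf(H->B): fraction of documents containing H that also contain B (assumed definition)."""
    with_h = [d for d in docs if h in d]
    if not with_h:
        return 0.0
    return sum(1 for d in with_h if b in d) / len(with_h)


def classify_pair(docs: List[Set[str]], kw: str, wi: str,
                  high: float = 0.8, low: float = 0.2) -> str:
    """Place (X, Y) = (conf(kw->wi), conf(wi->kw)) in one of five hypothetical regions."""
    x = confidence(docs, kw, wi)
    y = confidence(docs, wi, kw)
    if x >= high and y >= high:
        return "equivalence"                 # each keyword almost always implies the other
    if x >= high:
        return "hierarchical (kw narrower)"  # kw almost always occurs with wi, not vice versa
    if y >= high:
        return "hierarchical (wi narrower)"
    if x >= low and y >= low:
        return "association"                 # moderate cooccurrence in both directions
    return "unrelated"


docs = [{"car", "vehicle"}, {"car", "vehicle", "engine"}, {"vehicle", "truck"}, {"vehicle", "bike"}]
print(classify_pair(docs, "car", "vehicle"))  # hierarchical (kw narrower)
```

Here every document mentioning "car" also mentions "vehicle" (X = 1.0), but only half of the "vehicle" documents mention "car" (Y = 0.5), so the pair falls in the region read as "car is narrower than vehicle".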
PLT 4 describes a classification system generation device that automatically builds a hierarchically structured classification system from a flat classification frame. The classification system generation device described in PLT 4 generates clusters by clustering, starting from a non-hierarchical (i.e., flat) classification frame. The generated clusters are then adopted as upper classification frames, yielding a hierarchically structured classification system. Upper classification frames (i.e., clusters) whose classification accuracy is below a reference value are integrated with other clusters, and the hierarchy is then extended by re-clustering. In the classification system generation device described in PLT 4, when the classification accuracy of an existing classification system is below the reference value, or when the classification system is to be corrected to suit a particular situation, the classification system of the document classification unit is stored in the classification system storage unit and adopted as an optimization target. The classification system is then evaluated and modified, based on classified documents input from a document input unit and sample documents representing the situation, so that the classification accuracy is improved.