A document management system is known in which documents used in a particular business assignment are recorded in a database so that they can be reused in another business assignment. Moreover, knowledge is extracted from the document group managed in such a system using data mining and text mining, and the extracted knowledge is used to analyze and improve business assignments.
Furthermore, key phrase search (searching for a word or a string of words) and facet search are known as ways of searching a document group managed in the document management system for the intended documents. In facet search, a plurality of items for classifying the documents, together with their hierarchical structure, are defined in advance; the user selects items sequentially from the higher level to the lower level, and the documents are narrowed down accordingly.
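The narrowing-down step of facet search can be illustrated with a minimal sketch. It assumes a hypothetical two-level item hierarchy (HIERARCHY) and hypothetical documents (DOCUMENTS), each classified by a path of items from the higher level to the lower level; the function names narrow_down and next_items are illustrative only and do not describe any particular system.

```python
# Hypothetical item hierarchy, defined in advance: higher-level item -> lower-level items.
HIERARCHY = {
    "Product": ["Printer", "Scanner"],
    "Process": ["Design", "Testing"],
}

# Hypothetical documents, each classified by a path of items (higher level first).
DOCUMENTS = {
    "doc1": ("Product", "Printer"),
    "doc2": ("Product", "Scanner"),
    "doc3": ("Process", "Design"),
    "doc4": ("Product", "Printer"),
}

def narrow_down(documents, selection):
    """Return IDs of documents whose item path begins with the items selected so far."""
    n = len(selection)
    return sorted(doc_id for doc_id, path in documents.items()
                  if path[:n] == tuple(selection))

def next_items(hierarchy, selection):
    """Return the items the user may select at the next lower level."""
    if not selection:
        return sorted(hierarchy)  # top-level items
    if len(selection) == 1:
        return list(hierarchy.get(selection[0], []))
    return []  # the lowest level has been reached

# The user selects "Product", then "Printer"; the candidate documents shrink at each step.
print(next_items(HIERARCHY, []))                      # ['Process', 'Product']
print(narrow_down(DOCUMENTS, ["Product"]))            # ['doc1', 'doc2', 'doc4']
print(narrow_down(DOCUMENTS, ["Product", "Printer"])) # ['doc1', 'doc4']
```

Because HIERARCHY is fixed before any search takes place, adding a new classification perspective in this sketch would require redefining both the hierarchy and every document's item path.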
Various methods have also been proposed for enabling the user to refer to the features of a document group managed in the document management system. One such method is the OLAP (Online Analytical Processing) function, which enables the user to view the features of the entire document group in an overview and also to drill down from the information indicating the overall features to the information indicating the details. Another such method is the heat map, in which the features of information classified from two different perspectives are expressed in a map having two axes.
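The two ways of referring to document-group features described above can be sketched as simple aggregations. This is a minimal illustration under stated assumptions: the records in RECORDS and their attributes (department, team, year) are hypothetical, summarize stands in for an OLAP-style overview and drill-down, and heat_map builds only the count matrix behind a two-axis heat map, not its rendering.

```python
from collections import Counter

# Hypothetical document records, each with attributes from several perspectives.
RECORDS = [
    {"department": "Sales", "team": "East", "year": 2021},
    {"department": "Sales", "team": "West", "year": 2022},
    {"department": "Sales", "team": "East", "year": 2022},
    {"department": "R&D", "team": "Lab A", "year": 2021},
]

def summarize(records, dimension, **filters):
    """Count documents per value of `dimension` among records matching `filters`.

    With no filters this gives an overview; fixing a coarse value as a filter
    and counting along a finer dimension corresponds to a drill-down.
    """
    selected = [r for r in records if all(r[k] == v for k, v in filters.items())]
    return dict(Counter(r[dimension] for r in selected))

def heat_map(records, row_dim, col_dim):
    """Build a two-axis count matrix; rows and columns are two classification perspectives."""
    rows = sorted({r[row_dim] for r in records})
    cols = sorted({r[col_dim] for r in records})
    counts = Counter((r[row_dim], r[col_dim]) for r in records)
    matrix = [[counts[(row, col)] for col in cols] for row in rows]
    return rows, cols, matrix

print(summarize(RECORDS, "department"))                  # overview
print(summarize(RECORDS, "team", department="Sales"))    # drill-down into Sales
print(heat_map(RECORDS, "department", "year"))
```

In this sketch, each cell of the heat-map matrix counts the documents that fall into one combination of the two perspectives; a plotting library would then map those counts to colors.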
Meanwhile, facet search requires the structure of items to be defined in advance. However, designing the item structure and the corresponding database entails substantial cost. Moreover, once the document management system has been in operation for some time, even if a need arises to search for documents and refer to the features of a document group from a new perspective, it is difficult to change the hierarchical structure of the already-defined items and the database structure.
On the other hand, a method is also known in which the items for classification are generated automatically by clustering. With this method, the item structure need not be designed in advance. However, this method significantly restricts the items that can actually be used: classification is possible only into items whose hierarchy information and structure are described in the documents themselves, such as quantity expressions, discrete attributes such as colors and shapes, and package names of source code.