With the rapid development of network information technologies, more and more technologies and tools relevant to data mining have become available. A common user can collect a large amount of information in which he/she is interested, and the collected information may be relevant to particular entities (e.g. query items).
With respect to the collected information set, a user may have two basic requirements. One is to locate a particular piece of information that he/she is looking for. The other is to browse all the content covered by the whole information set and to perform deeper analysis. The former requirement is called the “information retrieval requirement”, while the latter is called the “information organization requirement”.
A search engine can be applied to the information set and can be a good tool for meeting the information retrieval requirement. However, for the information organization requirement, the search result list provided by the search engine does not work effectively, because reading the whole list and manually devising an organization for it may take a great deal of time and human labor. To help the user easily browse the collected information set, an effective organization structure for the information set should first be built. Since an information organization structure with good readability can help the user easily understand the information set, quickly navigate to the information he/she is interested in, and have a much better experience, how to construct a good organization structure for the collected information set becomes a general problem.
Usually, traditional methods for building an information organization structure automatically extract elements from the information set and build the structure according to the relationships among those elements. For example, US patent application No. 2006/0026190A1, entitled “System and Method for Category Organization” and filed on Jul. 30, 2004, proposes a method to automatically discover categories for a collected document set. The disclosure of the US application is hereby incorporated entirely by reference for all purposes. The method firstly generates a list of the top N (e.g. N=50) most frequently occurring terms in the document collection. Secondly, a bit vector matrix (of size N*M) is created for the list. For each term in the list, a term bit vector, whose length equals the number (M) of documents in the document collection, is generated based on whether each document contains the term. Thirdly, the predictive relations among all term bit vectors are generated based on the bit vector matrix and stored in a term prediction matrix, which is a Bi-Normal Separation matrix of size N*N. Fourthly, negative and positive pair lists are determined based on the prediction matrix. Finally, a structure is constructed by predefined procedures. FIG. 1 shows an example of the information organization structures generated by this method.
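By way of illustration only, the first three steps described above (term list, bit vector matrix, and prediction matrix) might be sketched as follows. The function names, the whitespace tokenization, the signed form of the Bi-Normal Separation score, and the clipping constant `eps` are all assumptions made for this sketch; they are not taken from the referenced application, whose exact computation is not reproduced here.

```python
from collections import Counter
from statistics import NormalDist


def top_terms(docs, n):
    """Return the n most frequently occurring terms in the collection."""
    # Assumption: terms are lower-cased, whitespace-separated tokens.
    counts = Counter(word for doc in docs for word in doc.lower().split())
    return [term for term, _ in counts.most_common(n)]


def bit_vector_matrix(docs, terms):
    """Build an N x M matrix; entry [i][j] is 1 if document j contains term i."""
    doc_words = [set(doc.lower().split()) for doc in docs]
    return [[1 if term in words else 0 for words in doc_words] for term in terms]


def bns(tpr, fpr, eps=0.0005):
    """Signed Bi-Normal Separation score (assumed form for this sketch).

    Rates are clipped to [eps, 1 - eps] because the inverse normal CDF
    is undefined at exactly 0 and 1.
    """
    inv = NormalDist().inv_cdf

    def clip(rate):
        return min(max(rate, eps), 1.0 - eps)

    return inv(clip(tpr)) - inv(clip(fpr))


def prediction_matrix(bits):
    """Build an N x N matrix; entry [i][j] scores how well term i predicts term j."""
    n = len(bits)
    m = len(bits[0]) if bits else 0
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a term trivially predicts itself; leave 0.0
            pos = sum(bits[j])                      # documents containing term j
            neg = m - pos                           # documents lacking term j
            both = sum(a & b for a, b in zip(bits[i], bits[j]))
            only_i = sum(bits[i]) - both            # term i present, term j absent
            tpr = both / pos if pos else 0.0
            fpr = only_i / neg if neg else 0.0
            matrix[i][j] = bns(tpr, fpr)
    return matrix


if __name__ == "__main__":
    # Hypothetical toy collection, used only to exercise the sketch.
    docs = [
        "battery will not charge",
        "battery charge slow",
        "screen dim",
        "screen dim battery",
    ]
    terms = top_terms(docs, 3)
    bits = bit_vector_matrix(docs, terms)
    for term, row in zip(terms, prediction_matrix(bits)):
        print(term, [round(score, 2) for score in row])
```

In this sketch, a strongly positive entry suggests that two terms tend to occur in the same documents, while a strongly negative entry suggests that they tend to exclude each other; such scores could then feed the positive and negative pair lists mentioned above.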
With reference to the example of FIG. 1, it can be seen that the structure built by the existing method is usually not satisfactory because it does not have good readability. In detail, the categories in the generated structure tend to be less meaningful; e.g. it may not be easy for a common user to understand what “not-battery-will-charge”, “screen” and “screen-dim” mean in FIG. 1. In addition, in some situations, the generated category trees are less reasonable. For example, two parallel root nodes, i.e. “main” and “main2”, are generated in the example of FIG. 1, which leads to some difficulty in navigating or browsing the related document set.