The present application describes systems and techniques relating to information retrieval (IR), for example, taxonomy generation for a document structure.
Searching for information in a large collection of documents is often time-consuming. To speed up the search, the document collection may be organized in a structured way, e.g., in clusters where documents on similar topics are stored together. Taxonomy generation deals with categorizing and labeling documents to satisfy a user's need for efficient document searching and retrieval.
A common approach to categorizing documents uses clustering algorithms, which group documents with similar content into clusters. After the clustering operation, each cluster is given a label that describes the type of documents it contains. The ability of a user to navigate the document structure may depend on the descriptiveness of the labels. However, descriptive labels may be hard, if not impossible, to find. Moreover, some of the clusters may be related to one another, and the cluster labels typically do not reflect such relationships.
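The cluster-then-label approach described above can be illustrated with a minimal sketch. It assumes a bag-of-words representation, a greedy single-pass clustering by cosine similarity, and labels built from the most frequent terms in each cluster. These choices, along with the names `tokenize`, `cosine`, `cluster_documents`, `label_cluster`, the stopword list, and the `threshold` value, are hypothetical and chosen only for illustration; they are not drawn from any particular clustering algorithm referred to in the text.

```python
import math
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "for", "is", "on"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_documents(docs, threshold=0.2):
    """Greedy single-pass clustering: each document joins the most similar
    existing cluster if the similarity reaches `threshold`, otherwise it
    starts a new cluster."""
    clusters = []  # each cluster: {"centroid": Counter, "members": [doc index]}
    for i, text in enumerate(docs):
        vec = Counter(tokenize(text))
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "members": [i]})
        else:
            best["centroid"].update(vec)  # accumulate term counts
            best["members"].append(i)
    return clusters


def label_cluster(cluster, n_terms=2):
    """Label a cluster with its most frequent terms (ties broken alphabetically)."""
    ranked = sorted(cluster["centroid"].items(), key=lambda kv: (-kv[1], kv[0]))
    return " ".join(term for term, _ in ranked[:n_terms])


if __name__ == "__main__":
    docs = [
        "database index query optimization",
        "query planner for database index",
        "soccer match final score",
        "soccer league match results",
    ]
    for c in cluster_documents(docs):
        print(label_cluster(c), c["members"])
```

Labels produced this way are simply frequent terms, so they may not be very descriptive, and nothing in them indicates whether two clusters are related. This mirrors the limitations noted above.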