1. Field of the Invention
The present invention relates generally to clustering of documents. More specifically, systems and methods are disclosed for clustering documents, such as scientific documents, that take into account the citation patterns of the documents.
2. Description of Related Art
When a search returns multiple search result documents, it is often useful to provide the search result documents to the user in a number of coherent and logical groups or clusters. However, as with most documents, similarities among the search result documents can be compared across multiple dimensions or directions. As an example, a document set containing documents pertaining to both mathematics and physics may be divided into a mathematics group and a physics group. Alternatively, the same document set may be divided along an orthogonal dimension into theoretical and experimental groups. As is evident, division of the document set along other dimensions may also be possible.
In the absence of any context, dominant clusters are often generated. Dominant clusters generally refer to a relatively small number of relatively large groups in which each member of a given group is coherent with the other members of the same group along some dimension. The generation of dominant clusters is a well-studied concept and can be optimized using a number of measures. However, search engine results are generally presented in an order ranked in accordance with the relevance of the documents to the search query, without regard to the relevance that the search result documents have to each other. It would be desirable to group the search result documents, particularly scientific documents, into logical and coherent clusters such that the search results may be presented to the user in a more meaningful manner.