1. Field of the Invention
The present invention relates to a system and method for clustering documents, which determine a similarity between documents and cluster similar documents on the basis of the determined similarity.
2. Description of the Related Art
Recently, document retrieval systems have come into wide use. Such a system processes a large volume of document information, extracts the information corresponding to user demand, and provides the extracted information to the user.
That is, document retrieval or information retrieval refers to searching for the documents or information desired by a user within a large volume of documents or information. To retrieve documents or information, keyword processing is performed on natural language texts, a weight is assigned to each keyword, and then retrieval and ordering are conducted.
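The keyword processing, weighting, and ordering steps described above can be sketched as follows. This is only an illustrative sketch: the related art does not specify a weighting scheme, so TF-IDF with a smoothed inverse document frequency is assumed here, and the names `tokenize` and `rank_documents`, as well as the sample documents, are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Keyword processing: lowercase the text and extract alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def rank_documents(query, documents):
    """Return (document index, score) pairs ordered by descending score.

    Assumption: keywords are weighted by TF-IDF; the source text only says
    that "a weight is assigned to each keyword".
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each keyword.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    query_terms = set(tokenize(query))
    scores = []
    for index, tokens in enumerate(tokenized):
        term_freq = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term_freq[term] == 0:
                continue  # keyword absent from this document
            # Weight = relative term frequency * smoothed inverse document frequency.
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
            score += (term_freq[term] / len(tokens)) * idf
        scores.append((index, score))

    # Ordering step: highest score first.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample data for illustration only.
documents = [
    "patent search system for patent documents",
    "cooking recipes for pasta",
    "document clustering and patent retrieval",
]
ranking = rank_documents("patent retrieval", documents)
```

In this sketch every user issuing the same query receives the same ordering, which illustrates the limitation of a query-only retrieval system.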
The related art document retrieval system receives a query from a user and outputs to the user the same result that the system would extract for any user. Here, a general retrieval system searches documents only on the basis of the scope of the query received from the user, and thus it is difficult to provide the user with information tailored to the user's tastes and characteristics.
Also, since the related art retrieval system searches for information regarding only the query input by the user, an incorrect retrieval range may be established. For this reason, the retrieval results differ considerably from the information desired by the user, degrading the accuracy and reliability of the retrieval results.
In addition, when receiving a query from a user, the related art document retrieval system performs an operation that depends on the retrieval system used by the sites providing the information. Hence, the accuracy of the retrieved information is lowered, and it becomes difficult to provide information in real time. However, for documents that must be retrieved immediately after they are generated or before much time has elapsed since their generation, such as patent documents, a document access method and a search method tailored to the user are required.