1. Field of the Invention
The present invention relates to a document retrieval system and a document retrieval method, and particularly to a system for retrieving a document including numeric data and a method of retrieving the same.
2. Description of the Related Art
Advances in the digitization of document information have made it critical to retrieve the pieces of information needed by users from a vast amount of information. Information retrieval techniques address this need, and Internet search engines are one application of such techniques. Once a user inputs a search request, pieces of information relevant to the input are retrieved from the vast amount of document information and are displayed in order of relevance.
Japanese Patent Application Laid-open Publication No. 2000-155758, titled “Method of Searching Documents and A Service For Searching Documents,” discloses so-called “associative search,” a method that receives a group of documents inputted by a user as a search request and retrieves, from a document database, a group of documents relevant to the inputted group. In the associative search, each document is first broken into terms (or character strings). Then, for each document, a vector of term frequencies is generated, each term frequency indicating how many times a specific term appears in that document. Finally, documents relevant to the inputted group of documents are retrieved on the basis of similarities between the vectors for the inputted documents and the vectors for the documents in the document database (see Mochihashi, Daichi, et al. “Learning an Optimal Distance Metric in the Linguistic Vector Space,” The Transactions of the Institute of Electronics, Information and Communication Engineers on Information and Systems D-II, Vol. J88-D-II, No. 4, pp. 747-756, April, 2005), and on the basis of similarities between documents calculated by using a probability model (see Japanese Patent Application Laid-open Publication No. Hei. 9-62693, titled “Document Classification Method Using Probability Model”).
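For illustration only, the three steps of the associative search described above (breaking documents into terms, building term-frequency vectors, and ranking by vector similarity) can be sketched as follows. This is a minimal sketch under stated assumptions and not the method of the cited publications: whitespace tokenization, cosine similarity as the similarity measure, the summing of the inputted documents into a single query vector, and the function names `term_frequencies`, `cosine_similarity`, and `associative_search` are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Break a document into terms and count how often each term appears.

    Assumption: terms are lowercase, whitespace-separated tokens.
    """
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors.

    Assumption: cosine similarity stands in for the unspecified
    vector similarity measure.
    """
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


def associative_search(query_docs, database):
    """Return database document ids ordered from most to least relevant.

    Assumption: the inputted group of documents is merged into one
    query vector by summing the per-document term frequencies.
    """
    query_vec = Counter()
    for doc in query_docs:
        query_vec.update(term_frequencies(doc))
    scored = [
        (cosine_similarity(query_vec, term_frequencies(text)), doc_id)
        for doc_id, text in database.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored]
```

As a usage example, calling `associative_search(["stock price"], {"a": "stock price rose", "b": "weather rain today", "c": "stock price fell sharply"})` ranks documents "a" and "c", which share terms with the query, ahead of document "b", which shares none.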