1. Field of the Invention
The present invention relates to a document retrieval system, a search condition input apparatus, a retrieval execution apparatus, a document retrieval method, and a document retrieval program suitable for the retrieval of a document from a set of documents when the search conditions submitted by the user include numeric expressions.
2. Description of the Related Art
Japanese Unexamined Patent Application Publication No. 2000-322416 describes a technique for retrieving a document from a set of documents by using an index that lists terms appearing in the set of documents, the number of documents in which each term appears (its document frequency DF), and the frequency with which the term appears in each document (its term frequency TF). When a user specifies a term as a search condition, its associated DF and TF values are obtainable from the index information, so they do not have to be calculated and the retrieval time is shortened accordingly. The retrieved documents can also be scored by a mathematical formula using the DF and TF values, and when the retrieval result is presented to the user, the document identifiers can be presented in descending score order.
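The cited publication's exact scoring formula is not given here. As a rough illustration of the kind of index it describes, the Python sketch below makes three assumptions: a simple whitespace tokenizer, the common TF-IDF weighting tf × log(N/df), and the hypothetical helper names `build_index` and `search`. DF is stored once per term and TF once per (term, document) pair, so a query only looks values up. The result lists document identifiers in descending score order.

```python
import math
from collections import Counter, defaultdict

def build_index(docs):
    """Return (df, tf): df[term] = number of documents containing term,
    tf[term][doc_id] = number of occurrences of term in that document."""
    df = Counter()
    tf = defaultdict(dict)
    for doc_id, text in docs.items():
        counts = Counter(text.lower().split())  # naive whitespace tokenizer (assumed)
        for term, n in counts.items():
            df[term] += 1
            tf[term][doc_id] = n
    return df, tf

def search(query_terms, df, tf, num_docs):
    """Score documents with an assumed TF-IDF formula and return
    (doc_id, score) pairs sorted by descending score."""
    scores = defaultdict(float)
    for term in query_terms:
        if term not in df:
            continue  # term never appears in the collection
        idf = math.log(num_docs / df[term])  # read from the index, not recomputed per document
        for doc_id, n in tf[term].items():
            scores[doc_id] += n * idf
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

docs = {
    "d1": "patent retrieval index index",
    "d2": "retrieval of dates",
    "d3": "price and length",
}
df, tf = build_index(docs)
print(search(["index", "retrieval"], df, tf, len(docs)))
```

With these three toy documents, "d1" ranks first because it contains the rarer term "index" twice.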
Since the DF value of a term changes over time as documents are added, deleted, or modified, the index information includes, for each term, a plurality of DF values and the dates on which they were calculated. If the user specifies a range of dates as a search condition, to restrict the search to documents added or updated within that range, the documents are scored using the DF values calculated within the specified range.
Since the scoring formula weights the TF values according to the DF values, specifying a date range alters every score calculated for the documents. Users generally do not realize this and may assume that a date range carries no more significance than a keyword search condition. The retrieval result may therefore differ considerably from what the user expects.
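The effect of the date range on scoring can be sketched as follows. This rests on several assumptions that do not come from the cited publication:

- Each term keeps DF snapshots keyed by calculation date, together with the collection size N on that date.
- The snapshot used for a date range is the latest one falling inside the range.
- Scoring is TF-IDF.
- `DF_SNAPSHOTS`, `COLLECTION_SIZE`, `df_in_range`, and `score` are hypothetical names.

The sketch shows only the scoring effect, not the filtering of documents by date. The document's TF values are identical in both calls, yet its score changes because the DF weights change.

```python
import math
from datetime import date

# Hypothetical index layout: DF snapshots per term, keyed by calculation date,
# plus the total number of documents in the collection on each of those dates.
DF_SNAPSHOTS = {
    "index":     {date(2020, 1, 1): 1, date(2021, 1, 1): 5},
    "retrieval": {date(2020, 1, 1): 2, date(2021, 1, 1): 2},
}
COLLECTION_SIZE = {date(2020, 1, 1): 3, date(2021, 1, 1): 10}

def df_in_range(term, start, end):
    """Return (df, N) from the latest snapshot dated within [start, end]
    (an assumed selection policy), or None if no snapshot is in range."""
    dates = [d for d in DF_SNAPSHOTS.get(term, {}) if start <= d <= end]
    if not dates:
        return None
    latest = max(dates)
    return DF_SNAPSHOTS[term][latest], COLLECTION_SIZE[latest]

def score(doc_tf, start, end):
    """Assumed TF-IDF score of one document using range-dependent DF values."""
    total = 0.0
    for term, tf in doc_tf.items():
        stats = df_in_range(term, start, end)
        if stats is None:
            continue  # no DF available for this term in the range
        df, n = stats
        total += tf * math.log(n / df)
    return total

doc_tf = {"index": 2, "retrieval": 1}  # same TF values in both calls
early = score(doc_tf, date(2019, 1, 1), date(2020, 6, 30))
late = score(doc_tf, date(2020, 7, 1), date(2021, 12, 31))
print(round(early, 3), round(late, 3))
```

Here the earlier range gives 2.603 and the later range 2.996. The relative contributions of the two terms are also reversed: "index" dominates in the first range and "retrieval" in the second.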
There are also cases in which the user would like to search for dates that appear as character strings within the documents, rather than specifying a range of dates on which documents were added or modified. The prior art cited above makes no provision for this.
Depending on the rules under which the retrieval engine operates, the user may be able to specify a date as a search condition in the same way as an ordinary word or phrase. Because dates are written in text in many different ways, however, this type of search condition does not always yield the desired result: for example, a search conducted with ‘May 1’ as a search condition may fail to find documents containing such expressions as ‘5/1’ or ‘1st of May’.
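The notation problem can be made concrete with a small Python sketch. It is not the invention's method. It simply contrasts plain substring matching with matching on normalized (month, day) values. Its regular expressions cover only the three notations in the example above. Reading ‘5/1’ as month/day is an assumption, and `extract_dates` is a hypothetical helper.

```python
import re

MONTH_NAMES = ["january", "february", "march", "april", "may", "june",
               "july", "august", "september", "october", "november", "december"]
MONTHS = {name: i for i, name in enumerate(MONTH_NAMES, start=1)}
MONTH_RE = "|".join(MONTH_NAMES)

# (pattern, converter) pairs for the three notations in the example only.
PATTERNS = [
    # "May 1", "May 1st"
    (re.compile(rf"\b({MONTH_RE})\s+(\d{{1,2}})(?:st|nd|rd|th)?\b", re.IGNORECASE),
     lambda m: (MONTHS[m.group(1).lower()], int(m.group(2)))),
    # "1st of May", "1 May"
    (re.compile(rf"\b(\d{{1,2}})(?:st|nd|rd|th)?\s+(?:of\s+)?({MONTH_RE})\b", re.IGNORECASE),
     lambda m: (MONTHS[m.group(2).lower()], int(m.group(1)))),
    # "5/1", read as month/day (an assumption; day/month is also common)
    (re.compile(r"\b(\d{1,2})/(\d{1,2})\b"),
     lambda m: (int(m.group(1)), int(m.group(2)))),
]

def extract_dates(text):
    """Return the set of plausible (month, day) pairs expressed in text."""
    found = set()
    for pattern, convert in PATTERNS:
        for m in pattern.finditer(text):
            month, day = convert(m)
            if 1 <= month <= 12 and 1 <= day <= 31:
                found.add((month, day))
    return found

docs = {
    "d1": "Meeting on 5/1 at noon.",
    "d2": "Due by the 1st of May.",
    "d3": "Shipped May 10.",
}
query = "May 1"

# Plain substring matching: misses d1 and d2, and wrongly hits "May 10".
literal = [d for d, text in docs.items() if query in text]
# Matching on normalized (month, day) values instead.
wanted = extract_dates(query)
normalized = [d for d, text in docs.items() if wanted & extract_dates(text)]
print(literal, normalized)
```

This prints `['d3'] ['d1', 'd2']`. The substring search misses both intended documents and returns a spurious match on ‘May 10’.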
In dealing with dates, accordingly, current document retrieval techniques lack flexibility and convenience, and tend to produce retrieval results of poor quality and low utility. The same is true for other numeric search conditions, such as numeric expressions of length, price, and the like.
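The same kind of normalization applies to lengths. The hypothetical sketch below uses an assumed unit table to convert metric length expressions to millimetres. A range condition then matches ‘1.5 m’, ‘150 cm’, and ‘1500 mm’ alike. Prices would additionally need currency handling, which this sketch omits.

```python
import re

# Assumed conversion factors to a common unit (millimetres).
UNIT_TO_MM = {"mm": 1.0, "cm": 10.0, "m": 1000.0, "km": 1_000_000.0}
# Longer unit names are tried before "m"; the trailing \b rejects e.g. "miles".
LENGTH_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(mm|cm|km|m)\b")

def lengths_mm(text):
    """Return every metric length expression in text, converted to millimetres."""
    return [float(value) * UNIT_TO_MM[unit] for value, unit in LENGTH_RE.findall(text)]

def matches_range(text, low_mm, high_mm):
    """True if any length mentioned in text lies within [low_mm, high_mm]."""
    return any(low_mm <= x <= high_mm for x in lengths_mm(text))

docs = {
    "d1": "Cable length: 1.5 m",
    "d2": "Cut to 150 cm",
    "d3": "Rod of 1500 mm",
    "d4": "Width 20 cm",
}
print([d for d, text in docs.items() if matches_range(text, 1400, 1600)])
```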
There is a need for a retrieval apparatus that can treat dates and other numeric expressions on the same basis as search conditions not including numeric expressions, and can deal with differences in numeric notation.