Techniques for searching electronic data and displaying the search results are becoming increasingly important, because the growing amount of information to be searched yields a growing number of search results. The information sought is buried in a large number of search results and is therefore becoming difficult to find. As one such search technique, it has been proposed, for example, to execute a search based on a search condition that is set according to an analysis of an input search request, and to order the search results by means of a unit that calculates predetermined scores.
In such a search technique, to increase the speed of the search, words and the like are extracted in advance from the documents to be searched, and an index is created and saved (see Patent Document 1, for example). Patent Document 1 proposes a method of obtaining correct search results when, as the number of documents to be searched increases, the documents are divided into multiple sets and an index is created for each of the sets.
The above-described calculation of the predetermined scores uses TF (Term Frequency), which is the number of times a search term or the like included in the specified search condition appears or is used in each document, and DF (Document Frequency), which is the number of documents that include the search term or the like. Because such values can be looked up in an index created in advance as described above, a search can be completed within a short time period.
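The relationship between the index, TF, and DF can be illustrated with a minimal sketch. The index layout, the function names (`build_index`, `term_frequency`, `document_frequency`), and the sample documents are all hypothetical and chosen only for illustration; they are not taken from Patent Document 1 or from any particular search system.

```python
from collections import Counter, defaultdict

def build_index(docs):
    """Build a toy inverted index: term -> {doc_id: TF in that document}.

    Assumption: terms are obtained by lowercasing and splitting on whitespace.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(text.lower().split()).items():
            index[term][doc_id] = count
    return index

def term_frequency(index, term, doc_id):
    """TF: number of times the term appears in the given document."""
    return index.get(term, {}).get(doc_id, 0)

def document_frequency(index, term):
    """DF: number of documents that include the term."""
    return len(index.get(term, {}))

# Hypothetical documents to be searched.
docs = {
    "D1": "engine fuel injection engine",
    "D2": "fuel cell battery",
    "D3": "battery charging circuit",
}
index = build_index(docs)

print(term_frequency(index, "engine", "D1"))
print(document_frequency(index, "fuel"))
```

Once such an index exists, both values are obtained by dictionary lookups rather than by rescanning the document text.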
Moreover, depending on the search condition, the documents to be searched may be limited. For example, in a search of patent documents, classification information such as the IPC (International Patent Classification) or the FI (File Index) may be set in addition to specifying words in the document. When classification information is set in this manner, the search using the above-mentioned terms is carried out within the scope of the simultaneously specified classification information, i.e., within a limited population.
Patent Document 1: JP2007-233752A
Here, for example, a search using terms is carried out using TF and DF as described above. In one such technique, the calculation is carried out such that the smaller the DF of a term, the more important the term is treated as being, and the higher the score. The DF is pre-registered in the above-described index. In the related art, even when the population is limited, scores are calculated using the DF registered in the index.
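As an illustration of a score that rises as the DF falls, the following sketch uses the common TF-IDF weighting, TF multiplied by log(N / DF), where N is the number of documents in the population. This specific formula is an assumption made for the example; the text above does not state which formula is used, and the function name `tf_idf_score` is hypothetical.

```python
import math

def tf_idf_score(tf, df, num_docs):
    """Score one term for one document (assumed formula: TF * log(N / DF)).

    A smaller DF gives a larger log(N / DF), so a rarer term
    contributes a higher score.
    """
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log(num_docs / df)

# Same TF, same population size, different DF.
rare_term_score = tf_idf_score(tf=2, df=1, num_docs=100)
common_term_score = tf_idf_score(tf=2, df=50, num_docs=100)
print(rare_term_score > common_term_score)
```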
However, as described above, if the population is limited and the number of documents to be searched, and of images included in those documents, decreases, the frequency of occurrence of the information to be searched changes. Calculating the scores using the DF that is pre-registered in the index could therefore yield inaccurate scores.
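The effect can be seen in a small worked example. The corpus, the classification codes "A" and "B", the helper names, and the log(N / DF) weight are all hypothetical and assumed only for illustration. The term "engine" is fairly rare across the whole corpus but appears in every document of classification "A", so a weight derived from the index-wide DF treats it as distinctive even though it distinguishes nothing within the limited population.

```python
import math

# Hypothetical corpus: (document ID, classification code, terms in the document).
corpus = [
    ("D1", "A", {"engine", "fuel"}),
    ("D2", "A", {"engine", "valve"}),
    ("D3", "B", {"battery", "fuel"}),
    ("D4", "B", {"battery", "circuit"}),
    ("D5", "B", {"display", "circuit"}),
    ("D6", "B", {"display", "fuel"}),
]

def df_and_size(term, classification=None):
    """Return (DF of the term, population size), optionally limited to one classification."""
    population = [terms for _, cls, terms in corpus
                  if classification is None or cls == classification]
    df = sum(1 for terms in population if term in terms)
    return df, len(population)

def idf(df, num_docs):
    """Assumed importance weight: log(N / DF); 0.0 when the term is absent."""
    return math.log(num_docs / df) if df else 0.0

global_df, global_n = df_and_size("engine")          # whole corpus
limited_df, limited_n = df_and_size("engine", "A")   # limited population

index_wide_weight = idf(global_df, global_n)    # weight based on index-wide DF
limited_weight = idf(limited_df, limited_n)     # weight based on the limited population

print(index_wide_weight, limited_weight)
```

Under these assumptions the index-wide weight is positive while the weight within classification "A" is zero, which is the kind of discrepancy described above.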
Moreover, because image information may be converted to a one-dimensional code sequence so that the scores are calculated using a technique similar to term searching, the above-described problem may arise not only for the DF in term searching but also in image searching.