The present invention relates to a method and a system for searching for information in a large collection of documents, and particularly to a method and a system for obtaining a final search result through arithmetical operations between search results sorted by different ranking metrics.
As with Internet searches, a keyword search over a large collection of documents often returns tens of thousands of results. To find documents of interest to the user among such an enormous number of results, the following approaches are known:
changing the search conditions after viewing several highly ranked items of the search result provided by a search engine; and
sorting the results according to a menu provided by a search engine.
The former approach includes, besides the user changing the search conditions directly, specifying a condition such as "essential" or "must not be included" for each keyword displayed by a search engine, or providing a sample document to a search engine so that documents whose contents are similar to the sample are ranked high. Such methods are known as relevance feedback, but they cannot handle relevance specified from multiple viewpoints at once, such as "as recent as possible" and also "deeply related to the computer field". The latter, menu-based approach includes searching by document type, searching by Web site, and so on. However, although this approach is convenient for classifying results by document type, it is not effective for searching large quantities of Web pages.
In addition, Reference 1 (Japanese Unexamined Patent Publication No. Hei 10-143530) describes a method that combines multiple searching schemes. Its approach is close to combining multiple search engines to obtain a more relevant search result. However, because several searching methods are combined, the search result in a method such as that of Reference 1 is often very large. Even if combining search results yields more correct answers, there is no guarantee that the highly ranked results are arranged in the order most relevant to the user's needs, so the user must still scan the enormous result sequentially to find the truly necessary data. Moreover, combining search results imposes a heavy processing load, since the search results over the entire database must always be obtained and then logically combined.
Moreover, conventional search technologies provide no means of meeting the following quite natural demands:
sorting the first several tens or hundreds of elements of the subject data set in a specific ordering or in various orderings; and
further arranging them in order of decreasing relevance, since even after any possible narrowing the results are too numerous to check every element.
An object of the present invention is to provide a searching method, computer program product, and system for sorting a specific collection of documents in various orderings.
Another object is to provide a searching method, computer program product, and system for defining a new ranking metric by composing multiple ranking metrics to provide a user with highly relevant search results.
A further object is to provide a searching method, computer program product, and system for additionally specifying the most suitable arrangement by interactively combining ranking metrics.
A still further object is to provide a searching method, computer program product, and system for presenting a specific search result sorted in various orderings.
A still further object is to provide a searching method, computer program product, and system wherein the larger the number of collected data items, the greater the likelihood of finding important data by sorting.
A still further object is to provide a highly practical and scalable searching method and system that requires only sorting of the search results obtained by the first search.
To attain the above objects, multiple rankings (weightings) are used when information is searched for in a large amount of data (documents). Thus, when relevant data do not gather in the higher-ranking positions of a single ranking/ordering, relevant data originally ranked low can be discovered more easily by ranking metrics based on different viewpoints. Moreover, the sum, difference, intersection, etc. of the data ranked high in two or more rankings are obtained, which provides a means of collecting important data in higher-ranking positions.
More specifically, when searching a collection of documents for documents related to prescribed information, the collection is sorted by each of multiple ranking metrics; a new collection consisting of the documents in the higher-ranking positions of each sorted collection is determined; an arithmetical operation is performed between these new collections; and the documents in the higher-ranking positions of the result of the arithmetical operation are determined as the search result. Herein, "rank" and "sort" (by ranking metric, relevance, or weight) are used with the same meaning.
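The four steps above can be sketched in Python as follows. This is a minimal illustration, not the claimed implementation. It assumes a hypothetical `Document` record with publication date, size, and link count. It also assumes that "higher-ranking positions" means the top `k` documents of each sorted collection. The input list is assumed to already be in the order of the first search. Finally, the combined result is returned in that first-search order. The names `top_k`, `search`, and `combine` are illustrative only.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List, Sequence, Set

@dataclass(frozen=True)
class Document:
    # Hypothetical record; the fields stand in for the metrics named in the text.
    doc_id: str
    published: date
    size: int
    links: int

# A ranking metric maps a document to a score; a higher score ranks higher.
RankingMetric = Callable[[Document], float]

def top_k(docs: Sequence[Document], metric: RankingMetric, k: int) -> List[Document]:
    """Sort the fixed collection by one metric and keep the k highest-ranked documents."""
    return sorted(docs, key=metric, reverse=True)[:k]

def search(docs: Sequence[Document], metrics: Sequence[RankingMetric], k: int,
           combine: Callable[[List[Set[str]]], Set[str]]) -> List[Document]:
    """Sort by each metric, take each top-k collection, combine them with an
    arithmetical (set) operation, and return the survivors in input order."""
    top_sets = [{d.doc_id for d in top_k(docs, m, k)} for m in metrics]
    result_ids = combine(top_sets)
    return [d for d in docs if d.doc_id in result_ids]

# The list below is assumed to be in the order returned by the first search.
docs = [
    Document("a", date(2001, 5, 1), 1200, 3),
    Document("b", date(1999, 1, 10), 800, 12),
    Document("c", date(2001, 3, 15), 300, 9),
    Document("d", date(1998, 7, 20), 5000, 1),
]
by_recency = lambda d: d.published.toordinal()
by_links = lambda d: d.links

both = search(docs, [by_recency, by_links], k=2, combine=lambda s: set.intersection(*s))
either = search(docs, [by_recency, by_links], k=2, combine=lambda s: set.union(*s))
print([d.doc_id for d in both])    # ['c']
print([d.doc_id for d in either])  # ['a', 'b', 'c']
```

Only the already-fixed result list is re-sorted. No new query is issued against the whole database, which matches the scalability aim stated among the objects.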
By way of example, "multiple ranking metrics" include, but are not limited to: the date and time of document publication; document size; frequency of document update; the number of links included in a document; the extent to which a document includes terminology related to the prescribed information; the number of keywords related to the prescribed information; etc.
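Each example metric listed above can be expressed as a sort key, as in the minimal Python sketch below. Several choices here are assumptions rather than details from the text. Document size is taken as character length. The update and link counts are hypothetical precomputed fields. "Extent of inclusion of terminology" is read as the number of distinct field terms present. "Number of keywords" is read as the total number of keyword occurrences. The word list and helper names are illustrative only.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, Iterable, List

@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    published: date
    update_count: int  # hypothetical field: revisions observed for the document
    link_count: int    # hypothetical field: hyperlinks found in the document

Metric = Callable[[Document], int]

def tokens(text: str) -> List[str]:
    """Lower-case alphanumeric tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())

def keyword_count(keywords: Iterable[str]) -> Metric:
    """Total occurrences of the specified keywords (assumed interpretation)."""
    kw = {k.lower() for k in keywords}
    return lambda doc: sum(1 for t in tokens(doc.text) if t in kw)

def term_coverage(field_terms: Iterable[str]) -> Metric:
    """Number of distinct field-specific terms the document contains (assumed interpretation)."""
    terms = {t.lower() for t in field_terms}
    return lambda doc: len(terms & set(tokens(doc.text)))

METRICS: Dict[str, Metric] = {
    "date": lambda d: d.published.toordinal(),
    "size": lambda d: len(d.text),
    "updates": lambda d: d.update_count,
    "links": lambda d: d.link_count,
    "keywords": keyword_count(["search", "ranking"]),
    "computing": term_coverage(["index", "compiler", "algorithm", "database"]),
}

def rank(docs: List[Document], name: str) -> List[str]:
    """Document ids in decreasing order of the named metric."""
    return [d.doc_id for d in sorted(docs, key=METRICS[name], reverse=True)]

docs = [
    Document("p1", "Search ranking with an inverted index", date(2000, 2, 1), 5, 2),
    Document("p2", "Search, search, search: a ranking guide", date(1999, 6, 1), 1, 8),
    Document("p3", "A compiler algorithm for database queries", date(2001, 9, 1), 3, 0),
]
print(rank(docs, "date"))      # ['p3', 'p1', 'p2']
print(rank(docs, "keywords"))  # ['p2', 'p1', 'p3']
```

The same three documents come out in different orders under different metrics. This is the effect the text relies on to surface documents that a single ranking would bury.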
By way of example, "arithmetical operations" include, but are not limited to: the sum (union) of the collections of documents; the intersection of the collections of documents; or the difference between the higher-ranking portions of the sorted collections of documents.
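The three example operations can be sketched on ranked lists of document ids, as shown below. The sketch assumes one convention for order, which the text does not fix: the result keeps the order of the first operand, and for the sum, new ids from the second operand are appended after it. The function names are hypothetical.

```python
from typing import List, Sequence

def ranked_sum(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Sum (union): ids in either list, first-operand order, duplicates dropped."""
    return list(dict.fromkeys([*a, *b]))  # dicts preserve insertion order

def ranked_intersection(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Intersection: ids of `a` that also appear in `b`, in the order of `a`."""
    in_b = set(b)
    return [x for x in a if x in in_b]

def ranked_difference(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Difference: ids of `a` that do not appear in `b`, in the order of `a`."""
    in_b = set(b)
    return [x for x in a if x not in in_b]

# Top-4 ids under two hypothetical rankings of the same fixed result set.
by_date = ["d3", "d1", "d7", "d2"]
by_links = ["d7", "d5", "d3", "d9"]
print(ranked_sum(by_date, by_links))           # ['d3', 'd1', 'd7', 'd2', 'd5', 'd9']
print(ranked_intersection(by_date, by_links))  # ['d3', 'd7']
print(ranked_difference(by_date, by_links))    # ['d1', 'd2']
```

In this sketch, the intersection keeps documents that are both recent and well linked. The difference keeps recent documents that the link ranking did not place high.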
Namely, the final search result is obtained by performing arithmetical operations among specific collections of documents (whose search results are fixed) sorted in various orderings, rather than by narrowing the search to gradually reduce the candidates. Interactively composing such ranking metrics also makes it possible to additionally specify the most suitable arrangement of the search results.
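The text does not specify how ranking metrics are composed. One plausible sketch uses a weighted sum of each item's rank positions, a Borda-style aggregation. This is a swapped-in technique, not one stated in the source. Under this assumption, a user could adjust the hypothetical weights interactively and re-sort only the fixed result set. Ties keep the input order because Python's `sorted` is stable.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Metric = Callable[[Hashable], float]

def rank_positions(items: Sequence[Hashable], metric: Metric) -> Dict[Hashable, int]:
    """0-based position of each item when sorted by `metric` in decreasing order."""
    ordered = sorted(items, key=metric, reverse=True)
    return {item: pos for pos, item in enumerate(ordered)}

def compose(items: Sequence[Hashable],
            weighted_metrics: Sequence[Tuple[Metric, float]]) -> List[Hashable]:
    """Order items by the weighted sum of their rank positions (lower is better)."""
    tables = [(w, rank_positions(items, m)) for m, w in weighted_metrics]
    return sorted(items, key=lambda item: sum(w * pos[item] for w, pos in tables))

# Hypothetical per-document scores under two metrics.
recency = {"a": 4, "b": 1, "c": 3, "d": 2}
links = {"a": 1, "b": 4, "c": 3, "d": 2}
items = ["a", "b", "c", "d"]

print(compose(items, [(recency.get, 1.0), (links.get, 1.0)]))  # ['c', 'a', 'b', 'd']
# The user then decides recency matters twice as much and re-sorts:
print(compose(items, [(recency.get, 2.0), (links.get, 1.0)]))  # ['a', 'c', 'b', 'd']
```

Rank positions are used instead of raw scores so that metrics with different units, such as dates and link counts, can be mixed without normalization. This choice is also an assumption of the sketch.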
Thus, by utilizing multiple ranking metrics, a search result over a large amount of data can be ranked so that data highly relevant from a viewpoint prescribed by the user gather in higher-ranking positions. In the present invention, besides the standard ranking metric (order of relevance to a query), various orderings such as those described above serve as ranking metrics for arranging data. Examples include order of the date of document publication; order of the size of each data item; decreasing frequency of document update; decreasing number of links included in a document; decreasing frequency or number of inclusions of terminology from a specific field; and decreasing number of specified keywords.
The present invention is applicable not only to all Internet search engines but also to displaying database records in a flexibly ordered manner, so it provides a very effective search technique as a front end for information searching in general. Ranking metrics based merely on a keyword search and the degree of keyword matching can hardly eliminate unnecessary documents that are stuffed with the keywords (SPAM). The method of the present invention, by contrast, makes it easier to find truly desirable documents. Moreover, the present invention not only provides a powerful means of enhancing and differentiating a search engine but also promises to be a tool for dramatically improving the information searching capability of a search engine that accumulates large quantities of data.