The present invention relates to searching a document corpus for documents, and more particularly relates to methods and apparatus for customized filtering of a document corpus search and of a set of search results of the document corpus search.
In a typical search system, a user of a client system issues a search query to search a document corpus and receives a set of search results via the client system. The search query may be issued from the client system to a search engine that is configured to search the document corpus, or an index thereof, for content that is relevant to the search query. The search engine may send a summary of the identified content in the form of a set of search results to the client system. The search results might include titles, abstracts, and/or links for the identified pieces of content. The search query and search results may be routed between the client system and the search engine over one or more networks, and by one or more servers coupled to the networks.
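By way of illustration only, the query-to-summary flow described above can be sketched as a minimal in-memory search engine. The class and function names, the sample corpus, the term-overlap scoring, and the fixed-length abstract are all hypothetical simplifications chosen for the sketch; they are not the search engine of the present invention or of any particular product.

```python
from dataclasses import dataclass


@dataclass
class Document:
    """A document in the (hypothetical) corpus."""
    title: str
    body: str
    link: str


@dataclass
class SearchResult:
    """Summary returned to the client: title, abstract, and link."""
    title: str
    abstract: str
    link: str


def search(corpus, query, abstract_len=60):
    """Return summaries of documents sharing at least one term with the query.

    Scoring is a simple count of overlapping terms (an assumption for this
    sketch); higher-scoring documents are listed first.
    """
    terms = {t.lower() for t in query.split()}
    scored = []
    for doc in corpus:
        words = {w.lower().strip(".,") for w in (doc.title + " " + doc.body).split()}
        score = len(terms & words)
        if score > 0:
            scored.append((score, doc))
    # sort() is stable, so ties keep their original corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [SearchResult(d.title, d.body[:abstract_len], d.link) for _, d in scored]


# Usage with a hypothetical three-document corpus.
corpus = [
    Document("Jaguar cars", "The Jaguar is a British car brand.", "http://example.com/cars"),
    Document("Jaguar habitat", "The jaguar is a large cat native to the Americas.", "http://example.com/cats"),
    Document("Cooking pasta", "Boil water and add salt.", "http://example.com/pasta"),
]
results = search(corpus, "jaguar cat")
for r in results:
    print(r.title, "|", r.abstract, "|", r.link)
```

Both "jaguar" documents are returned because each matches a query term, even though a user asking about the animal may not want the result about cars.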
The network might be a local network, a global internetwork of networks, or a combination of networks. Common networks in use today include local area networks (LANs), wide area networks (WANs), virtual LANs (VLANs) and the like. One common global internetwork of networks in use today is referred to as the Internet, wherein nodes of the network send the search query to other nodes that might respond with search results relevant to the search query. One protocol usable for networks that include search systems is the Hypertext Transfer Protocol (HTTP), wherein an HTTP client, such as a browser program operating on the client system, issues a query for search results referenced by a Uniform Resource Locator (URL), and an HTTP server responds to the query by sending the search results specified by the URL. Of course, while this is a very common example, the issuance of a query and the sending of a set of search results relevant to the query are not so limited.
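As a sketch of how a query might be referenced by a URL under HTTP, the following encodes a query string into a URL on the client side and decodes it on the server side, using only Python's standard `urllib.parse` module. The endpoint `http://search.example.com/search` and the parameter names `q` and `page` are hypothetical; no network request is made.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical search endpoint, used only for illustration.
BASE_URL = "http://search.example.com/search"


def build_query_url(query, page=1):
    """Client side: encode the query (and a page number) into a URL."""
    return BASE_URL + "?" + urlencode({"q": query, "page": page})


def parse_query_url(url):
    """Server side: recover the query and page number from the URL."""
    params = parse_qs(urlparse(url).query)
    return params["q"][0], int(params.get("page", ["1"])[0])


url = build_query_url("jaguar habitat")
print(url)                   # http://search.example.com/search?q=jaguar+habitat&page=1
print(parse_query_url(url))  # ('jaguar habitat', 1)
```

`urlencode` escapes spaces as `+` by default, and `parse_qs` reverses that escaping, so the server recovers the query string exactly as the client entered it.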
For example, networks other than the Internet might be used, such as token ring networks, WAP (Wireless Application Protocol) networks, overlay networks, point-to-point networks, proprietary networks, etc. Moreover, protocols other than HTTP might be used to request and transport search results, such as SMTP (Simple Mail Transfer Protocol), FTP (File Transfer Protocol), HTTPS (Hypertext Transfer Protocol Secure), etc. Further, content might be specified by identifiers other than URLs. Portions of the present invention are described with reference to the Internet, but it should be understood that references to the Internet can be substituted with references to variations of the basic concept of the Internet (e.g., intranets, virtual private networks, enclosed TCP/IP networks, etc.), as well as other forms of networks. It should also be understood that the present invention might operate entirely within one computer or one collection of computers, thus obviating the need for a network.
Requested search results that are relevant to a query could take many forms. For example, some search results might include text, images, video, audio, animation, program code, data structures, etc. The search results may be formatted according to the Hypertext Markup Language (HTML), the Extensible Markup Language (XML), the Standard Generalized Markup Language (SGML), or another language in use at the time.
HTML is a common format used for pages and other content that are supplied from an HTTP server. HTML-formatted content might include links to other HTML content, and a collection of content that references other content might be thought of as a document web, hence the name "World Wide Web" or "WWW" given to one example of a collection of HTML-formatted content. As that is a well-known construct, it is used in many examples herein, but it should be understood that unless otherwise specified, the concepts described by these examples are not limited to the WWW, HTML, HTTP, the Internet, etc.
As described briefly above, a set of search results may include abstracts that identify documents that are relevant to a search query. The search results, however, may include a number of results that are not what the user had in mind when formulating the query (e.g., when formulating a query string). To locate the results the user had in mind, the user may review a number of the results, for example, by scrolling through the search results, which may be displayed as a Web page on the client system. If the search results are relatively lengthy, as is common, the user may become frustrated in attempting to locate the desired results and might abandon the review of the search results. Alternatively, the user might issue another search query via the client system in an attempt to locate the search results the user had in mind.
The above-described process of issuing a search query and scrolling through search results may be repeated a number of times before the user is presented with a desired search result. Repeatedly formulating and reformulating a query, and scrolling through numerous sets of search results, is essentially a manual filtering process performed by the user, and this process, repeated a number of times, can be both frustrating and time consuming for the user.
What is needed is an improved search apparatus and an improved search method for generating search results, wherein the search results are automatically filtered to provide the user with search results that are relevant not only to the query but also to the user.