1. Field of the Invention
The present invention generally relates to data processing and utilization of data processing systems to locate desired data. More particularly, the present invention relates to methods and systems for locating electronic documents.
2. Description of the Related Art
The World Wide Web (i.e., the Web) denotes a vast set of interlinked documents (i.e., Web pages) residing on various data processing systems around the globe. In recent years, the Web has experienced rapid growth, to the point that the Web now contains millions of documents. The data processing systems that serve up these documents on request are called servers, and when a data processing system is utilized to retrieve a document from a server, the retrieving data processing system is considered a client.
In general, the interlinked documents are publicly accessible and are retrieved using the communications protocols known as “HTTP” (which stands for Hypertext Transfer Protocol) and “TCP/IP” (which stands for Transmission Control Protocol/Internet Protocol). The servers, communications networks and related facilities that provide access to the documents of the Web are known collectively as the Internet.
In addition to Web documents, a number of services are also available via the Internet, including search engines, which help users to identify which of the millions of Web documents relate to particular subjects of interest. Typically, a search engine includes a Web page that serves as a user interface through which a user enters a search expression, a database that associates Web page addresses with Web page content, and a comparator that determines which of the Web pages in the database include content corresponding to the entered search expression. The addresses of the corresponding Web pages are returned in what is called a “hit list.” For example, if a user were to enter a search expression consisting of a particular word, the resulting hit list would provide the addresses of Web pages containing that word.
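By way of illustration only, the database and comparator described above may be sketched as follows. The sketch assumes a small in-memory table in place of a real search engine database; the addresses, page contents, and the names PAGES, words_of, and hit_list are hypothetical and are not drawn from any particular search engine.

```python
# Hypothetical "database" associating Web page addresses with page content.
PAGES = {
    "http://example.com/history": "World War II began in 1939",
    "http://example.com/books": "Buy books about World War II today",
    "http://example.com/cooking": "Recipes for a quick dinner",
}


def words_of(content):
    """Split page content into a set of lowercase words."""
    return set(content.lower().split())


def hit_list(search_word, pages):
    """Comparator: return the addresses of pages containing search_word."""
    target = search_word.lower()
    return sorted(
        address
        for address, content in pages.items()
        if target in words_of(content)
    )


if __name__ == "__main__":
    # A one-word search expression yields the addresses of matching pages.
    print(hit_list("war", PAGES))
```

In this sketch, a search for a single word returns both the substantive page and the book advertisement, since each contains the word.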
However, search expressions utilizing a list of words relating to a subject often cause search engines to produce inefficient hit lists (i.e., hit lists that include unhelpful sites and/or that fail to include a reasonably large number of helpful sites). For instance, a user wanting to identify Web pages with substantive content concerning World War II might enter the search expression “World War II.” The search engine would then return a hit list of Web pages containing the entered words. In addition to the hits with the desired substantive content, however, the hit list will likely also contain hits with no substantive content relating to the subject in question, such as hits identifying Web pages with mere advertisements for books on the subject. Unless one is looking for a book, the hits relating to mere book advertisements get in the way because they show up in the hit list but generally do not answer any substantive questions or provide any significant amount of substantive information regarding the subject of interest. In addition, due to the large number of Web pages now in existence, overbroad hit lists often identify substantially more Web pages than a user can conveniently explore.
Obtaining efficient hit lists is one of the biggest challenges associated with utilizing the Web. To address this challenge, many search engines allow users to enter searches, known as “Boolean searches,” that are more complex than a simple list of words. In a Boolean search, the user enters Boolean operators along with the words of the search expression. Among the most common Boolean operators are AND, OR, and NOT. Furthermore, according to the syntax utilized by some search engines, AND, OR, and NOT may be abbreviated as &, |, and !, respectively. Also, OR is generally the default operator (which means that a search expression containing words but no explicit Boolean operators is interpreted as if those words were joined with the OR operator). Quotation marks also act as Boolean operators, allowing the user to group words into a phrase. Such a phrase produces a match only when that same phrase (i.e., all of the words in the same arrangement) is found in a Web page.
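The operators described above may be illustrated with the following minimal sketch, which assumes that a Web page is reduced to a sequence of lowercase words. The function names are hypothetical, and the sketch does not reproduce the query syntax of any actual search engine.

```python
def tokens(text):
    """Reduce text to a list of lowercase words, preserving order."""
    return text.lower().split()


def matches_and(content, words):
    """AND: every listed word must appear in the page."""
    page = set(tokens(content))
    return all(word.lower() in page for word in words)


def matches_or(content, words):
    """OR (the usual default): at least one listed word must appear."""
    page = set(tokens(content))
    return any(word.lower() in page for word in words)


def matches_not(content, word):
    """NOT: the word must be absent from the page."""
    return word.lower() not in set(tokens(content))


def matches_phrase(content, phrase):
    """Quoted phrase: all of the words must appear in the same arrangement."""
    page, wanted = tokens(content), tokens(phrase)
    return any(
        page[i:i + len(wanted)] == wanted
        for i in range(len(page) - len(wanted) + 1)
    )


if __name__ == "__main__":
    page = "World War II began in 1939"
    print(matches_or(page, ["war", "peace"]))     # one word suffices
    print(matches_and(page, ["war", "peace"]))    # both words are required
    print(matches_phrase(page, "World War II"))   # same words, same order
```

The phrase check compares consecutive words, so a page containing the same words in a different arrangement does not produce a match.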
Some search engines also support “include” and “exclude” Boolean operators, which may be entered as + and −, respectively. If a word is qualified with the include operator, a document is a match only if the document includes that word. If a word is qualified with the exclude operator, a document is a match only if the document does not include that word. In addition, parentheses may be utilized to group pieces of a search expression together, for instance, to associate an include operator with one group of words but not another.
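The include and exclude operators may likewise be sketched as follows. This is an illustration under stated assumptions rather than the behavior of any specific search engine: unqualified words are treated as joined by OR, such words are treated as optional once at least one word carries the include operator, and grouping with parentheses is omitted. The names parse_expression and is_match are hypothetical.

```python
def parse_expression(expression):
    """Split a search expression into included, excluded, and plain words."""
    included, excluded, plain = set(), set(), set()
    for term in expression.lower().split():
        if len(term) > 1 and term[0] == "+":
            included.add(term[1:])
        elif len(term) > 1 and term[0] in "-\u2212":  # hyphen or minus sign
            excluded.add(term[1:])
        else:
            plain.add(term)
    return included, excluded, plain


def is_match(content, expression):
    """Return True if the page content satisfies the search expression."""
    page = set(content.lower().split())
    included, excluded, plain = parse_expression(expression)
    if not included <= page:   # every included word must be present
        return False
    if excluded & page:        # no excluded word may be present
        return False
    # Assumption: with no included words, plain words act as a default OR.
    if not included and plain:
        return bool(plain & page)
    return True


if __name__ == "__main__":
    # The exclude operator filters out the hypothetical advertisement page.
    print(is_match("World War II began in 1939", "+war -buy"))
    print(is_match("Buy books about World War II today", "+war -buy"))
```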
By utilizing Boolean expressions, skilled database searchers are able to obtain more efficient hit lists. However, substantial effort may be required to formulate and enter a search expression that is sufficiently complete to obtain a reasonably efficient hit list. Furthermore, the user is unable to specify, prioritize and control the order of the resulting hit list at the front end of the search (e.g., as part of the search expression). Therefore, there exists a need for a more effective way to generate efficient hit lists. In particular, there exists a need for methods and systems for locating electronic documents which allow a user to specify, prioritize and control the order of the resulting hit list at the front end of the search.