Not applicable
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
1. Field of the Invention
The present invention relates generally to the field of Internet search engines, Web browsers, and resource gathering, and has special application in situations where these functions must be implemented in extremely large networks.
2. Description of the Related Art
The World-Wide-Web ("Web") has become immensely popular largely because of the ease of finding information and the user-friendliness of today's browsers. A feature known as hypertext allows a user to access information from one Web page to another by simply pointing (using a pointing device such as a mouse) at the hypertext and clicking. Another feature that makes the Web attractive is the ability to process the information (or content) in remote Web pages without requiring a specialized application program for each kind of content accessed. Thus, the same content is viewed across different platforms. Browser technology has evolved to enable the running of applications that manipulate this content across platforms.
The Web relies on a markup language called HTML (Hyper-Text Mark Up Language), which a Web-compliant browser interprets in order to render text, graphics, images, audio, real-time video, and other types of content. HTML is independent of client operating systems. Therefore, HTML renders the same content across a wide variety of software and hardware operating platforms. The software platforms include without limitation Windows 3.1, Windows NT, Apple's Copeland and Macintosh, IBM's AIX and OS/2, and HP Unix. Popular compliant Web browsers include without limitation Microsoft's Internet Explorer, Netscape Navigator, Lynx, and Mosaic. The browser interprets links to files, images, sound clips, and other types of content through the use of hypertext links.
A Web site is a related collection of Web files that includes a beginning file called a home page. A Web site is located at a specific URL (Uniform Resource Locator). Web sites usually start with a home page from which a user can link to other pages. The online URL http://www.ibm.com is one example of a home page.
Users of the Web use tools to help find, locate, or navigate through information on the Web. These tools are known as Internet search engines or simply search engines. Almost all search engines provide graphical user interfaces (GUIs) for boolean and other advanced search techniques over their private catalog or database of Web sites. The technology used to build the catalog changes from site to site. The use of search engines for keyword searches over an indexed list of documents is a popular solution to the problem of finding a small set of relevant documents in a large, diverse corpus. On the Internet, for example, most search engines provide a keyword search interface to enable their users to quickly scan the vast array of known documents on the Web for the handful of documents which are most relevant to the user's interest.
There are several examples of Internet search engines, including Yahoo (http://www.yahoo.com), AltaVista (http://www.altavista.com), HotBot (www.hotbot.com), Infoseek (http://www.infoseek.com), Lycos (http://www.lycos.com), WebCrawler (www.webcrawler.com), and others. The results of a search are displayed to a user in a hierarchically-structured subject directory. Some search engines give special weighting to words or keywords: (i) in the title; (ii) in subject descriptions; (iii) listed in HTML META tags; (iv) in the first position on a page; and (v) by counting the number of occurrences or recurrences (up to a limit) of a word on a page. Each of the search engines uses a somewhat different indexing and retrieval scheme, which is likely to be treated as proprietary information. Refer to online URL http://www.whatis.com for more information on search engines.
In its simplest form, the input to keyword searches in a search engine is a string of text that represents all the keywords separated by spaces. When the "search" button is selected by the user, the search engine finds all the documents which match all the keywords and returns the total number that match, along with brief summaries of a few such documents. There are variations on this theme that allow for more complex boolean search expressions.
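As a rough illustration of the simplest form described above, the following Python sketch splits the query on spaces and returns the total number of documents containing every keyword, plus short summaries of the first few. The in-memory corpus, the function name `keyword_search`, and the summary length are assumptions made for illustration; they are not details of any particular search engine.

```python
def keyword_search(query, documents, max_summaries=3, summary_len=40):
    """Return (total_matches, summaries) for documents containing every keyword.

    `documents` maps a hypothetical document id to its text.
    """
    keywords = query.lower().split()
    if not keywords:
        # An empty query matches nothing in this sketch.
        return 0, []

    matches = []
    for doc_id, text in documents.items():
        words = set(text.lower().split())
        # A document matches only if ALL keywords appear in it.
        if all(keyword in words for keyword in keywords):
            matches.append(doc_id)

    # Brief summaries (here: a text prefix) of only the first few matches.
    summaries = [
        (doc_id, documents[doc_id][:summary_len])
        for doc_id in matches[:max_summaries]
    ]
    return len(matches), summaries


docs = {
    "a": "Quicksort is a fast sorting algorithm",
    "b": "Merge sort is a stable sorting algorithm",
    "c": "Cooking pasta quickly",
}
total, summaries = keyword_search("sorting algorithm", docs)
```

A real engine would consult a prebuilt index rather than scan every document; the linear scan here only keeps the example short.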
The problem present with the prior art is the inherent difficulty for web crawlers to adequately search, process, rank, and sort the vast amounts of information available on the Internet. This information content is increasing at an exponential rate, making traditional search engines inadequate when performing many types of searches.
At least one metadata search system ("Direct Hit", www.directhit.com) determines the most popular and relevant sites for a given Internet search request based on the number of direct hits that the site receives. However, such systems simply sort the results of the search based on the hits to those results; their hit count is simply a raw hit count that is not associated with the original search query. Accordingly, a need exists to provide a system and a method to associate search results with a specific search query string.
As stated previously, with the volume of data available on the Internet increasing at an exponential rate, the search effort required to obtain meaningful results on the Internet is also increasing exponentially, thus triggering a need for more efficient search methodologies. Accordingly, a need exists to provide a system and method to permit improvement in the search ranking efficiency of current web search engines.
General Advantages
The present invention typically provides the following benefits:
Time Savings. Reading through the abstracts of a result page is a time-consuming task. The sorting mechanism of the present invention brings the most popular resources for a particular query to the top of the list of the result page. Because users usually start from the beginning of a list, they save time reading abstracts. The popular resources might already be the best fit for their query, in which case users can stop evaluating and reading further abstracts on the result page.
Leveraging Human Interaction. The resources are usually sorted by relevance (matching the original query string). Indexing is done mostly automatically. The present invention uses the human's ability to evaluate resources and stores this information for further reuse. Users usually choose to access a result item (by clicking on a hyperlink) after they have evaluated its abstract and concluded that it could be a good match for the query they issued. This human knowledge is automatically collected and can then be reused by other users. Therefore, resources that are more often reviewed and visited will have a higher ranking. Thus, the search quality is improved by integrating human evaluation capabilities.
One skilled in the art will realize that these advantages may be present in some embodiments and not in others, as well as noting that other advantages may exist in the present invention that are not specifically listed above.
Briefly, in accordance with the present invention, a method is disclosed for presenting to an end-user the intermediate matching search results of a keyword search in an indexed list of information. The method comprises the steps of: coupling to a search engine a graphical user interface for accepting keyword search terms for searching an indexed list of information with the search engine; receiving one or more keyword search terms with one or more separation characters therebetween; performing a keyword search with the one or more keyword search terms received when a separation character is received; presenting the number of documents matching the keyword search terms to the end-user; and presenting a graphical menu item on a display.
In accordance with another embodiment of the present invention, an information processing system and a computer readable storage medium are disclosed for carrying out the above method.
The present invention incorporates a document relevance measure that accounts for the change in Web content and therefore improves the quality of results returned to users. Three measures are combined when calculating the overall document relevance: (a) content relevance (e.g. matching of query search terms to words in document), (b) version-adjusted popularity (e.g. number of accesses to each version of the document), and (c) recency (e.g. age and update frequency of a document). With this information the present invention provides a ranking system that performs a ranking based on a combination of relevancy and popularity.
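The text above names the three measures but does not say how they are combined. The following Python sketch shows one possible combination, a weighted sum, purely as an illustration. The weights, the geometric discount applied to accesses of older versions, the exponential half-life used for recency, and all function names are assumptions, not the formula of the invention.

```python
def version_adjusted_popularity(accesses_per_version, decay=0.5):
    """Sum access counts, discounting accesses to older versions.

    `accesses_per_version` is ordered oldest to newest. The geometric
    discount (`decay`) is an assumption made for this sketch.
    """
    newest = len(accesses_per_version) - 1
    return sum(
        count * decay ** (newest - index)
        for index, count in enumerate(accesses_per_version)
    )


def recency(age_days, half_life_days=30.0):
    """Map document age to (0, 1]; 1.0 means just updated (assumed decay)."""
    return 0.5 ** (age_days / half_life_days)


def overall_relevance(content, popularity, recency_score,
                      weights=(0.5, 0.3, 0.2)):
    """Combine the three measures as a weighted sum (hypothetical weights).

    All three inputs are expected to be scaled to comparable ranges
    by the caller, for example 0.0 to 1.0.
    """
    w_content, w_popularity, w_recency = weights
    return (w_content * content
            + w_popularity * popularity
            + w_recency * recency_score)


# Accesses to three versions of a document, oldest first.
popularity_raw = version_adjusted_popularity([8, 4, 2])
score = overall_relevance(content=0.8, popularity=0.6,
                          recency_score=recency(30))
```

With these assumed parameters, accesses to the current version count in full, while accesses to each earlier version count half as much as those of its successor.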
An overall example of the present invention is now described. User Z is looking for a particular and efficient Quicksort algorithm. He/she uses search engines with enhanced features to construct a complex query. The result page contains 100 external resources (URLs), which contain hyperlinks to various implementations of the search features of the present invention. User Z now begins to read through the abstracts provided and eventually chooses one result item for closer examination. Thus, User Z selects a hyperlink pointing to the external resource. Typically the document is downloaded into a viewing device (e.g. a web browser) and then User Z is able to further examine the whole document. When User Z is done reviewing the document, he/she might also select other promising links to resources on the result pages for further review. The present invention examines the user's behavior by monitoring all the hyperlinks the user clicks on. Every time the user clicks on a hyperlink on a result page, the present invention associates this particular resource with User Z's original search query and stores this information (a <user query, URL> pair) in a database system.
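The click-monitoring step in this example can be sketched as follows. This is a minimal illustration that uses Python's built-in `sqlite3` module as a stand-in for the database system; the table layout, the function names, and the whitespace and case normalization of queries are assumptions, since the text specifies only that a <user query, URL> pair is stored.

```python
import sqlite3


def open_click_store():
    """Create an in-memory database holding one row per click (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE clicks (query TEXT NOT NULL, url TEXT NOT NULL)")
    return conn


def normalize_query(query):
    """Lower-case and collapse whitespace so equal queries compare equal."""
    return " ".join(query.lower().split())


def record_click(conn, query, url):
    """Store a <user query, URL> pair when a result hyperlink is clicked."""
    conn.execute(
        "INSERT INTO clicks (query, url) VALUES (?, ?)",
        (normalize_query(query), url),
    )
    conn.commit()


def click_counts(conn, query):
    """Return {url: number of clicks} recorded for the given query."""
    cursor = conn.execute(
        "SELECT url, COUNT(*) FROM clicks WHERE query = ? GROUP BY url",
        (normalize_query(query),),
    )
    return dict(cursor.fetchall())


conn = open_click_store()
record_click(conn, "Quicksort  Algorithm", "http://example.com/a")
record_click(conn, "quicksort algorithm", "http://example.com/a")
record_click(conn, "quicksort algorithm", "http://example.com/b")
counts = click_counts(conn, "quicksort algorithm")
```

Keeping one row per click, rather than a running counter, leaves room to attach a timestamp or document version later without changing how clicks are recorded.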
User Y later uses the system independently, using the search features of the present invention, and enters the same query using the same search engine features as User Z. The present invention forwards the request to the search engine, which retrieves the matching resources. However, before returning the matching resources to User Y, the present invention checks whether these resources were chosen by User Z (who issued a similar query). If a resource was chosen by another user (e.g., User Z) who issued a similar query, then a popularity vector is calculated. All resources are then sorted by popularity first, then by relevance, and then returned to User Y. Note that User Y's result page now lists first the result items that were chosen by User Z (who performed a similar query). In summary, the present invention stores the original query of the user and associates the user's subsequent resource selections with this query.
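The re-ranking step for User Y can be sketched as a two-key sort: popularity first, then relevance. In the Python sketch below, the shape of the search engine results (a list of URL and relevance pairs), the per-query click counts, and the function name `rerank` are assumptions for illustration. The sketch also treats "similar" queries as identical ones, which is a simplification.

```python
def rerank(results, popularity):
    """Sort results by popularity (descending), then relevance (descending).

    `results` is a list of (url, relevance) pairs from the search engine.
    `popularity` maps a URL to the number of times earlier users chose it
    for this query; URLs never chosen count as zero.
    """
    return sorted(
        results,
        key=lambda item: (-popularity.get(item[0], 0), -item[1]),
    )


# Relevance-ordered results as returned by the search engine for User Y.
engine_results = [
    ("http://example.com/u1", 0.9),
    ("http://example.com/u3", 0.8),
    ("http://example.com/u2", 0.7),
]
# User Z chose u2 three times after issuing the same query.
chosen_before = {"http://example.com/u2": 3}

ranked = rerank(engine_results, chosen_before)
ranked_urls = [url for url, _ in ranked]
```

Resources that no earlier user selected keep their relevance order among themselves, so the result page degrades to the plain relevance ranking when no click history exists for a query.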