1. Field of the Invention
The present invention relates to a method, system, and program for searching for file addresses, e.g., URLs, and ordering the search results using a popularity weighting indicating the frequency of selection of the URL from returned search results.
2. Description of the Related Art
To locate documents on the Internet, users typically use an Internet search engine. The user enters one or more keywords, perhaps indicates boolean operators for the search, and transmits the search request to a server including a search engine. Search engines include a spider program, or crawler, that periodically searches the Internet to locate new web pages and revisits previously located sites to look for changes. The spider then places information from the pages it locates into a database index which relates URLs to search terms.
Search engines can index various information from the located pages to associate with the located URL. Many search engines index the full body of visible text, but may exclude commonly used words, e.g., "the", "and", etc. Search engines may also index keywords included in a special keyword meta tag in the document that holds key words the page designer designates to use for searching purposes. Search engines may include alternative text associated with images and perform word stemming to include variations of a word, e.g., politics, politician, political, etc. as keywords to include in the index. The keywords indexed for a particular URL are then searched when a user enters the keywords for a search. The results of a search include all URLs having indexed words that match the search term and any specified boolean search operators.
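The indexing described above can be illustrated with a minimal sketch. It is not taken from any particular search engine: the stop-word list, the function names `build_index` and `search`, and the example URLs are assumptions made for illustration, and word stemming and meta tag handling are omitted.

```python
from collections import defaultdict

# Illustrative stop-word list (an assumption; real engines use longer lists).
STOP_WORDS = {"the", "and", "a", "of"}


def build_index(pages):
    """Map each indexed word to the set of URLs whose visible text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            word = word.strip(".,;:!?\"'()")
            if word and word not in STOP_WORDS:
                index[word].add(url)
    return index


def search(index, keywords):
    """Return the URLs indexed under every keyword (boolean AND)."""
    url_sets = [index.get(k.lower(), set()) for k in keywords]
    return set.intersection(*url_sets) if url_sets else set()


pages = {
    "http://a.example/": "The politics of water",
    "http://b.example/": "Water and energy",
}
index = build_index(pages)
print(search(index, ["water", "politics"]))
```

Here the commonly used words are dropped at indexing time, so a query can only match the remaining words associated with each URL.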
A search engine may locate numerous search results in response to a user search query, many of which may not be relevant. One problem search engine developers must address is the order in which to present the search results. Most search engines use the location and frequency of keywords on a web page as the basis of ranking search results. Other search engines may boost a page's display order if search keywords are included in the meta description and keywords tags of the page. A search engine can also provide a relevancy boost based on the number of pages and/or number of important web pages that include hypertext links to the search result page.
The response to a query is often determined by how keywords are indexed for a URL. In author-controlled search engines, the search engine providers allow the document author to designate the indexed keywords by specifying such words in the document. Other search engine providers use an editor-controlled approach, where the search engine provider employs editors to manually catalog web sites and the indexed keywords that will be used in searching. Author-controlled search engines give document authors the opportunity to include misleading words in the meta tags to cause the search engine to return the document in response to searches unrelated to the document content. Editor-controlled search engines can produce irrelevant search results if the editors associate the wrong keywords with the URL or exclude highly relevant keywords from the URL/keyword association.
To improve how results are presented, one search engine company, Direct Hit, has developed algorithms to rank search results according to the popularity of the site. The Direct Hit search engine anonymously monitors which web sites Internet searchers select from the search results list, how much time the searchers spend at these sites, and other metrics. The sites that are selected by searchers are boosted in their ranking, while the sites that are consistently ignored by searchers are penalized in their rankings.
There is a need in the art for an improved method, system, program, and data structures for incorporating popularity of URL selection into the order in which search results are returned and displayed to the searcher.
To overcome the limitations in the prior art described above, preferred embodiments disclose a method, system, program, and data structures for ordering electronic files subject to searching. At least one keyword is associated with each file. A physical location of each file is identified by a file address. A popularity weight is associated with at least one file address and keyword pair such that a file address is capable of having multiple associated keywords and one associated popularity weight for each file address and keyword pair. In response to executing a search query including search keywords, file address search results are received that have at least one associated keyword that matches at least one search keyword. The search results are ordered according to the popularity weight associated with each file address search result and keyword pair whose keyword matches the search keyword. A document is then coded to include the file address search results such that the document will display the file address search results according to the ordering.
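As a minimal sketch of this ordering, assume the popularity weights are held in a dictionary keyed by (file address, keyword) pairs. The function name `order_results`, the dictionary layout, and the choice to sum the weights when a query has several search keywords are assumptions for illustration; the text above does not specify how weights for multiple matching keywords are combined.

```python
def order_results(result_urls, search_keywords, weights):
    """Order search results by keyword-specific popularity, highest first.

    weights maps (url, keyword) pairs to a popularity weight. Pairs with no
    recorded weight count as 0. Summing over keywords is an assumption.
    """
    def score(url):
        return sum(weights.get((url, kw), 0) for kw in search_keywords)

    # sorted() is stable, so results with equal scores keep their input order.
    return sorted(result_urls, key=score, reverse=True)


# Hypothetical weights: the same URL ranks differently for different keywords.
weights = {
    ("http://u1.example/", "java"): 5,
    ("http://u2.example/", "java"): 9,
    ("http://u1.example/", "coffee"): 20,
}
results = ["http://u1.example/", "http://u2.example/"]
by_java = order_results(results, ["java"], weights)
by_coffee = order_results(results, ["coffee"], weights)
```

In this example the second URL is listed first for "java" while the first URL is listed first for "coffee", because each weight belongs to a file address and keyword pair rather than to the file address alone.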
In further embodiments, a request is received to access at least one of the file address search results displayed in the document. The popularity weights are adjusted upward for the at least one requested file address search result and keyword pair matching the search keywords. Still further, the popularity weights may be adjusted downward for those file address and keyword pairs matching the search keywords that were not requested.
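The upward and downward adjustments might be sketched as follows. The step sizes, the floor at zero, and the name `record_selection` are assumptions introduced for illustration; the text above states only that weights are adjusted upward for requested results and may be adjusted downward for results that were not requested.

```python
def record_selection(weights, result_urls, search_keywords, selected_url,
                     up=1.0, down=0.5):
    """Adjust (url, keyword) popularity weights after a searcher's selection.

    The selected result is boosted for each search keyword; the displayed
    results that were not requested are penalized. Step sizes and the zero
    floor are illustrative assumptions.
    """
    for url in result_urls:
        for kw in search_keywords:
            key = (url, kw)
            current = weights.get(key, 0.0)
            if url == selected_url:
                weights[key] = current + up
            else:
                weights[key] = max(0.0, current - down)
    return weights


weights = {
    ("http://u1.example/", "java"): 2.0,
    ("http://u2.example/", "java"): 2.0,
}
record_selection(
    weights,
    ["http://u1.example/", "http://u2.example/"],
    ["java"],
    selected_url="http://u2.example/",
)
```

Only the pairs formed with the search keywords are touched, so a selection made during a search on one keyword leaves the weights for other keywords unchanged.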
In Internet embodiments, a server having a server URL executes the search and generates the document. In such case, the file address search results comprise search result URLs denoting the location of files distributed at servers over the Internet. The document is generated by combining the server URL with each search result URL into a combined URL for each search result. The document is then coded to display each search result with the combined URL such that issuing a request to the combined URL is directed to the server. A request to the combined URL is received and the popularity weight associated with each search result URL in the combined URL and keyword pair matching the search keywords is adjusted upward. The request is directed to the search result URL.
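One way to picture the combined URL is as the server URL carrying the search result URL and the search keywords as query parameters. The sketch below uses Python's standard `urllib.parse` module; the server address, the parameter names `target` and `q`, and the function names are hypothetical, and the text above does not prescribe this particular encoding.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical server URL that receives clicks on search results.
SERVER_URL = "http://search.example/redirect"


def combine(result_url, search_keywords):
    """Build a combined URL that routes a click through the search server."""
    query = urlencode({"target": result_url, "q": " ".join(search_keywords)})
    return SERVER_URL + "?" + query


def handle_request(combined_url, weights, step=1):
    """Adjust weights upward for the requested result and return its URL.

    A real server would then redirect the requester to the returned URL.
    """
    params = parse_qs(urlparse(combined_url).query)
    target = params["target"][0]
    keywords = params.get("q", [""])[0].split()
    for kw in keywords:
        weights[(target, kw)] = weights.get((target, kw), 0) + step
    return target


weights = {}
link = combine("http://a.example/page?x=1", ["java", "coffee"])
destination = handle_request(link, weights)
```

Because every displayed link points at the server first, the server observes each selection and can update the matching file address and keyword pairs before passing the request on to the search result URL.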
Preferred embodiments provide a method, system, program, and data structures for associating popularity weights with URLs, where a popularity weight indicates the frequency of selection, i.e., popularity, of a particular URL from returned search results. In preferred embodiments, a popularity weight may be associated with each URL and keyword pair to provide keyword-specific popularity weightings. The popularity weights are used to determine the order in which search results are presented and displayed to the searcher. One advantage of keyword-specific popularity weighting is that the ordering of the presentation of a particular URL in a returned search is based on the popularity associated with the search keyword, not the URL in general. In fact, a URL may have widely different popularity weightings for different keywords, indicating the relevance of the URL to the different search keywords. In this way, preferred embodiments provide a fine-grained ordering based on a measured popularity that is specific to the search keywords and incorporates a user ranking of the importance of the URL to specific keywords.