Within an enterprise, search engines often have difficulty finding the most relevant pages for a given query. External search engines make use of algorithms such as the much-publicised Google PageRank algorithm, which is described in U.S. Pat. No. 6,285,999 and in subsequent patents (see also, on the World Wide Web, en.wikipedia.org/wiki/Page Rank). Unfortunately, this algorithm works poorly when used to search a company's Intranet because of the low number of incoming links. On the Internet, if a page is seen as useful, then typically many people will link to that page. On a company's Intranet, however, relevant and less-relevant pages alike are likely to have the same low number of inbound links. This means that algorithms tuned for the Internet will make decisions based on information that has little significance in an Intranet environment. Further, keyword frequency alone may be an insufficient basis for ranking. Use of keyword frequency is described in US 2005/0114322 A1.
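The effect of a low, uniform number of inbound links can be illustrated with a minimal sketch of a link-based ranking computed by power iteration. This is a simplified illustration only, not the method claimed in U.S. Pat. No. 6,285,999; the function name, the damping value, and the two small link graphs are assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified link-based rank by power iteration.

    `links` maps each page to the list of pages it links to; every
    target must also appear as a key. Illustrative sketch only.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if not outgoing:
                # A page with no outgoing links spreads its rank evenly.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
            else:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical Intranet: every page has exactly one inbound link.
intranet = {"uk": ["us"], "us": ["brazil"], "brazil": ["uk"]}

# Hypothetical Internet fragment: one page attracts many inbound links.
internet = {"popular": [], "a": ["popular"], "b": ["popular"], "c": ["popular"]}

intranet_rank = pagerank(intranet)
internet_rank = pagerank(internet)
```

In the hypothetical Intranet graph every page ends up with approximately the same score, so the link structure gives no basis for preferring one page over another, whereas in the Internet fragment the heavily linked page clearly stands out.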
A large source of dissatisfaction with search results is their irrelevance to the user performing the query. U.S. Pat. No. 7,599,917 discloses a solution which determines a document relevance score for documents on a network. The document relevance score is calculated using a ranking function that contains one or more query-dependent components as well as one or more query-independent components. The query-independent components allow an administrator or the system to identify authoritative (important) documents in the network. The query-dependent, or content-related, portion of the ranking function depends on the actual search terms and the content of the given document.
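A ranking function of this general shape can be sketched as a weighted sum of a query-dependent term and a query-independent term. The sketch below is an assumption made for illustration and is not the function disclosed in U.S. Pat. No. 7,599,917: the term-frequency measure, the `authority` value in the range 0 to 1, the `weight` parameter, and the sample documents are all hypothetical.

```python
def query_dependent_score(query, text):
    """Fraction of the words in `text` that match a query term."""
    terms = query.lower().split()
    words = text.lower().split()
    if not words:
        return 0.0
    hits = sum(words.count(term) for term in terms)
    return hits / len(words)


def relevance_score(query, text, authority, weight=0.7):
    """Blend content match with a query-independent authority value.

    `authority` stands in for an importance value assigned by an
    administrator or by the system; `weight` is a hypothetical knob.
    """
    content = query_dependent_score(query, text)
    return weight * content + (1.0 - weight) * authority


# Hypothetical documents: (text, authority).
documents = {
    "handbook": ("university relations programme overview", 0.9),
    "memo": ("university relations programme overview", 0.2),
    "canteen": ("weekly canteen menu", 0.9),
}

ranked = sorted(
    documents,
    key=lambda name: relevance_score("university relations", *documents[name]),
    reverse=True,
)
```

With these assumed values, two documents with identical text are separated only by their authority values, and a document with high authority but no matching content still ranks last. Neither component takes any account of who is performing the search.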
Hyperlink-Induced Topic Search (HITS) is another known algorithm for ranking web pages (en.wikipedia.org/wiki/HITS algorithm). HITS is used to identify hubs (pages that have many outgoing links to other pages) and authorities (pages to which many other pages refer).
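The mutual reinforcement between hubs and authorities can be sketched as follows. This is a minimal illustration of the commonly described iterative update with normalisation; the function name, the iteration count, and the example link graph are assumptions made for the example.

```python
import math


def hits(links, iterations=20):
    """Return (hub, authority) scores for a small link graph.

    `links` maps each page to the list of pages it links to.
    Illustrative sketch only.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    hub = {page: 1.0 for page in pages}
    authority = {page: 1.0 for page in pages}
    for _ in range(iterations):
        # Authority: sum of the hub scores of the pages linking in.
        authority = {page: 0.0 for page in pages}
        for source, targets in links.items():
            for target in targets:
                authority[target] += hub[source]
        norm = math.sqrt(sum(v * v for v in authority.values())) or 1.0
        authority = {page: v / norm for page, v in authority.items()}
        # Hub: sum of the authority scores of the pages linked to.
        hub = {
            page: sum(authority[target] for target in links.get(page, []))
            for page in pages
        }
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {page: v / norm for page, v in hub.items()}
    return hub, authority


# Hypothetical link graph.
example_links = {
    "index": ["policy", "benefits"],
    "news": ["policy"],
    "policy": [],
    "benefits": [],
}

hub_scores, authority_scores = hits(example_links)
best_hub = max(hub_scores, key=hub_scores.get)
best_authority = max(authority_scores, key=authority_scores.get)
```

In this assumed graph, the page that links out to the most referenced pages receives the highest hub score, and the page referenced by the most hubs receives the highest authority score. As with other link-based measures, the scores depend entirely on link structure.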
The prior art solutions described above do not address the problem illustrated by the following example:
A user searches for “University Relations” from the search bar on the Homepage of UK-based company X. Based on keyword counts, the US and Brazil University Relations pages currently rank more highly than the UK pages. In the solution of U.S. Pat. No. 7,599,917 described above, an administrator or the system itself may have identified important pages within the network, and this may also affect the ranking of pages. What is needed, however, is for the UK University Relations Program to be ranked more highly than the US and Brazil programs, because the search was carried out by a user based in London.
Current attempts at providing a solution rely on geographic knowledge derived from users' IP addresses, and on language information in the pages and page metadata. However, in this example, the UK page resides on a server in Germany, and so may even be marked down despite being the best page for the user.