The present invention generally relates to ranking web pages, and more particularly, to ranking web pages on a very large scale network according to their content.
Web pages or websites are usually ranked on the basis of the number of incoming URL links, the number of outgoing URL links, and the authority of these incoming and outgoing URL links. Authority is usually measured by the amount of website traffic or page views for a particular URL. One common page rank scheme ranks web pages on a scale of 1 to 10, with 10 being the highest ranking and 1 being the lowest ranking. Websites such as YAHOO®, GOOGLE® and FACEBOOK® have a ranking of 10 because of the large number of URL links into and out of these websites, as well as a high authority measure due to the high volume of unique website visitors.
The structure of multiple websites can be depicted graphically as a series of interconnected nodes, with incoming and outgoing links connecting the nodes. Current page ranking techniques do not take into account the “content” or the subject matter of different website pages. Further, current page ranking techniques only take into account the number of links between pairs of websites, and not the number of links among several websites. This can lead to irrelevant information being returned to a user in response to a search query. For example, a web page devoted to technology may be returned as a search engine result for a query related to another topic, such as politics. The return of irrelevant search items could be minimized if the search engine were sensitive to the content of the returned web pages. However, computing a content-sensitive page rank across a large scale network with billions of web pages, e.g., the Internet, is computationally intensive in terms of time and processor resources.
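By way of illustration only, the limitation described above can be sketched in a few lines of Python. The page names, the link graph, and the topic labels below are hypothetical, and ordering pages by incoming link count is a deliberate simplification of link-based ranking, not any particular search engine's scheme. The sketch shows that a ranking computed from links alone never consults the subject matter of a page.

```python
from collections import Counter

# Hypothetical toy graph: each key is a page (node), each value lists
# the pages it links to (outgoing links).
LINKS = {
    "tech-news": ["gadget-blog", "politics-daily"],
    "gadget-blog": ["tech-news"],
    "politics-daily": ["tech-news"],
    "sports-hub": ["tech-news", "politics-daily"],
}

# Hypothetical topic labels; the link-only ranking below never reads them.
TOPICS = {
    "tech-news": "technology",
    "gadget-blog": "technology",
    "politics-daily": "politics",
    "sports-hub": "sports",
}


def incoming_link_counts(links):
    """Count how many links point at each page."""
    counts = Counter({page: 0 for page in links})
    for targets in links.values():
        for target in targets:
            counts[target] += 1
    return counts


def rank_by_links(links):
    """Order pages by incoming link count, highest first (ties by name)."""
    counts = incoming_link_counts(links)
    return sorted(counts, key=lambda page: (-counts[page], page))


ranking = rank_by_links(LINKS)
print(ranking)
print("top page topic:", TOPICS[ranking[0]])
```

Because the ordering depends only on the link structure, the same technology page is ranked first regardless of whether the query concerns technology or politics.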
Thus, there is a need in the art for a method and system that ranks web pages in a content-sensitive manner across large scale networks such as the Internet.