The Internet is a worldwide, publicly accessible network of interconnected computer networks that transmit data by packet switching using the standard Internet Protocol (IP). The “network of networks” consists of millions of smaller domestic, academic, business, and government networks, which together enable various services, such as electronic mail, online chat, file transfer, and the interlinked web pages and other documents of the World Wide Web.
The World Wide Web (commonly shortened to the “Web”) is a system of interlinked nodes accessed via the Internet. With a Web browser, a user may view nodes (e.g. Web pages) that may contain text, images, videos, and other multimedia. More pertinently, a node's content may also contain hyperlinks (or simply “links”) that allow users to navigate from one node to another. A link is a reference or navigation element in the content of a source node that refers to, indicates, or “points” to a target node. In some cases, the target node may be the same as the source node (e.g., a link may target another location within the same node). In other cases, the target node may be a different node or a location within a different node.
Within an HTML document, links are typically specified using <a> (anchor) elements, although there are many other types of links, including media (e.g., audio or video) embeds, JavaScript includes, Ajax redirects, 301/302 HTTP redirects, meta refreshes, and the like. HTML code may specify some or all of five main characteristics of a link:
        target node (e.g., HTML “href” attribute, pointing to a target node);
        link label or “anchor text”;
        link title (e.g., HTML “title” attribute);
        link browser window target (e.g., HTML “target” attribute, indicating which browser window to open the target node in);
        link class or link id (e.g., HTML “class” or “id” attributes).
For example, an HTML link specification may use the HTML element “a” with the attribute “href” and optionally also the attributes “title”, “target”, and “class” or “id”:
<a href="URL" title="link title" target="link target" class="link class">anchor text</a>
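As an illustration of how these link characteristics might be read out of a node's HTML, the following is a minimal sketch using the HTMLParser class from Python's standard library. The names LinkExtractor and extract_links, and the dictionary keys, are hypothetical and chosen only for this example. The sketch handles only <a> elements, not the other link types mentioned above.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects link characteristics from <a> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []       # one dict per completed <a> element
        self._current = None  # link being read, or None when outside <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            a = dict(attrs)
            self._current = {
                "href": a.get("href"),      # target node
                "title": a.get("title"),    # link title
                "target": a.get("target"),  # browser window target
                "class": a.get("class"),    # link class
                "id": a.get("id"),          # link id
                "anchor_text": "",          # link label, filled in below
            }

    def handle_data(self, data):
        # Text seen while inside an <a> element is part of the anchor text.
        if self._current is not None:
            self._current["anchor_text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["anchor_text"] = self._current["anchor_text"].strip()
            self.links.append(self._current)
            self._current = None


def extract_links(html):
    """Return a list of dicts, one per <a> element found in the HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links
```

Attributes that are absent from a given link are reported as None, which reflects the point that a link may specify only some of the five characteristics.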
A vast amount of information is available via the Internet, taking the form of Web pages, images, and other types of files. At the present time, “search engines” offer one of the primary resources used by Internet users to locate information of interest. Due at least to the staggering number of nodes on the Web, search engines tend to operate largely algorithmically. Oversimplifying, search engines operate generally as follows. A first program, commonly called a crawler or spider, “crawls” the Web, following links and storing data (on the search engine's own server or servers) from the nodes it encounters. Using the stored data, a second program, known as an indexer, extracts various information about each node and its content, such as the words it contains and where these words are located. The indexer may also extract information about links contained in the node's content.
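The indexing step described above, recording which words a node contains and where they are located, can be sketched as a simple positional inverted index. This is only an illustrative sketch and not the method of any particular search engine. The function name build_index, the node identifiers, and the tokenization rule (lowercase runs of letters and digits) are assumptions made for the example, and the input is taken to be text that a crawler has already stored.

```python
import re
from collections import defaultdict


def build_index(nodes):
    """Build a positional inverted index.

    `nodes` maps a node identifier to that node's stored text.
    The result maps each word to {node_id: [word positions]}.
    """
    index = defaultdict(dict)
    for node_id, text in nodes.items():
        # Assumed tokenization: lowercase alphanumeric runs.
        words = re.findall(r"[a-z0-9]+", text.lower())
        for position, word in enumerate(words):
            index[word].setdefault(node_id, []).append(position)
    return index
```

Given two stored nodes such as {"a": "Web pages link to Web pages", "b": "search the web"}, looking up "web" in the resulting index returns the positions of that word in each node, which is the kind of information a ranking stage could later draw on.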
In recent years, the Web has grown so large that, as a general rule, only very large, very well-funded corporations have the resources to crawl and index it. Indeed, assuming that the average size of the HTML code of nodes on the Web is roughly 25 kilobytes per node, then 3.9 billion nodes (which is a small portion of what would be required to keep an up-to-date index of the Web) represent approximately 90 terabytes of data. Merely storing 90 terabytes of data, let alone processing it, would cost many thousands of dollars at current bulk-storage rates. Accordingly, indexing the Web has largely been the exclusive province of large, well-funded organizations.
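The storage estimate above can be checked with a short calculation. The sketch below uses only the two figures given in the text (25 kilobytes per node and 3.9 billion nodes). The function name corpus_size and the choice to show both binary and decimal units are assumptions made for illustration, since the text does not say which convention it uses.

```python
def corpus_size(avg_kilobytes, node_count, base):
    """Return total size in terabytes, with `base` = 1024 (binary) or 1000 (decimal)."""
    total_bytes = avg_kilobytes * base * node_count
    return total_bytes / base**4


AVG_NODE_KILOBYTES = 25        # figure from the text
NODE_COUNT = 3_900_000_000     # figure from the text

binary_tb = corpus_size(AVG_NODE_KILOBYTES, NODE_COUNT, 1024)
decimal_tb = corpus_size(AVG_NODE_KILOBYTES, NODE_COUNT, 1000)
print(f"binary units:  {binary_tb:.1f} TB")
print(f"decimal units: {decimal_tb:.1f} TB")
```

With binary units the total comes to a little under 91 terabytes, which agrees with the figure of approximately 90 terabytes. With decimal units it comes to 97.5 terabytes. Either way, the order of magnitude supports the point being made.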
Early versions of search algorithms relied on webmaster-provided information such as the keyword “meta” tag, which was intended to provide a guide to each node's content. But using such webmaster-provided metadata to index nodes could be unreliable because the webmaster's account of keywords in the meta tag was often not truly representative of the node's actual content. Inaccurate, incomplete, and inconsistent data in meta tags often caused nodes to rank highly for irrelevant searches. Moreover, Web content providers often manipulated a number of attributes within the HTML source of a node in an attempt to rank well in search results.
By relying so much on factors within a webmaster's control, early search engines were subject to abuse and ranking manipulation. Search engines responded by developing more complex ranking algorithms, taking into account additional factors that were more difficult for webmasters to manipulate. Larry Page and Sergey Brin, who went on to found Google Inc. of Mountain View, Calif., developed one of the more successful of these complex ranking algorithms. Their algorithm, PageRank, estimates the likelihood that a given node will be reached by a web user who randomly surfs the web and follows links from one node to another. In effect, PageRank treats some links as being stronger than others, as nodes with a higher PageRank are more likely to be reached by the random surfer.
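The random-surfer idea can be illustrated with a small power-iteration sketch. This is a simplified textbook-style formulation, not the production algorithm of any search engine. The function name pagerank, the damping value of 0.85, the fixed iteration count, and the handling of nodes without outgoing links are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate the probability that a random surfer is at each node.

    `links` maps a source node to a list of the target nodes it links to.
    `damping` (assumed 0.85) is the chance the surfer follows a link
    rather than jumping to a node chosen uniformly at random.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every node receives the random-jump share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this node's rank evenly across its outgoing links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumed handling of a node with no outgoing links:
                # the surfer jumps to any node with equal probability.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank
```

For a three-node graph in which "a" and "b" both link to "c" and "c" links back to "a", the sketch ranks "c" highest and "b" lowest. The single link from "c" therefore passes far more rank to "a" than a link from "b" would, which is the sense in which some links are stronger than others.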
The original PageRank algorithm's rankings were more robust than earlier rankings based on webmaster-provided information. However, webmasters found methods to game the PageRank system, such as by creating “link farms”—or groups of web sites, each of which hyperlinks to every other site in the group—for the purpose of influencing search engine rankings. By 2007, search engines had incorporated a wide range of undisclosed factors in their ranking algorithms to reduce the impact of link manipulation. For example, Google may rank sites using more than 200 different signals. At the current time, the three leading search engines are Google; Yahoo!, operated by Yahoo! Inc. of Sunnyvale, Calif.; and Microsoft's Bing (formerly Live Search), operated by Microsoft Corporation of Redmond, Wash. None of these search engines currently disclose the algorithms they use to rank nodes. Indeed, for most, if not all, major search engines, their index and rankings within that index remain a closely held trade secret.
While such secrecy may have made it harder for unscrupulous Web site operators to manipulate search results, it has also made it difficult for legitimate Web page providers to optimize their pages and sites so that they would organically rank highly in search results. The major search providers offer a few tools of use to Web operators desiring to optimize their rankings, including, for example, Google Webmaster Tools, Bing Webmaster Tools, and Yahoo! Site Explorer. However, such tools leave much to be desired. As a general rule, current offerings may measure and publish certain metrics about pages, but not metrics about the individual links pointing to or from a page.