In general, only a small portion of the text on a web page may be relevant to indicating the overall content of that web page. This small portion of meaningful text may be surrounded by headers, footers, navigation elements, advertisements, and other irrelevant text. Though this surrounding text may be useful to a human reader, it may be problematic for search engines attempting to sift through large numbers of web pages to find the most relevant ones. More specifically, the presence of irrelevant text in a web page may increase the likelihood that a search engine will return that page for queries unrelated to its content. For example, a query for the term “business” may match an irrelevant web page from www.nytimes.com merely because the page contains the term “business” in a navigation element.
Such false positives may decrease the search quality of a search engine and, in turn, may lead to an inferior user experience.
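The false-positive scenario described above can be illustrated with a minimal sketch. The page markup, the `BOILERPLATE_TAGS` set, and the `TextSplitter` and `matches` names are all hypothetical and chosen only for illustration. The sketch also assumes that irrelevant text is conveniently wrapped in semantic tags such as `<nav>` and `<footer>`, which real web pages may not do. It compares a naive match over all text on a page with a match restricted to the remaining text.

```python
from html.parser import HTMLParser

# Tags treated as irrelevant text in this sketch (an assumption for illustration).
BOILERPLATE_TAGS = {"nav", "header", "footer", "aside"}


class TextSplitter(HTMLParser):
    """Collects page words into two buckets: boilerplate and main content."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # number of boilerplate elements currently open
        self.main = []
        self.boilerplate = []

    def handle_starttag(self, tag, attrs):
        if tag in BOILERPLATE_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in BOILERPLATE_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        bucket = self.boilerplate if self.depth else self.main
        bucket.extend(data.lower().split())


def matches(page_html, term, main_only=False):
    """Return True if the term appears as a word in the chosen text."""
    splitter = TextSplitter()
    splitter.feed(page_html)
    splitter.close()
    words = splitter.main if main_only else splitter.main + splitter.boilerplate
    return term.lower() in words


# Hypothetical news page: "Business" appears only in the navigation element.
PAGE = """
<html><body>
<nav><a href="/world">World</a> <a href="/business">Business</a></nav>
<p>The city council voted on Tuesday to expand the public library.</p>
<footer>Contact us</footer>
</body></html>
"""

print(matches(PAGE, "business"))                  # naive match over all text
print(matches(PAGE, "business", main_only=True))  # match over main text only
```

In this sketch the naive match succeeds even though the article concerns a library vote, while the restricted match does not. Identifying which text is irrelevant on arbitrary pages is the harder problem and is not addressed here.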