1. Field
This application relates to computer search engines, and more particularly to avoiding masked web page content indexing errors.
2. Description of Related Art
Obtaining useful data parameters for generating search indexes has become increasingly important for designers of search engines. Search engines are used by computer users of all ages and abilities, and endeavor to provide information that correctly matches the users' search requests.
Generally, search engines use corresponding search indexes to obtain search results for these computer users. In turn, search engines use a variety of techniques to obtain data for these search indexes. For example, some search engines automatically generate their listings using software known as “crawlers,” “bots,” or “spiders.” Generally speaking, a crawler finds a web page, requests the page from its host, reads the page, and follows links on the page to other pages within the web site. The information read may consist of words, terms, network addresses, or other parameters useful for obtaining search results desired by computer users. After obtaining these parameters, crawlers provide their results for indexing in a search index available to the search engine. The search index may include the web pages themselves or summaries of the web pages' content. Finally, search engine software may process the web pages or the summarized content in the search index to retrieve search results and rank the pages according to a specific algorithm.
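By way of illustration only, the crawl-and-index sequence described above may be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any particular search engine: the in-memory `SITE` table stands in for a host, and the names `fetch`, `crawl`, `LinkAndTextParser`, and the `example.test` addresses are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
import re

# Hypothetical in-memory "host": maps a URL to the HTML it would serve.
SITE = {
    "http://example.test/": '<p><a href="/a">Pies</a> welcome home</p>',
    "http://example.test/a": '<p><a href="/">Home</a> apple pie recipes</p>',
}


class LinkAndTextParser(HTMLParser):
    """Collects link targets and lowercase words from one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.words.extend(re.findall(r"[a-z0-9]+", data.lower()))


def fetch(url):
    """Stand-in for requesting a page from its host (assumption: no network)."""
    return SITE.get(url)


def crawl(start_url):
    """Breadth-first crawl that builds a word -> set-of-URLs index."""
    index = {}
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # page not available from the host
        parser = LinkAndTextParser()
        parser.feed(html)
        parser.close()
        # Record each word read from the page in the search index.
        for word in parser.words:
            index.setdefault(word, set()).add(url)
        # Follow links on the page to other pages within the site.
        for href in parser.links:
            target = urljoin(url, href)
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return index


index = crawl("http://example.test/")
```

In this sketch the index holds only words and the addresses of pages containing them; as noted above, an index may instead hold the pages themselves or summaries of their content, and ranking is omitted entirely.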
Other search engines rely upon hosts' descriptions of web pages or web sites to generate listings in the search index. The search engine software searches only for matches in the descriptions submitted by the hosts; these descriptions may be prepared by a human operator. In addition, some search engines combine crawler-based search indexes with human-based search indexes to generate hybrid search indexes.
All of these methods generate search indexes by reading web pages on the hosts' servers or databases, or by relying upon the hosts' descriptions of the content of their web pages. In either situation, these search engines cannot avoid content errors caused by the hosts themselves. Hosts often seek to generate higher ranking scores on popular search engines by responding to a crawler's request with false copies of web pages, or by submitting false descriptions of a web page's content to a human-based search engine. The hosts' actual content is therefore said to be “masked” by misleading information provided in response to a crawler request. Inaccurate indexing caused by hosts providing deliberately inaccurate data about hosted content may be referred to as a masked web page content indexing error.
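By way of illustration only, the following minimal sketch shows how such an error can arise. It assumes a hypothetical host function, `masked_host`, that recognizes a crawler by a user-agent string (here the invented name "ExampleBot") and returns different text to the crawler than to an ordinary user. The page text and the address are likewise invented for the example.

```python
import re

# Hypothetical user-agent fragments the host treats as crawlers.
CRAWLER_AGENTS = ("examplebot",)


def masked_host(url, user_agent):
    """Hypothetical host that serves a false copy of the page to crawlers."""
    if any(name in user_agent.lower() for name in CRAWLER_AGENTS):
        return "free recipes healthy cooking family meals"
    return "buy discount pills now"


def tokenize(text):
    """Split text into a set of lowercase words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def index_page(index, url, text):
    """Add each word of the served text to a word -> set-of-URLs index."""
    for word in tokenize(text):
        index.setdefault(word, set()).add(url)


url = "http://example.test/page"
index = {}

# The crawler indexes whatever the host chooses to serve it.
index_page(index, url, masked_host(url, "ExampleBot/1.0"))

# A search for "recipes" now returns the page...
hits = index.get("recipes", set())

# ...but the content served to an ordinary user does not contain that term.
actual_words = tokenize(masked_host(url, "Mozilla/5.0"))
indexed_term_missing = "recipes" not in actual_words
```

Here the index lists the page under "recipes" even though a user who follows the search result receives unrelated content, which is the mismatch referred to above.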
Accordingly, it is desirable to provide methods and systems to avoid these masked web page content indexing errors, thereby generating more useful results for search requests by computer users.