Search engines provide a convenient way to access information. However, when a user submits the same search request to different search engines, the user may get different search results, because different search engines collect and index different sets of information and/or are configured to identify search results in different ways.
Search engines may use software tools that automatically visit various web servers to identify information existing on the web and index that information for subsequent searches. For example, a search engine may crawl the web to retrieve web page content and then index that content to power user searches. A search engine may also maintain cached copies of web pages to serve when the original site is not available.
Search engines may also obtain content or data from other sources, such as data submitted directly to the search engines, including business listings, advertisements, and airline directories of flight schedules and fares. Such search engines are typically specialized, for example for local search, for a merchant site, or for travel.
Web scraping generally includes activities that extract data or content from a website through manual or automated processes. The extracted data may be used in various ways, including indexing the website to facilitate searching, running a separate website, or powering a separate application. For example, travel fares available on the websites of individual airlines may also be available on other websites that aggregate content from travel websites.
Generally, a website expects and welcomes visits by automated tools, including web crawlers, as well as by individual non-automated users. The web depends on this activity to turn a collection of websites into a network of discoverable sites. Further, as part of ordinary competitive information gathering, both businesses and individuals visit a number of other websites to better understand differences in customer experience.
Not welcome, however, are excessive numbers of robot-driven requests, or systematic activities that extract all or almost all of the content of a website, which may be the business or personal property of the website owner, especially when that activity is for direct financial gain. For example, a scraper may use the extracted data to set up a scraper site, which serves its users with the data extracted through web scraping without referring them to the original website. Web scraping may also overload a website, degrading response performance for the website's regular users.