A web crawler is an executable computer program that typically browses the World Wide Web (i.e., the web) in a methodical, automated, and orderly manner. Web crawlers are also sometimes referred to as ants, automatic indexers, crawlers, web robots or bots, web spiders or spiders, or web scutters.
A web crawler generally performs a process known as web crawling or spidering. A web crawler is a type of bot, or software agent. In general, a web crawler starts with a list of Uniform Resource Locator addresses (“URLs”) to visit, called the seeds. As the web crawler visits these URLs, the crawler attempts to identify all the hyperlinks in each visited web page and adds the identified hyperlinks to the list of URLs to visit, called the crawl frontier. URLs from the crawl frontier can be recursively visited according to a set of policies.
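The seed, frontier, and recursive-visit loop described above can be sketched as a breadth-first traversal. This is a minimal illustration, not a production crawler: it assumes a caller-supplied fetch function (here a lookup in a small in-memory dictionary of hypothetical example.com pages, so no network access is needed), and it stands in for the "set of policies" with a single hypothetical max_pages limit.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags in one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL and drop any #fragment.
                    absolute, _ = urldefrag(urljoin(self.base_url, value))
                    self.links.append(absolute)


def crawl(seeds, fetch, max_pages=100):
    """Breadth-first crawl starting from the seed URLs.

    fetch(url) returns the page's HTML, or None if the page cannot be retrieved.
    max_pages is a hypothetical stand-in for a real crawl policy.
    """
    frontier = deque(dict.fromkeys(seeds))  # the crawl frontier (duplicates removed)
    seen = set(frontier)                    # URLs already queued, so each is visited once
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor(url)
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


# A tiny in-memory "web" with hypothetical URLs, so the sketch runs offline.
PAGES = {
    "http://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.com/a": '<a href="/b">B</a> <a href="/c#top">C</a>',
    "http://example.com/b": '<a href="http://example.com/">home</a>',
    "http://example.com/c": "no links here",
}

if __name__ == "__main__":
    print(crawl(["http://example.com/"], PAGES.get))
```

Using a FIFO queue for the frontier visits pages in breadth-first order; a real crawler would typically replace the simple max_pages limit with richer policies (for example, politeness delays, robots.txt rules, or revisit scheduling).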
Web crawlers are commonly used by search engines that attempt to index all, or a vast majority, of the publicly accessible sites and web pages available on the World Wide Web. In this role, web crawlers are primarily used to create a copy of each visited page for later processing by a search engine, which indexes the downloaded pages to provide fast searches. For example, search engines (e.g., Google's search engine, Yahoo's search engine, and/or other search engines) can use web crawlers to index web sites and thereby provide up-to-date searchable index data for World Wide Web searches performed using the search engine.
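One common way to make searches over downloaded pages fast is an inverted index, which maps each word to the pages containing it, so a query is answered by set lookups rather than by rescanning every page. The text above does not say how any particular search engine builds its index; the following is a minimal sketch that assumes plain-text page copies, a simple lower-case word tokenizer, and hypothetical example.com URLs.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index: each word maps to the set of page URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word in the query, sorted for stable output."""
    words = tokenize(query)
    if not words:
        return []
    results = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(results)


# Hypothetical page copies, as a crawler might have downloaded them.
COPIES = {
    "http://example.com/": "Welcome to the example home page",
    "http://example.com/a": "Crawlers copy pages for the search engine",
    "http://example.com/b": "A search engine indexes copied pages",
}

if __name__ == "__main__":
    idx = build_index(COPIES)
    print(search(idx, "search engine"))  # prints ['http://example.com/a', 'http://example.com/b']
```

The index is built once after crawling, so each query touches only the posting sets for its own words; production search engines add ranking, stemming, and incremental updates on top of this basic structure.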