An Internet search engine is a tool typically designed to search for information on the World Wide Web. Users submit search queries to the search engine, and the search engine identifies and presents a list of result documents in response to each search query. The list of result documents consists of webpages, images, sounds, and other types of files, typically identified and retrievable by their uniform resource locators (URLs). The list of result documents is typically ranked according to various relevance and quality parameters before being presented to the user.
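As a rough illustration of the ranking step, the following is a minimal sketch that orders result documents by a weighted combination of a relevance score and a quality score. It assumes a toy relevance measure (the fraction of query terms present in the document text) and a hypothetical per-document quality value in [0, 1]. The names `ResultDocument`, `relevance`, `rank`, `w_relevance`, and `w_quality`, as well as the weights, are illustrative only. Real search engines combine many more signals.

```python
from dataclasses import dataclass


@dataclass
class ResultDocument:
    url: str
    text: str
    quality: float  # hypothetical per-document quality score in [0, 1]


def relevance(query: str, text: str) -> float:
    """Toy relevance: fraction of query terms that appear in the document text."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    words = set(text.lower().split())
    return sum(term in words for term in terms) / len(terms)


def rank(query, docs, w_relevance=0.7, w_quality=0.3):
    """Return URLs ordered by a weighted mix of relevance and quality, highest first.

    The weights are hypothetical placeholders, not values from any real system.
    """
    scored = [
        (w_relevance * relevance(query, d.text) + w_quality * d.quality, d.url)
        for d in docs
    ]
    # Sort on the score only; Python's sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in scored]


docs = [
    ResultDocument("https://a.example/", "web crawler basics", 0.2),
    ResultDocument("https://b.example/", "search engine ranking and web crawler design", 0.9),
    ResultDocument("https://c.example/", "cooking recipes", 1.0),
]
print(rank("web crawler", docs))
```

Here both crawler pages match every query term, so the higher-quality page is listed first, and the unrelated page ranks last despite its high quality score.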
A web crawler is a computer program that browses the World Wide Web in a methodical, automated manner. In general, the web crawler starts with a list of “seed” URLs. As the web crawler visits each URL, it collects hyperlinks found in the visited page, and adds them to the list of URLs to visit, called the “crawl frontier.” URLs from the “crawl frontier” are recursively visited and sometimes revisited according to a set of crawl policies. The web crawler downloads a copy of all the visited pages for later processing by a search engine. The search engine creates and stores an index of the downloaded pages. The stored index is used to optimize speed and performance in finding relevant documents for a search query.
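The crawl-then-index pipeline described above can be sketched as follows. This is a minimal illustration, not a production crawler. It assumes a hypothetical in-memory `WEB` dictionary that maps each URL to its page text and outgoing hyperlinks, used in place of real HTTP fetching and HTML parsing. The frontier is a FIFO queue (breadth-first order), and a `max_pages` limit stands in for a crawl policy. Revisit policies are omitted, so each URL is visited at most once. The function names `crawl`, `build_index`, and `search` are illustrative.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, hyperlinks found on the page).
# A real crawler would fetch pages over HTTP and extract links from the HTML.
WEB = {
    "https://a.example/": ("search engines rank documents",
                           ["https://b.example/", "https://c.example/"]),
    "https://b.example/": ("a web crawler follows hyperlinks",
                           ["https://a.example/", "https://d.example/"]),
    "https://c.example/": ("the crawl frontier holds URLs to visit", []),
    "https://d.example/": ("an index speeds up search", ["https://b.example/"]),
}


def crawl(seeds, max_pages=100):
    """Breadth-first crawl from seed URLs; returns {url: downloaded page text}."""
    frontier = deque(seeds)  # the "crawl frontier" of URLs still to visit
    seen = set(seeds)        # URLs already queued, so each is visited only once
    downloaded = {}
    while frontier and len(downloaded) < max_pages:
        url = frontier.popleft()
        if url not in WEB:   # unreachable URL: skip it
            continue
        text, links = WEB[url]
        downloaded[url] = text      # keep a copy for later indexing
        for link in links:          # add newly discovered hyperlinks to the frontier
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return downloaded


def build_index(pages):
    """Build an inverted index: term -> set of URLs whose text contains that term."""
    index = {}
    for url, text in pages.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(url)
    return index


def search(index, query):
    """Return URLs containing every query term, looked up in the index, not the pages."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()


pages = crawl(["https://a.example/"])
index = build_index(pages)
print(sorted(search(index, "search")))
```

Starting from a single seed, the crawler reaches all four pages through their hyperlinks. The inverted index then answers a query by intersecting the stored URL sets for each term, so the downloaded pages do not need to be rescanned.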