Web crawlers based on keyword search are the basis for acquiring information related to keywords. However, information updates can cause a web crawler to crawl repeatedly or incompletely, especially on target websites where information is updated frequently, e.g., microblog websites such as Sina Microblog and search websites such as Baidu. For popular keywords, it is difficult for a web crawler to obtain a complete crawl of the data because the information is updated rapidly, whereas for unpopular keywords, the same information is crawled repeatedly because it is updated relatively slowly.
In existing practice, a popularity level is set for each keyword, and keywords are then crawled according to their popularity levels, so that keywords with higher popularity levels are crawled more frequently.
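By way of illustration only, the existing practice described above may be sketched as follows. The level values, the crawl intervals, and the names `CRAWL_INTERVAL_BY_LEVEL` and `build_schedule` are hypothetical and are not taken from any particular prior scheme; the sketch merely shows a higher level mapping to a shorter crawl interval.

```python
import heapq

# Hypothetical mapping from popularity level to crawl interval in seconds.
# A higher level means a more popular keyword and a shorter interval.
CRAWL_INTERVAL_BY_LEVEL = {3: 60, 2: 600, 1: 3600}


def build_schedule(keyword_levels, horizon):
    """Return (time, keyword) crawl events in [0, horizon), ordered by time.

    keyword_levels maps each keyword to its preset popularity level.
    """
    # Every keyword is due for its first crawl at time 0.
    heap = [(0, keyword) for keyword in sorted(keyword_levels)]
    heapq.heapify(heap)
    events = []
    while heap:
        due, keyword = heapq.heappop(heap)
        if due >= horizon:
            continue  # past the planning horizon; do not reschedule
        events.append((due, keyword))
        interval = CRAWL_INTERVAL_BY_LEVEL[keyword_levels[keyword]]
        heapq.heappush(heap, (due + interval, keyword))
    return events


if __name__ == "__main__":
    schedule = build_schedule({"hot topic": 3, "niche topic": 1}, horizon=3600)
    for keyword in ("hot topic", "niche topic"):
        count = sum(1 for _, kw in schedule if kw == keyword)
        print(keyword, count)
```

Under these assumed intervals, a level-3 keyword is crawled 60 times within a one-hour horizon while a level-1 keyword is crawled once. The schedule depends entirely on the preset level of each keyword, so a level must be known for every keyword before any crawl frequency can be assigned.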
However, the existing practice has the following defects: (1) the popularity level of each keyword must be acquired before the crawl frequency can be set according to that level; and (2) an initial request process involves many requests for secondary download link addresses, and the existing scheme does not distinguish between them.