A web crawler is an internet bot that browses the World Wide Web (WWW) to index one or more websites. Some web crawlers browse only websites having specific content (e.g., healthcare or technology). Such web crawlers are usually referred to as focused web crawlers, or focused crawlers.
Typically, a focused web crawler is trained on a set of sample or seed websites prior to browsing the WWW. During actual crawling, the focused web crawler utilizes the content learnt from the sample or seed websites to guide its browsing. Because the focused web crawler utilizes only the content learnt during the training phase, its scope of browsing may be limited. This may lead to a decline in the efficiency of the overall crawling process.
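The limitation described above can be illustrated with a minimal sketch. It is not taken from any particular crawler: the function names, the in-memory `TOY_WEB` dictionary (standing in for real pages and links), the term-overlap relevance score, and the `threshold` value are all assumptions chosen for illustration. The crawler learns a fixed vocabulary from seed texts and then follows links only from pages that match that vocabulary.

```python
import re
from collections import Counter, deque


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def learn_vocabulary(seed_texts, top_k=5):
    """'Training' phase (assumed): keep the top_k most frequent seed terms."""
    counts = Counter()
    for text in seed_texts:
        counts.update(tokenize(text))
    return {term for term, _ in counts.most_common(top_k)}


def relevance(text, vocabulary):
    """Fraction of a page's tokens that appear in the learnt vocabulary."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in vocabulary) / len(tokens)


def focused_crawl(web, start_urls, vocabulary, threshold=0.3):
    """Breadth-first crawl that expands links only from relevant pages."""
    visited, relevant = set(), []
    queue = deque(start_urls)
    while queue:
        url = queue.popleft()
        if url in visited or url not in web:
            continue
        visited.add(url)
        text, links = web[url]
        if relevance(text, vocabulary) >= threshold:
            relevant.append(url)
            queue.extend(links)  # links on irrelevant pages are never followed
    return relevant


# Hypothetical in-memory "web": url -> (page text, outgoing links).
TOY_WEB = {
    "a": ("hospital patient doctor clinic", ["b", "c"]),
    "b": ("doctor treats patient at clinic", ["d"]),
    "c": ("football match score goal", ["e"]),
    "d": ("telemedicine wearable biosensor startup", []),
    "e": ("patient doctor hospital", []),
}

SEED_TEXTS = ["hospital patient doctor", "clinic doctor patient nurse"]
VOCABULARY = learn_vocabulary(SEED_TEXTS)

if __name__ == "__main__":
    print(focused_crawl(TOY_WEB, ["a"], VOCABULARY))
```

In this toy setup the crawl starting at `"a"` returns only `"a"` and `"b"`. Page `"d"` concerns healthcare but uses terms absent from the seed texts, so it scores zero, and page `"e"` is never reached because it is linked only from an irrelevant page. Both cases reflect the limited browsing scope that results from relying solely on content learnt during the training phase.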