The retrieval accuracy of a search engine can be greatly improved when the engine is constantly updated with cues harvested from the usage data. However, the quality of the usage data is highly dependent upon the search engine that is used to collect such data. Therefore it is desired to have a “bootstrap” engine with reasonable quality when venturing into a new domain.
A typical approach to ensure good bootstrap quality is to train the engine with exemplars. The exemplars oftentimes have to be collected and labeled by humans, a process that is time consuming and expensive especially when a large quantity is oftentimes necessary to have a well trained engine. The exemplars can become stale, especially for domains on the web where hot trends and new contents emerge with rapid pace.