Classifiers have been used for a variety of information retrieval tasks, including routing, web junk and spam identification, accelerated searching, and filtering. However, deploying classification technology within a larger retrieval system may be resource intensive. The benefit of analyzing large quantities of information to make a determination regarding the content, or the disposition of the content, may need to be balanced against the cost, quality, and/or time of the operation. Costs may be related to aggregating, querying, organizing, sampling, and making determinations about data associated with an overall population. Quality may be indicated by the certainty or confidence in the determinations made about the information. Making the determinations in a timely manner may also be important.
Determinations regarding a large quantity of information may be made through the use of statistics that provide an estimate of a characteristic of the information. For example, a web classifier may provide information related to the type of content that may be found on a web page. Millions of web pages may share similar content characteristics to varying degrees, such that certain web pages may be more strongly related to a topic of interest than other web pages. Running a classifier over millions of web pages may be too costly or too time consuming. On the other hand, analyzing a smaller number of web pages may result in lower certainty or confidence in the determination or estimate. Accordingly, a person interested in making estimates about large quantities of data would prefer to make high quality estimates with as few queries or samples as possible and in a short amount of time. Certain types of large scale estimation also require direct human intervention to direct the sampling process and to verify whether or not each sample has a certain characteristic or property.
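The trade-off between the number of samples and the confidence in an estimate can be illustrated with a minimal sketch. It is not the method of this document. It assumes simple random sampling and a normal-approximation (Wald) confidence interval, and every name in it is hypothetical: `estimate_proportion`, the synthetic `pages` population, and `has_property`, which stands in for a classifier call or a human judgment on one sampled item.

```python
import math
import random


def estimate_proportion(population, has_property, sample_size, z=1.96, seed=0):
    """Estimate the fraction of items having a property from a random sample.

    Returns (p_hat, margin), where margin is the half-width of an
    approximate 95% confidence interval (normal approximation, z = 1.96).
    """
    rng = random.Random(seed)
    # Draw a simple random sample without replacement.
    sample = rng.sample(population, sample_size)
    # Each call to has_property models one query (classifier or human check).
    hits = sum(1 for item in sample if has_property(item))
    p_hat = hits / sample_size
    margin = z * math.sqrt(p_hat * (1.0 - p_hat) / sample_size)
    return p_hat, margin


# Synthetic population (an assumption for illustration only):
# 100,000 "pages", 30% of which have the property of interest.
pages = [i % 10 < 3 for i in range(100_000)]


def has_property(page):
    # Stand-in for an expensive classifier call or human verification.
    return page


p_small, margin_small = estimate_proportion(pages, has_property, 100)
p_large, margin_large = estimate_proportion(pages, has_property, 10_000)

print(f"n=100:    estimate={p_small:.3f} +/- {margin_small:.3f}")
print(f"n=10000:  estimate={p_large:.3f} +/- {margin_large:.3f}")
```

Under these assumptions the margin shrinks roughly in proportion to one over the square root of the sample size, so a hundredfold increase in queries narrows the interval by about a factor of ten. That relationship is why fewer samples cost less but yield a less certain estimate.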