Active learning concerns techniques for incorporating user input or feedback to improve the accuracy of models developed by learning-capable algorithms. Most research in active learning has focused on improving techniques for selecting the next example to be presented to a user for feedback. That is, most research has been concerned with prompting the user for the input that will most improve the accuracy of the model produced by the algorithm. However, this improved selection often comes at the cost of increased time between iterations of active learning feedback, which in practice results in the user spending a prohibitive amount of time waiting for the system. The long wait times arise because the model (which tends to be highly complex and input-dependent) must be updated each time the user's input is provided. Indeed, the resulting wait times between iterations become so great as to make a practical system very difficult to achieve.
A practical active learning system would be of great benefit to a variety of tasks, including classification tasks. A classification task of particular interest is the extraction of attribute-value pairs from natural language documents that describe various products. Various techniques for performing such attribute-value extraction are described in our prior U.S. patent application Ser. No. 11/742,215 (the “'215 application”) and/or U.S. patent application Ser. No. 11/742,244 (the “'244 application”), the teachings of which prior applications are incorporated herein by this reference. As noted therein, retailers have been collecting a growing amount of sales data containing customer information and related transactions. The resulting data warehouses also contain product information, but that information is often very sparse and limited. As a result, products are typically treated as atomic entities, which hinders the effectiveness of many applications for which businesses currently use transactional data, such as product recommendation, demand forecasting, assortment optimization, and assortment comparison. While many retailers have recently realized this and are working towards enriching product databases with attribute-value pairs, the work is currently done entirely manually, e.g., through inspection of product descriptions that are available in an internal database or through publicly available channels (such as the World Wide Web), or by looking at the actual product packaging in a retail environment. While our prior U.S. patent applications describe techniques that beneficially automate these tasks, the techniques described therein could be further improved through use of active learning, i.e., through the limited use of expert feedback. To this end, it would be particularly advantageous to provide techniques that allow active learning to be incorporated into classification tasks, such as those described above, without the prohibitive lag times between feedback iterations.