In a previous invention, U.S. Pat. No. 6,360,227, we described a generalized method for automated construction of taxonomies and for automated categorization, or content-based recommendations. A system based on that invention might be used, for example, to construct a taxonomy, or organized set of categories, into which all of the documents on the Web might be categorized without human intervention, or to filter out objectionable categories of data on children's computers. U.S. Pat. No. 6,360,227, issued Mar. 19, 2002, is incorporated herein by reference in its entirety for all purposes.
It would be advantageous to have general, semi-automated methods for creating training data for such systems and further refinements in the creation of taxonomies. These new methods make it possible to create taxonomies of very large size that can be used to categorize even highly heterogeneous document collections (such as the World Wide Web) with near-human accuracy.