The Internet contains a vast amount of information stored in files located on computing systems all over the world. One of the most significant problems in using the Internet is how to find particular information in this vast network of computers. Search engines were created to address this problem. A search engine allows a user to provide a few keywords and retrieves files that match those keywords. Nowadays, a simple keyword search on the Internet may return thousands (even hundreds of thousands) of web pages. To find the desired information, a user may have to scan the outlines of hundreds of web pages, a task that virtually no user has the time or patience to perform. One technique that has been developed to decrease the amount of time spent searching for desired information is arranging web pages into categories.
From the point of view of web-page classification, conventional search engines can be grouped into three classes: those using manual classification, those using no classification, and those using automatic classification. Generally speaking, the current state of the art is that there is one search engine utilizing manual classification, one utilizing automatic classification, and the remaining search engines provide either no classification or only rudimentary manual classification. However, none of the current search engines allows a user to customize the categories to meet the user's individual needs and preferences.
Yahoo (www.yahoo.com) is a popular search engine that manually classifies web pages into subjects (such as Arts & Humanities, Business & Economy, Computers & Internet, and Education), each of which is further classified into sub-categories, thereby forming a directory structure. The manual classification process usually begins with users who submit suggested subjects for their web sites or web pages. The sites are then placed in categories by people (called Surfers) who visit the sites, evaluate the suggestions, and decide where the sites best belong. By using this manual process, Yahoo ensures the classification is done in the best (humanly) possible way. However, since the manual process is labor intensive and slow relative to the rapid growth in the number of web pages, Yahoo can now classify only a small percentage of web pages (estimated to be less than 10%). This manual process simply cannot keep up with the explosive growth of the web. Thus, the percentage of manually classified web pages is estimated to be steadily shrinking.
As mentioned above, most search engines (such as AltaVista, Excite, Go (formerly Infoseek), DirectHit, Google, and Lycos) do not provide classification of web pages (or provide only rudimentary manual grouping of a few pages). With the exception of DirectHit, these search engines rank search results based on factors such as the location of the keywords and the number of occurrences of the keywords. For example, if the keywords are located in the title of a web page, then the web page is rated higher than other web pages that contain the same keywords only in the body.
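By way of illustration only, ranking by keyword location and number of occurrences can be sketched as follows. The weights, the page representation (a dictionary with "title" and "body" fields), and the function names are assumptions made for this sketch and do not describe the actual ranking formula of any search engine named above.

```python
import re

# Hypothetical weights: a keyword hit in the title counts more than one in the body.
TITLE_WEIGHT = 3.0
BODY_WEIGHT = 1.0


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score_page(page, keywords):
    """Score one page by where and how often the keywords occur."""
    title_words = tokenize(page["title"])
    body_words = tokenize(page["body"])
    score = 0.0
    for keyword in tokenize(" ".join(keywords)):
        score += TITLE_WEIGHT * title_words.count(keyword)
        score += BODY_WEIGHT * body_words.count(keyword)
    return score


def rank_pages(pages, keywords):
    """Return the pages that match at least one keyword, highest score first."""
    scored = [(score_page(page, keywords), page) for page in pages]
    matches = [(score, page) for score, page in scored if score > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [page for _, page in matches]
```

Under these assumed weights, a page with a keyword in its title outranks a page that mentions the same keyword twice in its body, which mirrors the title-versus-body example given above.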
DirectHit (www.directhit.com), on the other hand, ranks search results based on the usage history of millions of Internet searchers. This ranking is based on a number of usage factors, such as the number of users who select a web page and the amount of time the users spend at the web page. Because the higher-ranked pages are presented first, a user sees the most popular pages or sites at the top of the results.
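A usage-based ranking of this general kind can be sketched, for illustration only, as a score that combines the two usage factors mentioned above. The way the factors are combined, the `time_weight` value, and the layout of the usage log are all assumptions of this sketch; the actual DirectHit formula is not described here.

```python
def popularity_score(stats, time_weight=0.01):
    """Combine the selection count and total viewing time (in seconds) into one score.

    The linear combination and the default time_weight are hypothetical.
    """
    return stats["clicks"] + time_weight * stats["seconds"]


def rank_by_usage(usage_log):
    """Rank URLs by popularity, most popular first.

    usage_log maps each URL to {"clicks": int, "seconds": float}.
    """
    return sorted(
        usage_log,
        key=lambda url: popularity_score(usage_log[url]),
        reverse=True,
    )
```

For example, a page selected by fewer users can still rank first in this sketch if those users spend substantially more time on it.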
Northern Light (www.northernlight.com) is one of the first search engines to incorporate automatic web-page classification. Northern Light organizes search results into categories by subject, type, source, and language. The categories are arranged into hierarchical folders, much like a directory structure. The arrangement and choice of categories are unique to each search and are generated based on the results of the search.
The automated categorization of web documents has been investigated for many years. For example, Northern Light received U.S. Pat. No. 5,924,090 for its classification mechanisms. Mladenic (1998) investigated the automatic construction of web directories, such as Yahoo. In a similar application, Craven et al. (1998) intend to use first-order inductive learning techniques to automatically populate an ontology of classes and relations of interest to users. Pazzani and Billsus (1997) apply Bayesian classifiers to the creation and revision of user profiles. WebWatcher (Joachims et al., 1997) acts as a learning apprentice that observes a user's actions while browsing the Internet and learns to rate links on the basis of the current page and the user's interests. For the construction of web-page classifiers, several techniques have been proposed in the literature, such as Bayesian classifiers (Pazzani & Billsus, 1997), decision trees (Apte et al., 1994), adaptations of Rocchio's algorithm to text categorization (Ittner et al., 1995), and k-nearest neighbor (Masand et al., 1992). An empirical comparison of these techniques was performed by Pazzani and Billsus (1997), who concluded that the Bayesian approach performs at least as well as the other approaches.
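To make the Bayesian approach concrete, the following is a minimal sketch of a multinomial naive Bayes text classifier with add-one (Laplace) smoothing. It is a generic textbook formulation offered for illustration only; it is not the specific method of Pazzani and Billsus or of any other cited work, and the class name, method names, and tokenization rule are assumptions of this sketch.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesPageClassifier:
    """Hypothetical multinomial naive Bayes classifier for page text."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of training pages
        self.vocabulary = set()                  # all words seen during training

    def train(self, text, category):
        """Add one labeled page to the training statistics."""
        words = tokenize(text)
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        """Return the most probable category, or None if nothing was trained."""
        # Words never seen in training carry no evidence here, so they are dropped.
        words = [w for w in tokenize(text) if w in self.vocabulary]
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_category, best_log_prob = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            # Log prior: fraction of training pages in this category.
            log_prob = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[category].values())
            for word in words:
                count = self.word_counts[category][word]
                # Add-one smoothing keeps unseen (word, category) pairs above zero.
                log_prob += math.log((count + 1) / (total_words + vocab_size))
            if log_prob > best_log_prob:
                best_category, best_log_prob = category, log_prob
        return best_category
```

In this sketch, each page is assigned to the single category with the highest posterior probability; log probabilities are summed rather than multiplying raw probabilities so that long pages do not cause numeric underflow.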
However, nothing in the prior art systems allows multiple users the freedom to customize or individualize a global Internet or Intranet directory structure of categories, or allows the global directory structure to change adaptively based on all of the users' customizations. Such a system would be a significant advance in the art and is disclosed in the following description.