The World Wide Web (“Web”) is an open-ended digital information repository into which information is posted, with newer articles continually replacing less recent ones or beginning entirely new subjects of discussion. The information on the Web can, and often does, originate from diverse sources, including authors, editors, collaborators, and outside contributors commenting, for instance, through a Web log, or “Blog.” Such diversity suggests a potentially expansive topical index, which, like the underlying information, continuously grows and changes. The diversity also suggests that some of the topics in the index may be more timely, that is, “hot,” than others, which have since turned “cold” over an extended time period or have moved to the periphery of a topic.
Social indexing systems provide information and search services that organize evergreen information according to the topical categories of indexes built by their users. Topically organizing an open-ended information source, like the Web, into an evergreen social index can facilitate information discovery and retrieval, such as described in commonly-assigned U.S. Patent Application, entitled “System and Method for Performing Discovery of Digital Information in a Subject Area,” Ser. No. 12/190,552, filed Aug. 12, 2008, pending, the disclosure of which is incorporated by reference.
Social indexes organize evergreen information by topic. A user defines topics for the social index and organizes the topics into a hierarchy. The user then interacts with the system to build robust models to classify the articles under the topics in the social index. The topic models can be created through example-based training, such as described in Id., or by default training, such as described in commonly-assigned U.S. Patent Application, entitled “System and Method for Providing Default Hierarchical Training for Social Indexing,” Ser. No. 12/360,825, filed Jan. 27, 2009, pending, the disclosure of which is incorporated by reference. Example-based training results in fine-grained topic models generated as finite-state patterns that appropriately match positive training example articles and do not match negative training example articles, while default training forms topic models in a self-guided fashion based on a hierarchical topic tree using both the individual topic labels and their locations within the tree.
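By way of illustration only, example-based training can be sketched as a search for the simplest pattern that matches every positive training example article and no negative training example article. The sketch below is a simplification under stated assumptions, not the method of the referenced applications: the function name `train_topic_pattern`, the candidate terms, and the sample articles are hypothetical, and a conjunction of whole-word terms stands in for a finite-state pattern.

```python
import re
from itertools import combinations


def term_matches(term, text):
    """Return True if the term occurs as a whole word in the text."""
    return re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE) is not None


def pattern_matches(terms, text):
    """A pattern here is a conjunction: every term must occur in the text."""
    return all(term_matches(t, text) for t in terms)


def train_topic_pattern(candidate_terms, positives, negatives):
    """Hypothetical helper: return the smallest conjunction of candidate terms
    that matches all positive examples and no negative example, or None."""
    for size in range(1, len(candidate_terms) + 1):
        for terms in combinations(candidate_terms, size):
            hits_all_positives = all(pattern_matches(terms, p) for p in positives)
            hits_any_negative = any(pattern_matches(terms, n) for n in negatives)
            if hits_all_positives and not hits_any_negative:
                return terms
    return None


# Illustrative training data (assumed, not from the source).
positives = [
    "The housing market slump deepens as mortgage rates climb",
    "Mortgage lenders tighten rules in a weak housing market",
]
negatives = [
    "Stock market rallies on tech earnings",
    "New housing project approved by city council",
    "Mortgage fraud trial begins",
]
pattern = train_topic_pattern(["market", "housing", "mortgage"], positives, negatives)
print(pattern)
```

Each single term is rejected in this example because it also matches a negative training example article, so the search falls through to a two-term conjunction. Preferring the smallest conjunction is a design choice of the sketch that keeps the resulting pattern general.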
In addition, the system can build coarse-grained topic models based on population sizes of characteristic words, such as described in commonly-assigned U.S. Patent No. 8,010,545, issued Aug. 30, 2011, the disclosure of which is incorporated by reference. The coarse-grained topic models are used to recognize whether an article is roughly on topic. Articles that match the fine-grained topic models, yet have statistical word usage far from the norm of the positive training example articles, are recognized as “noise” articles. The coarse-grained topic models can also suggest “near misses,” that is, articles that are similar in word usage to the training examples, but which fail to match any of the preferred fine-grained topic models, such as described in commonly-assigned U.S. Provisional Patent Application, entitled “System and Method for Providing Robust Topic Identification in Social Indexes,” Ser. No. 61/115,024, filed Nov. 14, 2008, pending, the disclosure of which is incorporated by reference.
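The interplay of the fine-grained and coarse-grained topic models can be illustrated with a minimal sketch. The sketch assumes a much simpler scoring rule than the referenced patent: characteristic words are taken to be the words appearing in the most positive training example articles, and an article is treated as roughly on topic when it contains at least a threshold fraction of those words. The function names, the stop-word list, and the 0.4 threshold are hypothetical.

```python
import re
from collections import Counter

# Assumed minimal stop-word list, for illustration only.
STOPWORDS = {"the", "a", "an", "in", "on", "as", "of", "and", "to", "by", "is"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def characteristic_words(positives, top_k=5):
    """Pick the top_k words by number of positive examples containing them.
    Ties are broken alphabetically so that the result is deterministic."""
    counts = Counter()
    for text in positives:
        counts.update(set(tokenize(text)))
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return set(ranked[:top_k])


def coarse_score(article, words):
    """Fraction of the characteristic words that occur in the article."""
    if not words:
        return 0.0
    return len(set(tokenize(article)) & words) / len(words)


def triage(article, fine_match, words, threshold=0.4):
    """Combine the fine-grained match result with the coarse-grained score."""
    roughly_on_topic = coarse_score(article, words) >= threshold
    if fine_match and roughly_on_topic:
        return "on topic"
    if fine_match:
        return "noise"      # matches the pattern, but word usage is atypical
    if roughly_on_topic:
        return "near miss"  # typical word usage, but no pattern matched
    return "off topic"


words = {"housing", "market", "mortgage", "rates", "lenders"}
print(triage("Mortgage rates rise as housing demand cools", False, words))
print(triage("Farmers market opens downtown", True, words))
```

In this sketch, the first article shares three of the five characteristic words but is assumed not to match a fine-grained pattern, so it is flagged as a near miss. The second is assumed to match a pattern but shares only one characteristic word, so it is flagged as noise.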
Thus, social indexing systems display articles within a topically-organized subject area according to the fine-grained topics in the social index, which can be selected by a user through a user interface. The topical indexing and search capabilities of these systems help users to quickly access information on topics that they specify. However, these capabilities do not address how best to meet the different information goals of individual users, which can range from focusing on the latest “news,” to catching up on recent topical articles that appeared over a few days, to reading the most definitive articles on a topic.
The approaches used by online news, social media aggregation, and automated news aggregation Web sites, further described infra, rely on a single ordering of articles, which fails to meet the users' different information goals.