The amount of available digital content is enormous and rapidly increasing. Considering the World Wide Web alone, a typical user may access billions of substantially static Web pages including archival information, as well as live data sources such as news and social networking feeds, microblogging sources, and periodically updating output from media outlets. Within large enterprises, additional accessible information includes corporate websites, wikis, document repositories, support forums and knowledge bases.
The foregoing circumstances present several challenges to a user. First, the user requires a system for locating particular information within this huge and growing information pool. Conventional search engines provide a fairly effective approach to this challenge in most usage scenarios.
Second, a user may wish to use the available digital information to keep abreast of developments related to certain topics. Some systems allow a user to subscribe to “alerts” related to topics of interest, or to build personalized “digital newspapers” which are periodically updated with information relating to topics of interest. However, such systems do not efficiently cope with the vast replication of information across the available information sources. Consequently, these systems either present the user with several content items (e.g., articles) pertaining to the same piece of knowledge (e.g., a news event) but received from different sources, or these systems limit the data sources so that only one content item is received for each piece of knowledge.