RSS has been used for over 14 years to help distribute web content. By RSS, we refer to RSS, Atom, and web feeds generally. While the apparent significance of RSS has declined in recent years, possibly in response to social media, the functionality that RSS provides has been baked into the Web, and a majority of popular websites offer subscriptions via RSS.
Text analytics tools have been applied to Web content and, by extension, to web feeds for over 10 years. One system of note from 2004, “Monkey News,” aggregated content from a number of RSS feeds while identifying and grouping “stories” into “topics” in a fully automated way based on computer text analysis. Numerous U.S. patents, including U.S. Pat. No. 6,477,524, disclose methods for comparing stories (text) for relatedness.
The prior art has shown that a collection of documents can be clustered using a range of methods. In one such method, spherical k-means, each document is tokenized, and the tokens (words) can be stemmed to reduce dimensionality and then counted. Stop words can be removed if desired. A TF-IDF (term frequency-inverse document frequency) value can be computed for each token to deemphasize terms that appear in many documents of the collection. The resulting vector of tokens and values represents the document, and documents can be compared to each other using the cosine similarity of their representative vectors. When using k-means, an initial set of k clusters is created, each represented by a centroid within the vector space of the documents. Each document is placed into the cluster whose centroid is nearest to it (via one of many known distance functions, including cosine similarity), and the cluster centroids are then recomputed as the mean (average) of the vector components of the documents contained in each cluster. This process is repeated until a stopping condition is satisfied, for example when the cluster assignments no longer change or a maximum number of iterations is reached.
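By way of illustration only, the clustering steps described above can be sketched in Python using the standard library. This is a minimal sketch under stated assumptions, not the method of any cited system: the function names, the small stop-word list, the smoothed IDF formula, and the farthest-first choice of initial centroids are all assumptions made for the example, and stemming is omitted for brevity.

```python
import math
from collections import Counter

# Illustrative stop-word list only; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "the", "of", "to", "in", "on", "for", "is"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if w and w not in STOP_WORDS]


def normalize(vec):
    """Scale a sparse vector (dict of token -> weight) to unit length."""
    norm = math.sqrt(sum(x * x for x in vec.values()))
    return {t: x / norm for t, x in vec.items()} if norm else dict(vec)


def tfidf_vectors(docs):
    """Return one unit-length sparse TF-IDF vector per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    df = Counter(t for c in counts for t in c)  # document frequency per token
    n = len(docs)
    # Smoothed IDF (one common variant, assumed here): ln((1+n)/(1+df)) + 1
    return [
        normalize({t: tf * (math.log((1 + n) / (1 + df[t])) + 1.0)
                   for t, tf in c.items()})
        for c in counts
    ]


def cosine(a, b):
    """Dot product of sparse vectors; equals cosine similarity for unit vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(x * b.get(t, 0.0) for t, x in a.items())


def initial_centroids(vectors, k):
    """Farthest-first seeding (an assumption; random seeding is also common)."""
    chosen = [vectors[0]]
    while len(chosen) < k:
        # Pick the vector least similar to every centroid chosen so far.
        chosen.append(min(vectors, key=lambda v: max(cosine(v, c) for c in chosen)))
    return [dict(c) for c in chosen]


def spherical_kmeans(vectors, k, max_iter=100):
    """Cluster unit vectors by cosine similarity; return one label per vector."""
    centroids = initial_centroids(vectors, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each document joins its most similar centroid.
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:  # stop when assignments no longer change
            break
        labels = new_labels
        # Update step: centroid = mean of member vectors, renormalized.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if not members:
                continue  # keep the previous centroid for an empty cluster
            total = Counter()
            for v in members:
                for t, x in v.items():
                    total[t] += x
            centroids[j] = normalize({t: x / len(members) for t, x in total.items()})
    return labels
```

Because every vector is scaled to unit length, the dot product computed by `cosine` is the cosine similarity, and renormalizing each mean keeps the centroids on the unit sphere, which is what distinguishes the spherical variant from ordinary k-means. Calling `spherical_kmeans(tfidf_vectors(docs), k=2)` on a handful of short texts drawn from two unrelated subjects would be expected to return one label per text, with texts sharing vocabulary receiving the same label.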
A system that accommodates multiple users may be able to suggest additional documents and sources (to which a given user may not be subscribed) that also fit the topics represented in that user's feed, whether or not the feed appears as a cluster. Before its discontinuation in 2013, Google Reader provided related documents and related sources. However, Reader and similar systems all failed to cluster existing documents with related documents from feeds and bookmarks. In related prior art, U.S. application Ser. No. 13/761106 discloses a method for displaying social media links related to items in a news feed. This effort, too, is distinct from the present invention in that it derives no benefit from user bookmarks and does not explicitly target the organizational skills of a variety of experts.