User-generated content (UGC) such as photos, videos, comments, status updates and shared links dominates the latest evolution of the Internet. Yahoo! and other Internet companies receive large amounts of user-generated content but can display only a few items on a page, with pagination support for showing more posts. The content that is shown is primarily time-driven (most recent first) or community-rating-driven (ordered by number of thumbs-up). In some circumstances, a complex off-line batch job computes the most popular comments and displays them.
UGC plays a key role in increasing engagement and provides an avenue of participation for end users. Users of Yahoo! post millions of comments every week. Top stories on Yahoo! routinely generate more than 10,000 comments. The March 2011 Japan tsunami story alone generated 100,000 comments. The Yahoo! News site receives more than 10 million unique visitors daily. A large fraction of these visitors (by some estimates as many as 99%) read the comments.
The default view for displaying UGC is chronological. This method results in either static content on the first page (oldest first) or too many spam and low-quality posts on the first page (newest first). Other sort orders, such as “highest rated by community”, have their own deficiencies, including a first-mover advantage: comments posted early have more time to accumulate ratings and so tend to stay at the top. In addition, news providers re-use slug-ids (on-line content topics), which serve as identifiers for the commenting widget. As a result, comments from old, and sometimes unrelated, articles are shown alongside newer ones.
For example, Reuters, the news agency, continues updating the same article slug-id, “obama_healthcare”, with the latest developments. Comments from the original version of the article, when attached to the latest article, appear irrelevant and may even appear to contradict the story. The result is a poor end-user experience. To address this issue, and that of user gratification (users seeing their latest comment on the first page), the default sort order was changed to reverse chronological. Though contributions increased, comment quality dropped severely, with large amounts of spam, trolling, and off-topic comments. The change also put additional strain on customer care and had a qualitative adverse effect on search engine ranking.
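The trade-offs among the sort orders discussed above can be illustrated with a minimal sketch. The `Comment` fields, the helper function names, the page size, and the sample data are all hypothetical and chosen only for illustration; they are not taken from any actual commenting system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comment:
    comment_id: str
    posted_at: int  # seconds after article publication (hypothetical field)
    thumbs_up: int  # community rating count (hypothetical field)


def oldest_first(comments):
    """Chronological order: the first page never changes once it is full."""
    return sorted(comments, key=lambda c: c.posted_at)


def newest_first(comments):
    """Reverse chronological order: the latest posts always lead, good or bad."""
    return sorted(comments, key=lambda c: c.posted_at, reverse=True)


def highest_rated(comments):
    """Community-rating order; ties go to the earlier comment."""
    return sorted(comments, key=lambda c: (-c.thumbs_up, c.posted_at))


def first_page(ordered, page_size=2):
    """Return the ids of the comments that fit on the first page."""
    return [c.comment_id for c in ordered[:page_size]]


# Made-up sample: two early comments that have had time to collect ratings,
# one thoughtful late comment, and one late spam post.
SAMPLE = [
    Comment("early-1", 60, 40),
    Comment("early-2", 120, 35),
    Comment("insightful-late", 7200, 5),
    Comment("spam-late", 7260, 0),
]

if __name__ == "__main__":
    print("oldest first: ", first_page(oldest_first(SAMPLE)))
    print("newest first: ", first_page(newest_first(SAMPLE)))
    print("highest rated:", first_page(highest_rated(SAMPLE)))
```

On this sample, oldest-first and highest-rated produce the same first page (the two early comments), which reflects the static first page and the first-mover advantage, while newest-first places the spam post at the top.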
There is a need for a method of ranking and presenting user-generated comments that balances quality and freshness without increasing performance overhead.