As the Internet and electronic devices have become ubiquitous, an extremely large number of documents are generated every day, such as blogs, comments, news articles, and customer reviews of products. For example, WORDPRESS.COM, owned by AUTOMATTIC INC. of San Francisco, Calif., receives 347 user-published blogs every minute, and AMAZON.COM, owned by AMAZON.COM INC. of Seattle, Wash., receives around three hundred thousand customer reviews of products every day. Many of these documents contain useful information. For example, news articles keep readers informed of events occurring around the world, while customer reviews of products are not only helpful to customers making purchase decisions, but can also help stakeholders such as authors, sellers, product managers, and manufacturers analyze and improve products.
The overwhelming number of documents is, however, challenging to analyze. For example, for certain products it might be impossible for stakeholders to manually examine all of the customer reviews of their products. Tools are therefore needed for automatically analyzing the content of such documents.
Existing tools for analyzing content are, however, primitive. Most of these tools are based on filtering and sorting. For example, stakeholders can filter customer reviews of their products based on star ratings or particular keywords, or sort customer reviews based on submission dates. These basic functions provide only limited assistance to content analysis. More critically, in order to benefit from these functions, the stakeholders often need to know in advance the topics of the content that they are interested in. This requires the stakeholders either to read through the customer reviews at least once to find relevant topics in the content, or to rely on their previous knowledge about the topics, which might result in missing important topics that were not previously known to the stakeholders.
The disclosure made herein is presented with respect to these and other considerations.