The amount of content, data, and items available on the Internet (e.g., time-sensitive documents such as blogs, forum posts, and the like) continues to increase exponentially. Users with limited information and limited time have difficulty finding items that satisfy their interests. Thus, several recommendation systems (e.g., text mining systems and information retrieval (IR) systems) are widely used in the art to recommend appropriate items to users based on their inclinations and preferences. A typical way of presenting the output of an IR system is to list the documents, sometimes together with their relevance scores. Another popular way of presenting the output of text mining systems is through tag clouds. Tag clouds present the relevance of items (e.g., text items) in a collection of documents: relevant text items appear in a dedicated area, where relevance is usually emphasized by size and color.
Currently, text mining systems determine the importance or significance of text items using standard Term Frequency-Inverse Document Frequency (tf-idf) techniques and the like. However, one of the challenges in implementing the standard tf-idf technique is that the idf part for a particular small sub-collection (e.g., documents pertaining to a week in a yearly corpus of documents) is almost constant, because the idf part uses a logarithmic function, which is very aggressive for small collections. Thus, it may not be possible to achieve an accurate relevance for a text item in a small sub-collection of documents through the tf-idf technique. Further, the standard tf-idf technique has no notion of hierarchy other than the simple corpus-document hierarchy.
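The narrow range of idf in a small sub-collection can be illustrated with a minimal sketch. It assumes the common textbook form idf = log(N / df), where N is the number of documents and df is the number of documents containing the term; the function names (idf, tf_idf, max_idf) and the toy week of documents are hypothetical and are not taken from any particular system.

```python
import math
from collections import Counter


def idf(term, docs):
    """Textbook idf: log(N / df); returns 0.0 if the term occurs in no document."""
    n_docs = len(docs)
    df = sum(1 for doc in docs if term in doc)
    return math.log(n_docs / df) if df else 0.0


def tf_idf(term, doc, docs):
    """Relative term frequency in one document, weighted by idf over the collection."""
    tf = Counter(doc)[term] / len(doc)
    return tf * idf(term, docs)


def max_idf(n_docs):
    """Largest idf any term can reach in a collection of n_docs documents (df = 1)."""
    return math.log(n_docs)


# Hypothetical sub-collection: one tokenized document per day of a week.
week_docs = [
    ["server", "outage", "report"],
    ["server", "upgrade"],
    ["server", "outage"],
    ["budget", "report"],
    ["server", "budget"],
    ["outage", "report"],
    ["server", "holiday"],
]

# idf of every distinct term in the week; all values lie in [0, log(7)].
week_idf = {t: idf(t, week_docs) for doc in week_docs for t in doc}
```

In this sketch every idf value for the seven-document week is bounded by log(7), roughly 1.95, whereas a collection of 365 documents allows values up to log(365), roughly 5.90. With so little spread available, the idf factor does little to separate terms within the small sub-collection.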
In many cases, relevant text items are supervised (e.g., a manually selected set of tags), and many tag cloud implementations are based on these supervised tags. However, in many practical scenarios, such as emails, no supervised tags exist. Moreover, even when supervised tags exist, they are not always complete and may not cover all the topics in the document. Therefore, the existing methods of determining the relevance score of a text item and generating a tag cloud to present relevant text items may not facilitate finding significant, interesting, and relevant text items in a document or a collection of documents.