This invention relates to the field of word-clouds for visualizing data content. In particular, the invention relates to automatic word-cloud generation for sparsely tagged content.
The emergence of social media applications in recent years has encouraged people to be actively involved in content creation and classification, whether by publishing personal blogs, providing direct feedback through comments, ratings, and recommendations, sharing content such as photos and videos with the general public, or annotating content. Collaborative bookmarking systems such as Delicious (delicious.com, Delicious is a trade mark of Yahoo! Inc.), Dogear (Dogear is a trade mark of International Business Machines Corporation) for the enterprise, and many other content sharing sites (e.g., Flickr (www.flickr.com, Flickr is a trade mark of Yahoo! Inc.), Last.fm (www.last.fm, Last.fm is a trade mark of CBS Interactive), YouTube (www.youtube.com, YouTube is a trade mark of YouTube LLC)), encourage users to tag available content for their own use as well as for the public. Other sites, such as blogging services, encourage their bloggers to tag their own content to improve the disclosure and findability of their posts.
A tag-cloud is a visual depiction of the terms of a content item, typically used to provide a visual summary or a semantic view of an item or a cluster of items that have something in common (e.g., the search results for a specific query). Tag-clouds have been popularized by social media sites such as Delicious, Flickr, and many others, to become a standard visualization tool for content representation on social media sites.
Tags in the cloud are normally listed alphabetically, and the importance of a tag is represented with font size or color. Thus, it is possible to easily find a tag alphabetically and by its importance. A tag in the cloud usually links to all items that are associated with it.
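As a minimal sketch of the layout described above, the following Python function orders tags alphabetically and maps each tag's importance to a font size. The function name `build_tag_cloud`, the use of a raw usage count as the importance measure, the linear scaling, and the default size range are all assumptions made for illustration; actual sites may use other weights or scaling schemes.

```python
def build_tag_cloud(tag_counts, min_size=10, max_size=32):
    """Return (tag, font_size) pairs, sorted alphabetically (case-insensitive).

    tag_counts maps each tag to an importance weight (here, a usage count).
    Font sizes are scaled linearly between min_size and max_size.
    """
    if not tag_counts:
        return []

    lowest = min(tag_counts.values())
    highest = max(tag_counts.values())
    span = highest - lowest

    cloud = []
    for tag in sorted(tag_counts, key=str.lower):
        if span == 0:
            # All tags are equally important: use a mid-range size.
            size = (min_size + max_size) // 2
        else:
            fraction = (tag_counts[tag] - lowest) / span
            size = min_size + round(fraction * (max_size - min_size))
        cloud.append((tag, size))
    return cloud


if __name__ == "__main__":
    for tag, size in build_tag_cloud({"web": 10, "ajax": 1, "Python": 5}):
        print(f"{tag}: {size}pt")
```

In a rendered cloud, each pair would typically become a link to the items associated with the tag, displayed at the computed size.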
Tags annotated by users form a taxonomy of the tagged items, commonly termed a folksonomy. The value of a folksonomy is derived from people who use their own vocabulary and add explicit meaning, which may derive from a personal inferred understanding of the item's value. Folksonomies have been found to be extremely useful for many information retrieval applications, including tag-cloud representation of social media items, query refinement, and search and browse enhancement.
Meaningful, high-quality tag-clouds can be generated in well-tagged domains where the resources are widely tagged. An item can be successfully represented by a tag-cloud that is based on its own tags, or on tags associated with similar items. In contrast, existing tag-cloud generation techniques have difficulty generating good representative tag-clouds for items in sparsely tagged domains.
When manual (user-provided) tags are not available, feature selection techniques can be used to extract meaningful terms from the item's content, or from other textual resources that are related to the item, such as anchor-text or the item's meta-data. These extracted terms can be used as alternatives to manual tags. Tag-clouds based on extracted terms are referred to as word-clouds, as they are formed of generated terms and not manual tags.
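The term extraction described above can be sketched in Python as follows. The text does not name a specific feature selection technique; this sketch assumes TF-IDF weighting (term frequency multiplied by a smoothed inverse document frequency) purely for illustration. The function names `tokenize` and `extract_terms`, the small stop-word list, and the `top_k` parameter are likewise hypothetical.

```python
import math
import re
from collections import Counter

# A deliberately small, illustrative stop-word list (an assumption).
STOPWORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to"}


def tokenize(text):
    """Lower-case the text and return its alphabetic tokens, minus stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def extract_terms(item_text, corpus, top_k=5):
    """Return the top_k (term, score) pairs for item_text, scored by TF-IDF.

    corpus is a list of background documents used to estimate how common
    each term is; rarer terms receive a higher weight.
    """
    doc_term_sets = [set(tokenize(doc)) for doc in corpus]
    n_docs = len(doc_term_sets)

    term_freq = Counter(tokenize(item_text))
    scores = {}
    for term, count in term_freq.items():
        doc_freq = sum(1 for terms in doc_term_sets if term in terms)
        # Smoothed inverse document frequency; avoids division by zero.
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1
        scores[term] = count * idf

    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    background = [
        "the cat sat on the mat",
        "the dog sat on the log",
        "cat and dog play",
    ]
    for term, score in extract_terms("the cat chased the cat toy", background, top_k=3):
        print(f"{term}: {score:.3f}")
```

The resulting scores could serve as the importance weights of a word-cloud, in place of the usage counts that manual tags would provide.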
Extracted terms are usually inferior to manual tags, since terms that are significant from a statistical perspective do not necessarily serve as good labels for the content from which they were extracted.