The amount of user-generated web content, such as videos, photos, tweets, web pages, and user comments, has been growing exponentially. Various collaborative methods have been introduced to manage this ever-increasing online content. For example, social tagging may be a collaborative method in which online users provide descriptive words to mark the content that they upload or view. Another example may be hashtags, which Twitter users use to annotate their tweets.
Compared with a traditional editor-controlled vocabulary, there is no limit on the keywords that online users may provide for annotating pages, photos, or tweets. The consensus voting power of these users can provide rich facets for describing web content, and these convenient ways of organizing content have gained significant popularity in the Web 2.0 era. However, user-provided annotations may not always be accurate. For example, users with less experience may introduce noise words that are misleading or wrong into the annotation vocabulary. Different users may choose different synonyms to describe a common concept. Some generic words used for content marking may be too obvious or carry no substantial meaning. Thus, these problems may undermine the concise representation of the content and may affect subsequent browsing and searching of the content on the web.