The expansion of the Internet and the World Wide Web (commonly known as the web) has given users an enhanced ability to read, listen to, and watch different forms of web content. However, due to the significant increase in the amount of web content, retrieving relevant web content has become a challenge. Therefore, the concept of tagging has been introduced to classify and search for relevant web content. Tagging refers to the process of assigning a tag to web content. The tag is a keyword or term assigned to the web content by a user and/or a web content owner. Tags allow users to classify the web content they use, and systems later use the tags to search for relevant web content that may interest other users. In other words, tags are widely recognized as keywords used to describe web content.
Currently, different methods are used to define tags. In one method, the user defines the tag in the user's own vocabulary when the web content is consumed. In another method, the web content owner defines the tag by drawing words from a controlled vocabulary. In yet another method, the web content owner defines the tag and gives the user the option to re-tag the web content. However, the methods described above have one or more of the following drawbacks when the relevant web content is analyzed. First, because a tag defined by the user lacks control over terminology, meta-noise is likely. Second, even though a tag defined by the web content owner uses well-understood terms, the tag may become obsolete over time. Third, a tag defined by the web content owner and later re-tagged by the user may combine the strengths of the above-mentioned methods, but it also inherits their drawbacks. Therefore, search results that a system retrieves by analyzing tags defined by the above-mentioned methods may not be precise.
Further, web content often evolves independently of the tags associated with it. As a result, information describing how the tagged content changes is not available. Current systems also fail to identify popular tags, i.e., the tags through which the web content is consumed the most. Consequently, ineffective tags, unused tags, and meta-noise associated with the web content cannot be eliminated.
In general, although tagging is considered the prevailing tool for personalizing, classifying, and searching heterogeneous web content, systems often find it hard to analyze tags and the associated web content for various analytic purposes. Therefore, to classify and identify relevant web content, it would be desirable for systems to analyze the tagged content, i.e., how the web content has evolved and been consumed as its associated tags have changed.