Collaborative tagging systems, or folksonomies, have emerged as a popular way of annotating and categorizing content with tags that are created and managed collaboratively. A number of Web 2.0 applications, such as del.icio.us, Flickr, Connotea, Technorati and YouTube, allow users to "tag" resources to facilitate search and retrieval, both for themselves and for other users. The appeal of folksonomies comes from the fact that they are community-generated and require less effort to create and maintain. This contrasts with ontologies, which are often created by a small number of experts; ontologies are more consistent, but also relatively static and inflexible.
Two common search and navigation interfaces provided by many collaborative tagging systems are the search form and the tag cloud. A search form allows users to enter one or more tags in a text box, and the result of the search is typically a ranked list of resources that have been annotated with these tags. A tag cloud is a visual representation of a list of tags that uses visual cues such as color or font size to indicate the weight of each tag. These interfaces let users discover resources without having to know in advance which tags were used to describe them.
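The font-size cue of a tag cloud can be illustrated with a minimal sketch. The function name `tag_cloud_sizes`, the pixel range, and the choice of logarithmic interpolation are all illustrative assumptions, not a method taken from any particular tagging system; the log scale is a common choice because raw tag counts are highly skewed.

```python
import math


def tag_cloud_sizes(tag_weights, min_px=10, max_px=32):
    """Map each tag's weight to a font size in pixels (hypothetical helper).

    Weights must be positive. Sizes are interpolated on a log scale between
    min_px and max_px, so a few very heavy tags do not push every other
    tag down to the minimum size.
    """
    if not tag_weights:
        return {}
    logs = {tag: math.log(w) for tag, w in tag_weights.items()}
    lo, hi = min(logs.values()), max(logs.values())
    span = hi - lo
    sizes = {}
    for tag, lw in logs.items():
        # If all weights are equal, render every tag at the midpoint size.
        frac = 0.5 if span == 0 else (lw - lo) / span
        sizes[tag] = round(min_px + frac * (max_px - min_px))
    return sizes


print(tag_cloud_sizes({"python": 1000, "web": 100, "rare": 1}))
```

With these weights, the heaviest tag receives the maximum size, the lightest tag receives the minimum, and a tag with one tenth of the top weight lands about two thirds of the way up the range.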
Conventional tagging systems scale poorly as the number of tags grows, because screen real estate is limited and browsing through long lists of tags is difficult for users. As a result, most tag clouds contain a fixed number of tags, and only the most popular or most highly weighted tags are displayed to the user. Consequently, most resources are inaccessible from the tag cloud, because they have been annotated only with tags that do not appear in it. This problem is exacerbated by the fact that the tagging of resources follows a long tail distribution, as described in Golder, S. and Huberman, B., The Structure of Collaborative Tagging Systems, Technical Report, HP Labs, 2006. In a long tail distribution, a few popular tags are used frequently, and the majority of the resources are annotated with low-frequency tags. Hence, the tag cloud interface provides very low recall: only a very small portion of the resources is accessible to the user.
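The recall argument can be made concrete with a small sketch. It assumes, for illustration only, that a resource counts as reachable when at least one of its tags appears in a cloud of the k most frequent tags. The function `tag_cloud_recall` and the toy data are hypothetical.

```python
from collections import Counter


def tag_cloud_recall(annotations, k):
    """Fraction of resources reachable from a cloud of the k most frequent tags.

    annotations maps each resource to the *set* of tags assigned to it
    (hypothetical data layout).
    """
    if not annotations:
        return 0.0
    counts = Counter(tag for tags in annotations.values() for tag in tags)
    cloud = {tag for tag, _ in counts.most_common(k)}
    # A resource is reachable if any of its tags is shown in the cloud.
    reachable = sum(1 for tags in annotations.values() if tags & cloud)
    return reachable / len(annotations)


annotations = {
    "r1": {"web", "design"},
    "r2": {"web"},
    "r3": {"web", "python"},
    "r4": {"knitting"},
    "r5": {"birdwatching"},
}
print(tag_cloud_recall(annotations, 1))  # 0.6
```

In this toy data, the single most popular tag reaches three of five resources, while resources tagged only with rare, long-tail tags stay hidden until the cloud grows large enough to include them.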
A number of previous works have analyzed tags from Web 2.0 sites, for example del.icio.us, to determine their properties. Golder and Huberman discovered patterns in the tagging dynamics of del.icio.us. They found that the majority of URLs reach their peak popularity (the highest frequency of tagging in a given time period) within 10 days of being saved on del.icio.us (67% in Golder and Huberman's data set), although some sites are rediscovered by users (about 17% in their data set). This indicates that tagging is stable for most sites, with some degree of burstiness. In addition, the relative frequencies of the tags assigned to a given URL stabilize over time: after the first 100 or so bookmarks, each tag's frequency is a nearly fixed proportion of the total frequency of all tags used.
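The stabilization of tag proportions can be observed by recomputing each tag's share after every bookmark and tracking how much the shares move between successive bookmarks. The following is a minimal sketch under assumed inputs (a chronological list of per-bookmark tag lists for one URL); the helper names and the max-difference shift measure are illustrative choices, not the measure used by Golder and Huberman.

```python
from collections import Counter


def tag_proportions_over_time(bookmarks):
    """Yield, after each bookmark, every tag's share of all tag uses so far.

    bookmarks is a chronological list; each entry is the list of tags one
    user assigned to the URL (hypothetical data layout).
    """
    counts = Counter()
    total = 0
    for tags in bookmarks:
        counts.update(tags)
        total += len(tags)
        # Guard against bookmarks saved with no tags at all.
        yield {tag: c / total for tag, c in counts.items()} if total else {}


def max_shift(prev, curr):
    """Largest absolute change in any tag's share between two snapshots."""
    tags = prev.keys() | curr.keys()
    return max((abs(curr.get(t, 0.0) - prev.get(t, 0.0)) for t in tags), default=0.0)


bookmarks = [["web", "css"], ["web"], ["web", "css"], ["web", "design"]]
snaps = list(tag_proportions_over_time(bookmarks))
print(snaps[-1])
print(max_shift(snaps[-2], snaps[-1]))
```

On a real bookmark history, the reported stability would show up as `max_shift` between consecutive snapshots becoming small once many bookmarks have accumulated.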
In Halpin, H., Robu, V. and Shepherd, H., The Complex Dynamics of Collaborative Tagging, WWW 2007, a generative model of collaborative tagging is proposed to explain the dynamics of the frequency distribution of tags for popular sites with a long history, i.e., with many users and many tags. According to the model and experiments, the frequency distribution of tags follows a power law. In Mika, P., Ontologies Are Us: A Unified Model of Social Networks and Semantics, ISWC 2005, a model of semantic-social networks is defined for extracting lightweight ontologies from del.icio.us. Besides calculating measures such as the clustering coefficient and (local) betweenness centrality, Mika uses a symmetric distance measure to cluster the concept network.
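The clustering coefficient mentioned above can be illustrated on a simple tag network. This sketch assumes, for illustration, that two tags are linked when they annotate a common resource; this construction is a simplification and not necessarily the exact concept network defined by Mika. The local clustering coefficient of a tag is the fraction of pairs of its neighbours that are themselves linked.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(annotations):
    """Undirected tag graph: two tags are adjacent if they share a resource.

    annotations maps each resource to the set of its tags (assumed layout).
    """
    adj = defaultdict(set)
    for tags in annotations.values():
        for a, b in combinations(sorted(tags), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def local_clustering(adj, node):
    """Fraction of pairs of node's neighbours that are adjacent to each other."""
    nbrs = adj.get(node, set())
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))


annotations = {"r1": {"python", "web", "django"}, "r2": {"python", "data"}}
g = cooccurrence_graph(annotations)
print(local_clustering(g, "python"))  # 1 of 3 neighbour pairs is linked
```

Here "python" has three neighbours, and only the pair ("web", "django") is linked, giving a coefficient of one third. "web" has a coefficient of 1.0 because its two neighbours co-occur.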
In Hotho, A., Jäschke, R., Schmitz, C. and Stumme, G., Information Retrieval in Folksonomies: Search and Ranking, ESWC 2006, a search algorithm for folksonomies called FolkRank is proposed. FolkRank ranks tags by applying an adapted version of PageRank to a graph representation of the folksonomy, while taking into account a user's preferences as extracted from the query. In Li, R., Bao, S., Fei, B., Su, Z. and Yu, Y., Towards Effective Browsing of Large Scale Social Annotations, WWW 2007, an algorithm is described that allows users to browse social annotation data in a hierarchical and semantic manner. In Zhou, M., Bao, S., Wu, X. and Yu, Y., An Unsupervised Model for Exploring Hierarchical Semantics from Social Annotations, ISWC 2007, an approach for learning hierarchical semantics from del.icio.us annotations is described.
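The idea of ranking with a PageRank variant biased toward the query can be sketched as follows. This is a simplified illustration in the spirit of FolkRank, not the published algorithm. It omits users and links only resources and tags. It also assumes a differential score (a preference-biased run minus an unbiased run) and an arbitrary `boost` weight for query tags. All function names and parameters are hypothetical.

```python
from collections import defaultdict


def build_graph(annotations):
    """Undirected graph linking each resource to its tags (users omitted)."""
    adj = defaultdict(set)
    for resource, tags in annotations.items():
        for tag in tags:
            adj[("res", resource)].add(("tag", tag))
            adj[("tag", tag)].add(("res", resource))
    return adj


def pagerank(adj, preference, damping=0.85, iters=100):
    """Power iteration with a teleport (preference) distribution over nodes."""
    nodes = list(adj)
    z = sum(preference.get(n, 0.0) for n in nodes)
    p = {n: preference.get(n, 0.0) / z for n in nodes}
    w = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1 - damping) * p[n] for n in nodes}
        for n in nodes:
            # Every node built above has at least one edge, so no division by 0.
            share = damping * w[n] / len(adj[n])
            for m in adj[n]:
                nxt[m] += share
        w = nxt
    return w


def folkrank_style(adj, query_tags, boost=1.0, damping=0.85):
    """Differential score: preference-biased PageRank minus unbiased PageRank."""
    uniform = {n: 1.0 for n in adj}
    biased = dict(uniform)
    for t in query_tags:
        if ("tag", t) in biased:
            biased[("tag", t)] += boost * len(adj)  # hypothetical weighting
    base = pagerank(adj, uniform, damping)
    pref = pagerank(adj, biased, damping)
    return {n: pref[n] - base[n] for n in adj}


annotations = {"r1": {"python", "web"}, "r2": {"python", "data"}, "r3": {"knitting", "yarn"}}
g = build_graph(annotations)
scores = folkrank_style(g, ["python"])
tags = sorted((n for n in g if n[0] == "tag"), key=scores.get, reverse=True)
print(tags)
```

Subtracting the unbiased run removes the advantage that globally popular nodes would otherwise have, so tags connected to the query through shared resources rise in the ranking and unrelated tags fall below zero.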