The collaborative efforts of users participating in social media services such as Wikipedia, Flickr, and Delicious have led to an explosion in user-generated content. The content can occur in various forms, such as text, photos, video, audio, or multimedia content. A popular way of organizing the content is through tagging. In fact, a considerable amount of such content is labeled by user-defined tags. These tags serve as useful descriptors of the content, especially in the case of multimedia. Although informal tagging conventions have emerged, tagging does not restrict the user in any way when defining labels for describing content. This extensive freedom allows for accurate descriptions and organization of content, and the flexibility of the tagging mechanism allows users to index and navigate the large amount of content that is being generated.
As a consequence, the number of user-defined tags has likewise grown explosively. This poses the problem of semantically categorizing and exploring a potentially infinite tag space. Any such endeavor is complicated by the practice of unrestricted labeling of content by users, which has resulted in the emergence of an uncontrolled vocabulary that by far exceeds the semantics of a hierarchical ontology or taxonomy such as WordNet. The lack of a pre-defined schema makes the task of semantically exploring this immense and sparse tag space even more difficult.
Current solutions to word sense disambiguation rely on the context in which terms occur. In tag corpora, there is often minimal context, making these methods inappropriate. See, for example, N. Ide and J. Véronis, Word Sense Disambiguation: The State of the Art, Computational Linguistics, 24(1). Moreover, an approach that maps user-defined tags onto an existing taxonomy does not scale to the vast vocabularies that exist within web-based services such as Flickr and Delicious.
What is needed is a way to classify user-defined tags of content for semantically exploring the corpora of user-defined tags. Such a system and method should be able to flexibly use a classification schema that may scale to the vast vocabularies that exist within web-based services.