The invention relates generally to detection and prevention of spam.
As more content in the form of documents of all sorts, videos, music and the like becomes available on the Internet, the need grows to structure that content according to the requirements of individual users or user groups. An emerging technology that allows users to structure or categorize content autonomously is known as tagging. Single users may organize their view of the cloud of resources on the Internet via private tags, whereas communities of users may use community tags. These tags may be visible to and usable by all users of a community. A tag may be a visual depiction of a specific information resource on the Internet. Typically, tags that are applied more often may be displayed more prominently, e.g., in a larger font size, than tags that are applied less often. In this way, the view of the resources may be personalized for easy navigation through large, complex information spaces. A tag may be a more or less relevant keyword or term associated with or assigned to a piece of information, e.g., a text document, a spreadsheet, a picture, a geographic map, a blog entry, a video clip, etc., thus describing the item and enabling keyword-based classification and search of information. The pool of tags available in a system is usually aggregated in what is referred to as a tag cloud.
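By way of a non-limiting illustration, the frequency-based display of tags in a tag cloud may be sketched as follows. The function name tag_cloud_font_sizes, the linear scaling between the least and most frequently applied tag, and the font size limits of 10 and 32 are assumptions made for this example only; they are not taken from any particular tagging system.

```python
from collections import Counter


def tag_cloud_font_sizes(applied_tags, min_size=10, max_size=32):
    """Map each tag to a font size that grows with how often it was applied.

    applied_tags is a flat list with one entry per tag application.
    The size limits and the linear scaling are illustrative assumptions.
    """
    counts = Counter(applied_tags)
    if not counts:
        return {}
    lowest = min(counts.values())
    highest = max(counts.values())
    span = highest - lowest
    sizes = {}
    for tag, count in counts.items():
        # If all tags were applied equally often, each gets the smallest size.
        weight = (count - lowest) / span if span else 0.0
        sizes[tag] = round(min_size + weight * (max_size - min_size))
    return sizes


# Hypothetical tag applications collected from a community of users.
applied = ["python", "python", "python", "java", "java", "rust"]
print(tag_cloud_font_sizes(applied))
```

In this sketch the tag applied most often receives the largest font size and the tag applied least often receives the smallest, which also hints at why such a display may be attractive to spammers: repeatedly applying a tag directly increases its prominence.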
As tagging systems gain in popularity, they become more susceptible to tag spam, i.e., misleading tags that are generated in order to increase the visibility of certain resources. In other cases, tag spam may simply confuse users.
Several approaches have been followed in order to improve spam detection in tag clouds. Document U.S. Pat. No. 7,685,198B2 discloses a defined set of general criteria to improve the efficiency of a tagging system. The set of criteria has been applied to collaborative tag suggestions presented to a user. The collaborative tag suggestions are based on a goodness measure for tags that is derived from collective user authorities in order to combat spam. The goodness measure is iteratively adjusted by reward-penalty algorithms during tag selection. The collaborative tag suggestions can also incorporate other resources, e.g., content-based auto-generated tags.
Document “Combating Spam in Tagging Systems: An Evaluation”, ACM Journal Name, Vol. 1, No. 1 2001, pages 1-35 discloses a series of different measures to detect tagging spam. The behavior of existing approaches and of malicious attacks, as well as the impact of a moderator and of a ranking model, have been studied.
Thus, there may be a need for an improved architecture, method and system design to properly, reliably and automatically detect spam in tagging systems and, in particular, to avoid future misuse of tagging systems, in order to make tagging more reliable and trustworthy for users.