While “spam” is commonly thought of as bulk delivery of unwanted e-mail messages, the emergence of “social media” websites has provided spammers with new opportunities to identify potential recipients for their messages as well as new channels of delivery. The term “social media” refers to technology tools that allow member users to disseminate, discuss, share, or acquire information with or from other members. Social media websites may appeal to a group of individuals with common interests or, more commonly, maintain a broad membership base that allows users to express interests and exchange content generally or with like-minded members. Such content may include links to external websites. Social media is one specialized form of a “cloud-hosted service,” i.e., a consumer or business service offered to a diverse group of users who primarily access it through the public Internet. A sales contact database, a webmail program, and a document collaboration tool are each examples of “cloud-hosted services.”
The ease of identifying individuals and their interests, and the readily available website-provided channels for communicating with them, strongly attract spammers to social media. Using a social media website's search tools, a spammer can target particular demographic segments or members with common interests and direct links or advertising to those users. While typically forbidden by the site's terms of use, this practice is nonetheless widespread and growing because of a mismatch between the spammer's economic incentives and the ability of the site's proprietors to police the site effectively. Spammers have no operating costs beyond the time involved in identifying users and posting messages, and much spam is generated automatically by so-called spambots. With the large (and growing) volume of messages posted and exchanged even on small social media websites, human oversight of message content is cost-prohibitive; indeed, it may be difficult or time-consuming even for a trained professional to distinguish legitimate messages from spam. Virtually every consumer-facing site that permits message posting is now vulnerable to spamming campaigns.
Current approaches to combating “social spam” tend to involve conventional filter analysis of comments, postings, messages, trackbacks, and pingbacks as these are posted or delivered. Message-focused blocking techniques are coarse by nature and may be indifferent to context: low-grade profanity in a sports-related discussion forum is quite different from an advertisement for a pornographic website, though the words may be similar. Monolithic rules can sometimes defeat rather than support the policy objectives of a social media site.
It is also possible to base filtering purely on the username rather than on the content of a message. This approach, too, has limitations. Spammers rarely maintain exactly the same user information for long, so the “shelf life” of a user-based spam rule (i.e., a user blacklist) may be quite short. It is possible to extend the rule beyond a specific user (for example, an e-mail spam filter may build a reputation around an IP address), but the result is a coarser filter and the ensuing prospect of false negatives (so that too much spam gets through) and/or false positives (which can alienate legitimate users or even lead to termination of their accounts). Moreover, these filters tend to lack the ability to retroactively adjust the reputation assigned to a user based upon subsequent signals or behavior. An individual initially identified as a spammer, for example, may prove through later actions to be a legitimate user, or vice versa.
These approaches operate, by definition, after a new user has signed up with a site. Operators of social media sites would also benefit from preventing suspicious or abusive users from signing up in the first place, and/or from recording the “suspiciousness” of the circumstances at the time the user signed up. While successfully blocking a single abusive user at sign-up may prevent the introduction of significant amounts of spam, the targeting must be precise (i.e., with minimal false positives), or the social media site will not only turn away legitimate users but may also damage its reputation in the wider community. Unfortunately, it is difficult to profile abusive users based on stable, identifiable characteristics, in part because of the “Darwinian evolution” of spamming activity: abusers closely monitor and reverse engineer filtering methodologies, and survive by adopting strategies to avoid these filters. Moreover, attempts to generate valid, statistically based classifier models of “good” and “bad” social media users typically require human review of significant sample sets, e.g., 10,000 representative examples of “bad” and 10,000 representative examples of “good.” Collecting such large sample sets in a scalable, timely, and economical manner represents a substantial challenge; indeed, by the time a sample is collected, it may be obsolete because the spammers have changed their patterns to avoid detection.