The present disclosure relates to technology for identifying content, for example, spam, in an online community by detecting occurrences of fast-rising features and/or signals in multiple consecutive windows of time.
Web sites with user-generated content, for example, social networks, are a frequent target for hackers and spammers. Hackers frequently attempt to acquire information such as usernames, passwords, and credit card details (and sometimes, indirectly, money) by masquerading as a trustworthy source in electronic communications. These acts are referred to as phishing. Phished accounts and malware downloads are used to create a botnet (a set of computers that have been penetrated by software from a malware distribution or by other malicious software), which is subsequently used to spread spam. Spam is of significant commercial value to hackers. Compromised accounts are very helpful to hackers because people tend to trust messages from friends. Many hackers continuously post spam messages, comments, etc.
By the very nature of social networks, some of these attacks tend to spread virally, making the attacks very potent. It takes a great deal of resources to determine whether a uniform resource locator (“URL”) is harmful; a typical decision requires not only fetching the URL, but also following redirects and fetching and rendering each contained resource (URL, image, etc.).
A single user-generated content web site can receive millions of unique URLs each day. As such, it is vital to weed out the “noise” from these URLs and prioritize the list of content for examination. Similarly, many spam posts and comments spread ever more widely.
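By way of illustration only, the prioritization described above might be sketched as follows: occurrences of a feature (here, a URL domain) are counted in fixed-size time windows, and a feature is flagged when its count grows by at least a given factor across each of several consecutive windows. The function names, the parameters `window_seconds`, `num_windows`, `growth`, and `min_count`, and their default values are assumptions made for this sketch and are not taken from the disclosure.

```python
from collections import Counter, defaultdict


def window_counts(events, window_seconds):
    """Count occurrences of each feature per fixed-size time window.

    events: iterable of (timestamp_seconds, feature) pairs.
    Returns a mapping {feature: Counter({window_index: count})}.
    """
    counts = defaultdict(Counter)
    for timestamp, feature in events:
        counts[feature][int(timestamp // window_seconds)] += 1
    return counts


def fast_rising(events, window_seconds=60, num_windows=3, growth=2.0, min_count=1):
    """Return (feature, per-window counts) for features whose count rises by
    at least `growth` times in every step across the last `num_windows`
    consecutive windows, ordered by the most recent count (highest first).

    All thresholds are hypothetical values chosen for illustration.
    """
    counts = window_counts(events, window_seconds)
    if not counts:
        return []
    # The most recent window that contains any event at all.
    latest = max(w for per_window in counts.values() for w in per_window)
    windows = range(latest - num_windows + 1, latest + 1)
    flagged = []
    for feature, per_window in counts.items():
        # A Counter yields 0 for windows in which the feature never appeared.
        series = [per_window[w] for w in windows]
        rising = all(
            cur >= min_count and cur >= growth * prev
            for prev, cur in zip(series, series[1:])
        )
        if rising:
            flagged.append((feature, series))
    # Examine the fastest-growing, highest-volume features first.
    flagged.sort(key=lambda item: item[1][-1], reverse=True)
    return flagged
```

In such a sketch, only the flagged features would be passed on to the expensive examination (following redirects, fetching, and rendering), while features with steady or falling counts would be treated as noise.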