A “bot” is a software application that runs automated tasks over the Internet. Some bots pretend to be human beings and deceive normal users into following them. Bots are used for many purposes, for example: to harvest email addresses from contact or guestbook pages, to consume bandwidth, to scrape the content of websites for reuse, to buy up concert seats (particularly by ticket brokers who resell the tickets), to farm for resources, or to inflate traffic counts in analytics reporting in order to extract money from advertisers.
Typically, bots perform tasks that are both simple and structurally repetitive, at a much higher rate than would be possible for a human alone. The largest use of bots is in web spidering, in which an automated script fetches, analyzes and files information from web servers at many times the speed of a human. Bots may also be deployed where a response speed faster than that of humans is required (e.g., gaming bots and auction-site robots) or, less commonly, where the emulation of human activity is required, as with chat bots.
As a specific example, bots may be used to visit real websites run by real companies with real human visitors, so that the bots inflate the monetized audience and cause advertisers to lose revenue. Bots may also be used to commit ad fraud by hijacking browsers to masquerade as real users and blend in with the human traffic to generate more money. Certain sophisticated bots are more difficult to detect because they move the cursor, for example over ads, and generate cookies in order to appear more appealing to certain advertisers.
Bots are also fairly common in social media. However, the methods that social media sites currently use to detect and prevent bots are clearly insufficient. For example, it currently takes social media sites, at best, an average of three days from the day a bot is created to detect it. The longer a bot is active, the more abuses it may initiate.
Real-time bot detection is therefore needed. Bot detection methods that work within a day of registration are more desirable than methods that detect bots only after a long time. In addition, bot detection algorithms that work on archived data or on a single snapshot in time can miss many bots. For example, bots can delete their tweets after a day, or even a few hours, once the tweets have propagated.
Surprisingly, there is no existing work on finding correlated bots in real time. Most existing work, including the Twitter Rules, focuses on per-account features to distinguish fraudulent accounts from innocent ones. Reactive and clever account merchants have simply started mimicking humans to avoid being detected and suspended by these methods. However, humans are slow in their activity, while merchants need to keep increasing followers. Merchants therefore maintain throughput by creating many accounts that each behave like a human, and all in the same way. In other words, instead of having one bot account that tweets 10,000 times per day, they prefer to have 1,000 bot accounts that each tweet 10 times per day.
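The arithmetic in the preceding paragraph can be illustrated with a minimal Python sketch. The per-account limit of 500 tweets per day and the function name `flagged_by_rate` are hypothetical values chosen only for illustration; they are not taken from any actual platform rule or from the invention.

```python
# Illustrative assumption: a per-account detector that flags any account
# exceeding a fixed number of tweets per day. The limit is hypothetical.
DAILY_TWEET_LIMIT = 500


def flagged_by_rate(tweets_per_day: int, limit: int = DAILY_TWEET_LIMIT) -> bool:
    """Return True when a single account exceeds the per-account limit."""
    return tweets_per_day > limit


# Strategy A: one bot account tweeting 10,000 times per day.
single_bot = [10_000]
# Strategy B: 1,000 bot accounts, each tweeting 10 times per day.
distributed_bots = [10] * 1_000

# Both strategies produce the same total daily throughput.
single_total = sum(single_bot)
distributed_total = sum(distributed_bots)

# Count how many accounts the per-account check would flag in each case.
single_flagged = sum(flagged_by_rate(rate) for rate in single_bot)
distributed_flagged = sum(flagged_by_rate(rate) for rate in distributed_bots)

print(single_total, single_flagged)
print(distributed_total, distributed_flagged)
```

Under these assumptions both strategies deliver the same total throughput, yet the per-account check flags only the single high-volume account and none of the 1,000 low-volume accounts.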
Therefore, real-time bot detection is necessary in order to reduce, if not eliminate, abuses created or initiated by bots. The invention satisfies this demand by tracking correlated activity to detect bots in real time.