Basic web-based content searching techniques are well known. Advancements in these systems include refinements to the search results produced, including tracking user activity relating to a search results page (“SERP”). It is important to track user activity for a variety of reasons, including monitoring the integrity of the search engine by detecting possible spam or automated robotic computing systems (e.g., “bots”). Monitoring activity can also help to optimize the search results presented on the SERP. But it can be difficult to distinguish normal user activity from erratic or abnormal activity, where the abnormal activity has a high probability of being related to some level of improper usage of the search technology. By way of example, if an advertiser pays for advertising on a per-click basis, it can be very important to determine when someone fraudulently clicks on a hyperlink, thereby improperly increasing the advertising costs for the advertiser.
One basic technique for detecting click fraud is to simply monitor user activity and visually attempt to detect whether the click activity appears consistent with what would be considered a normal user session. For example, if a user repeatedly clicks on an advertising link without ever clicking on any of the search result links, this may be indicative of click fraud, where one party is attempting to increase the number of clicks for a particular link and thereby potentially increase advertising costs for the person or company that sponsors such link. This click-fraud detection technique is not feasible in large-scale applications due to the sheer volume of clickstreams in most search engines.
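By way of illustration only, the advertising-link example above can be expressed as a simple rule applied to recorded click data. The following is a minimal sketch and not a described implementation: the function name `flag_ad_only_sessions`, the `(session_id, click_type)` record layout, the `"ad"` and `"result"` labels, and the threshold of three clicks are all assumptions introduced for this example.

```python
from collections import Counter


def flag_ad_only_sessions(clicks, min_ad_clicks=3):
    """Return IDs of sessions with repeated ad clicks and no result clicks.

    `clicks` is an iterable of (session_id, click_type) pairs, where
    click_type is "ad" or "result". The record layout and the default
    threshold are illustrative assumptions, not values from the text.
    """
    ad_counts = Counter()
    result_counts = Counter()
    for session_id, click_type in clicks:
        if click_type == "ad":
            ad_counts[session_id] += 1
        elif click_type == "result":
            result_counts[session_id] += 1

    # A session is flagged only if it reached the ad-click threshold
    # and never clicked an ordinary search result link.
    return sorted(
        session_id
        for session_id, count in ad_counts.items()
        if count >= min_ad_clicks and result_counts[session_id] == 0
    )
```

A fixed rule of this kind captures only the single pattern it was written for, and the choice of threshold is arbitrary, so it shares the limitations of manual review.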
Another problem occurs in determining which states constitute normal user behavior. Clickstream activity can be affected by a wide variety of factors, not the least of which are the demographics, Internet familiarity and interests of search technology users. This further complicates any user-based attempts to manually determine whether clickstream data is normal.
The growth of web bots makes this detection even more important. Data mining can be a valuable resource for optimizing search engine technology, and the activity of web bots obscures these data mining operations. More specifically, the search engine seeks to optimize operations based on user behavior, and this behavior is obfuscated by the clickstream activity of web bots masquerading as users. The web bots also occupy significant bandwidth and computing resources, further hindering search engine optimization.
As such, there exists a need to distinguish normal user click behavior from abnormal activity. This determination can allow a search engine to detect fraudulent behavior, identify web bot activity and further optimize the search engine by allowing user click activity to be analyzed without abnormal click activity in the clickstream samples.