This disclosure relates to detecting anomalous behavior in a computer system or computer network.
Search engines derive revenue based in large part on computer user actions. For example, Pay-Per-Click (PPC) advertising, also known as Cost Per Click (CPC), is a business model by which some search engines derive most or all of their revenue. Advertisers pay a search engine to place links to a website onto web pages that the search engine controls. A typical example is the border of “sponsored ads” that are returned along with the “organic” or “natural” results from a normal search. The search engine is paid each time a consumer clicks on a displayed link. Advertisers compete in an auction process to have their links placed in the premium central location of the displayed ads.
Beyond the revenue generated from PPC Search, an equally sizable revenue stream arises from the placement of sponsored ads on other websites that have joined a search engine's network of affiliates. In this arrangement, a cooperating affiliate website allots the search engine space on its web pages, where the search engine places advertisements that are deemed “relevant” to visitors of the website. The affiliate shares in the revenue generated by the click-through traffic.
As usually occurs, the creation of a new business has been accompanied by the onset of a new fraud: Click Fraud. Click Fraud is generically defined as the creation of click traffic solely for the purpose of driving up the advertiser's costs, with no intent of “shopping”. There are two major varieties of Click Fraud:
1) Malicious competition, wherein the sole purpose is to damage an advertiser's marketing budget. As advertisers typically specify daily spending limits with the search engines, an entity can potentially eliminate a competitor by flooding that competitor's ads with false traffic until the daily limit is exhausted. Such an attack is most effective against competitors with limited spending budgets. The risk of retaliation requires that the source of the traffic be concealed, i.e., that the traffic lack any tags to the source. A variant of this fraud generates false traffic for the purpose of eroding the brand name of a particular search engine.
2) Fraud for profit. Here the goal is the direct collection of advertiser dollars. The criminals create a fictitious website and then exploit the cooperative network arrangements to join a search engine's merchant network. Revenue can then be manufactured simply by clicking on the links that the search engine inserts on that website.
The true scale of the Click Fraud problem is unknown: industry estimates range from 0.5% up to 20% and beyond. Even a modest estimate of 5% would make click fraud a problem comparable in scale to all of U.S. credit-card fraud. The ambiguity in determining the click fraud rate stems from the lack of an objective reviewer: in credit-card fraud there is a consumer examining a monthly bill, whereas in PPC traffic there is usually no one who can determine what is valid and what is fraud.
The absence of tags means that click fraud detection becomes an anomaly detection problem. The goal is to identify, in the stream of clicks arriving at an advertiser's site, outliers that are anomalous in a manner indicative of the major fraud types described above. For merchants with a large ecommerce channel, visits that culminate in a purchase can be labeled as unambiguously non-fraud with almost complete confidence. One caveat relates to charge-backs: some fraud can be directly attacked using other monitoring systems; see, e.g., U.S. Patent Appl. 20020099649 (Feb. 12, 2001) “Identification and Management of Fraudulent Credit/Debit Card Purchases at Merchant Ecommerce Sites”. At the moment there is little evidence that perpetrators of click fraud are masking their attacks by generating fraudulent purchases. It is unlikely that they ever will, as there are much more direct and lucrative ways to extract financial gain from a compromised payment card than conducting click fraud.
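As a purely illustrative sketch of this framing, and not the method of this disclosure, the following Python fragment treats visits that end in a purchase as non-fraud and flags the remaining visits whose numeric feature is a strong outlier. The function name `flag_anomalous_visits`, the visit tuple layout, the choice of feature (for example, clicks received from the visit's source in the past hour), the median/MAD robust score, and the 3.5 threshold are all assumptions introduced here for illustration.

```python
from statistics import median


def flag_anomalous_visits(visits, threshold=3.5):
    """Return ids of non-purchasing visits whose feature is an outlier.

    visits: list of (visit_id, feature_value, purchased) tuples.
    The layout and the feature are hypothetical, chosen for illustration.
    """
    if not visits:
        return []

    values = [value for _, value, _ in visits]
    med = median(values)
    # Median absolute deviation: a spread estimate that is itself
    # not distorted by the outliers we are trying to find.
    mad = median([abs(value - med) for value in values])
    if mad == 0:
        return []  # no measurable spread, so nothing can be scored

    flagged = []
    for visit_id, value, purchased in visits:
        if purchased:
            # A visit that ends in a purchase is treated as non-fraud.
            continue
        # 0.6745 rescales the MAD so the score is comparable to a z-score
        # under a normal distribution.
        score = 0.6745 * abs(value - med) / mad
        if score > threshold:
            flagged.append(visit_id)
    return flagged


# Hypothetical data: (visit id, clicks from the source in the past hour, purchased)
example_visits = [
    ("v1", 1, False),
    ("v2", 2, False),
    ("v3", 3, False),
    ("v4", 2, True),
    ("v5", 1, False),
    ("v6", 50, False),  # heavy traffic, no purchase
    ("v7", 40, True),   # heavy traffic, but ended in a purchase
]
flagged_ids = flag_anomalous_visits(example_visits)
```

In this made-up data only `v6` is flagged: `v7` shows similarly heavy traffic but is excluded because it ended in a purchase. A practical detector would presumably combine many features rather than the single one assumed here.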