The massive growth of the web has been funded almost entirely via advertisements shown to users. Web advertisements have proven superior to traditional advertisements for several reasons, the most prominent being the ability to show personally relevant advertisements. To serve the most relevant advertisements, web advertisement agencies rely on mechanisms to uniquely identify and track user behavior over time. Known as trackers, these systems are able to uniquely identify a user via a variety of methods (e.g., persistent cookies, browser fingerprinting, etc.) and over time can build up enough information about a user to show targeted advertisements.
While advertisement agencies' use of trackers has enabled the free-to-use model of the web, it also raises privacy concerns. These concerns have led to the creation of client-side applications that block trackers and advertisements, for example AdBlock. While AdBlock has been quite successful in mitigating users' exposure to trackers, by definition it prevents the display of advertisements, and thus hurts web services' revenue streams.
There are four main entities in the advertisement ecosystem that are considered by the present invention: 1) the user, 2) the publisher, 3) the advertiser, and 4) the advertising network. The user visits web pages provided by the publisher, who in turn obtains revenue through display advertisements paid for by advertisers. The advertising network is the entity that coordinates the whole process.
The publishers, advertisers, and advertising networks have a common financial interest in increasing the click-through and conversion rates of users, i.e., the probability that a user actually clicks on an ad and makes a purchase. This is where Online Behavioral Advertising (OBA) comes into play, as it has been shown to significantly increase the click-through rate.
For OBA to work, advertising networks need to track the activity of users across the web. This is achieved by placing tracking beacons in publishers' websites. The tracking beacons are usually small images embedded in the webpage code that trigger a request to the tracker's server. When a new user visits a website that is tracked by an advertisement network, the user's browser downloads the image, and the server in turn sets a cookie that is associated with this user. Subsequent beacon requests triggered from any website where the advertisement network is present will send back the same cookie, therefore allowing the tracking of a user across the web.
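The beacon-and-cookie mechanism described above can be illustrated with a minimal sketch. The `Tracker` class, its `serve_beacon` method, and the site names are hypothetical and stand in for a real tracker's server; the sketch only assumes that the browser returns the stored cookie with every beacon request to the tracker's domain.

```python
import uuid
from typing import Dict, List, Optional


class Tracker:
    """Toy third-party tracker (hypothetical) that serves a beacon and sets a cookie."""

    def __init__(self) -> None:
        # cookie id -> publisher sites from which the beacon was requested
        self.profiles: Dict[str, List[str]] = {}

    def serve_beacon(self, referring_site: str, cookie: Optional[str]) -> str:
        """Handle one beacon request and return the cookie the browser should store."""
        if cookie is None or cookie not in self.profiles:
            cookie = uuid.uuid4().hex  # first visit: mint a new identifier
            self.profiles[cookie] = []
        self.profiles[cookie].append(referring_site)
        return cookie


tracker = Tracker()
cookie = None  # a new browser holds no cookie for the tracker's domain
for site in ["foo.com", "bar.com", "foo.com"]:
    # The browser sends back whatever cookie it stored on the previous request.
    cookie = tracker.serve_beacon(site, cookie)

# All three visits end up under a single identifier.
print(tracker.profiles[cookie])
```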
Apart from that, personally identifiable information (PII) can be leaked through a variety of means, for example by passing email addresses, real names, etc. as HTTP arguments after web forms are filled in [3]. When such PII leakage occurs in a page that hosts tracking cookies, trackers can associate the user's real-world identity with the user's online presence at any website where the cookie is observed. This is a very serious threat to one's privacy. Note that online behavioral targeting does not actually need the association between the PII and the cookie. All that is required is a constant identifier (e.g., a cookie) to be able to say that user X seen at foo.com is the same one now visiting bar.com. Such a constant identifier is effectively anonymous if it is not connected to PII.
This means that as long as users make sure that PII does not leak, OBA can be carried out while the user remains anonymous. Eliminating all PII leakage, however, is quite difficult and in many cases impossible to achieve without breaking much of the web's usability. An alternative to blocking all PII leakage is to monitor for it and, when it happens, clear all cookies to prevent the user's PII from being matched with the past and future websites that the user visits. For this to work, however, one has to protect against search and re-identification of individuals with leaked PII.
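A minimal sketch of the monitor-and-clear alternative follows. It is only an illustration under stated assumptions: the function names, the simplified email pattern, the `known_pii` set, and the dictionary used as a cookie store are all hypothetical, and only PII carried in the URL query string is checked.

```python
import re
from typing import Dict, List, Set, Tuple
from urllib.parse import parse_qsl, urlsplit

# Deliberately simplified email pattern (assumption, not a full validator).
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def find_pii_leaks(url: str, known_pii: Set[str]) -> List[Tuple[str, str]]:
    """Return (parameter, value) pairs in the URL query that look like PII."""
    leaks = []
    for name, value in parse_qsl(urlsplit(url).query):
        if EMAIL_RE.fullmatch(value) or value.lower() in known_pii:
            leaks.append((name, value))
    return leaks


def on_request(url: str, cookie_jar: Dict[str, str], known_pii: Set[str]) -> bool:
    """Clear every stored cookie if the outgoing URL leaks PII; report whether it did."""
    if find_pii_leaks(url, known_pii):
        cookie_jar.clear()
        return True
    return False


known_pii = {"alice smith"}  # hypothetical: lower-cased names the user treats as PII
cookie_jar = {"tracker.example": "abc123"}  # hypothetical store: domain -> cookie id

leaky_url = "https://foo.com/signup?email=alice%40example.com&name=Alice+Smith&page=2"
print(find_pii_leaks(leaky_url, known_pii))
print(on_request(leaky_url, cookie_jar, known_pii), cookie_jar)
```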
The tracker already has a sample of the behavioral pattern of this named user. Even if the user clears all cookies, as soon as the user accepts a new cookie from a site on which the tracker operates, the user risks re-identification and re-association with a real-world identity through a simple comparison of the sampled behavior as a named user and the newly accumulating behavior under the new identifier (cookie). In fact, re-identifying a user by comparing profiles associated with different cookies is a cornerstone of the burgeoning cross-device identification industry [4].
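The profile comparison described above can be sketched as follows. The choice of cosine similarity over visit-frequency histograms is an assumption made for illustration; the text does not specify which comparison a tracker uses, and the profile data and site names are invented.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two visit-frequency histograms."""
    dot = sum(a[site] * b[site] for site in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def reidentify(named_profiles: Dict[str, Counter],
               new_visits: Iterable[str]) -> Tuple[str, float]:
    """Return the named profile most similar to the visits seen under a new cookie."""
    new_profile = Counter(new_visits)
    scored = ((name, cosine_similarity(profile, new_profile))
              for name, profile in named_profiles.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical profiles the tracker collected before the cookies were cleared.
named_profiles = {
    "alice": Counter({"news.example": 5, "chess.example": 3}),
    "bob": Counter({"news.example": 4, "cars.example": 6}),
}
# Visits accumulating under the fresh, supposedly anonymous cookie.
new_visits = ["chess.example", "news.example", "news.example", "chess.example"]

best_name, best_score = reidentify(named_profiles, new_visits)
print(best_name, round(best_score, 2))
```

In this toy data the new cookie's histogram is far closer to the first named profile than to the second, which is the kind of signal that makes clearing cookies alone insufficient.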
Thus, the overarching threat to identity privacy is the linking of online behavioral profiles to an individual user. The present invention addresses this threat model by ensuring that the profiles trackers build up are not uniquely identifiable, yet still contain useful behavioral information.
In addition, at the core of the problem of re-identifying users based on browsing behavior is the surprising “uniqueness” of people's browsing profiles, as captured by, e.g., frequency histograms of visits to websites hosting tracking cookies.
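To make the notion of “uniqueness” concrete, the following sketch computes the fraction of users whose visit histogram is shared with no other user. The metric, the function name `unique_fraction`, and the sample data are assumptions chosen for illustration, not a measurement from the cited work.

```python
from collections import Counter
from typing import Dict


def unique_fraction(histograms: Dict[str, Dict[str, int]]) -> float:
    """Fraction of users whose site-visit histogram matches no other user's."""
    # A frozenset of (site, count) pairs is a hashable fingerprint of a histogram.
    keys = [frozenset(h.items()) for h in histograms.values()]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(keys) if keys else 0.0


# Hypothetical histograms: user -> {site: number of visits}.
histograms = {
    "u1": {"a.example": 3, "b.example": 1},
    "u2": {"a.example": 3, "b.example": 1},  # identical to u1, so neither is unique
    "u3": {"a.example": 1, "c.example": 2},
    "u4": {"b.example": 4},
}

print(unique_fraction(histograms))
```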
Existing technologies such as AdNostic [1] and Privad [2] aim to preserve the privacy of users with respect to OBA. The major problem with these technologies is that they require fundamental changes to the current advertising ecosystem: at minimum, they require changes to the way that advertising networks operate, and likely changes to users' clients as well. Unlike them, the proposed device is transparent to trackers and does not require any change in the infrastructure of the advertisement ecosystem.
In addition, services like AdBlock, Disconnect.me, and Ghostery take an alternative approach whereby they attempt to make users aware of tracking that is taking place, and optionally block all advertisements/trackers they detect. The major problem with advertisement/tracker detection and blocking services like AdBlock is that they, by definition, prevent relevant advertisements from being shown. At large scale, this leads to a tragedy of the commons in which users block all advertisements/trackers and prevent publishers from earning revenue from their content. As these services achieve widespread adoption, publishers will eventually be unable to earn revenue, making continued content creation infeasible.
U.S. Pat. No. 7,562,387 relates to a method and apparatus for gathering click stream information from Web surfers while maintaining their privacy. In accordance with this invention, a Web site that collects click stream information provides an opportunity for visitors to choose not to have personal information gathered about them. If a person chooses not to have personal information gathered, the Web site continues to collect click stream information about the visitor's progress through the Web site as before by the use of cookies and/or URL rewriting, for instance, using Single Pixel technology, in which the client machines are made to send requests to a usage analyzer having cookies bearing the relevant click stream data. However, the cookies include an extra field called a privacy flag. If the visitor chooses not to have personal information gathered, the flag is set. Otherwise it is reset. The usage analyzer software checks the privacy flag in the cookie of each request it receives and, if the flag is set, replaces the data in any field of the corresponding log entry containing personal information with a default value. Accordingly, the Web site operator can continue to gather click stream information from visitors without collecting personal information.
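The privacy-flag check described above can be sketched roughly as follows. The field names, the `PERSONAL_FIELDS` set, and the default value are hypothetical and are not taken from the cited patent; the sketch only mirrors the described behavior of replacing personal fields in a log entry when the flag is set.

```python
from typing import Any, Dict

PERSONAL_FIELDS = {"user_id", "email"}  # hypothetical fields treated as personal
DEFAULT_VALUE = "anonymous"  # hypothetical default written in place of personal data


def make_log_entry(cookie: Dict[str, Any]) -> Dict[str, Any]:
    """Build a usage-analyzer log entry, honoring the cookie's privacy flag."""
    entry = {key: value for key, value in cookie.items() if key != "privacy_flag"}
    if cookie.get("privacy_flag", False):
        for field in PERSONAL_FIELDS:
            if field in entry:
                entry[field] = DEFAULT_VALUE
    return entry


opted_out = {"user_id": "u-42", "email": "alice@example.com",
             "page": "/checkout", "privacy_flag": True}
opted_in = dict(opted_out, privacy_flag=False)

print(make_log_entry(opted_out))
print(make_log_entry(opted_in))
```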
The present invention, by contrast, deliberately alters the “clickstream” to obfuscate what the tracker sees in the first place. Therefore, the present invention does not require the attacker to honor an optional flag set by the user, can operate transparently, and does not require users to make decisions about what data should and should not be associated with their profiles.