1. Technical Field
The present teaching relates to methods, systems, and programming for website tagging. In particular, the present teaching relates to methods, systems, and programming for blocking malicious third party site tagging.
2. Discussion of Technical Background
Online publishers instrument numerous third party site tags for marketing and analytics, including targeting, ad verification, ad serving, and tracking return on investment (ROI), among other purposes, and for augmenting consumer experience via online surveys and recommendations. In general, third party site tags are instrumented by incorporating JavaScript or HTML code into the publisher web page. While tags are helpful in many ways, such as personalizing content, increasing ROI, and improving targeting, incorporating any third party code that is not administered by the publisher can lead to security vulnerabilities. These include: Document Object Model (DOM) exposure, which may lead to compromise of user credentials, fake clicks or other fake user interactions, viewing of user keystrokes, and malicious access to and tampering with the publisher page content; objectionable content being loaded on the web page, which may lead to malware or to malformed or slow content and thereby degrade the user experience on the page; violation of user privacy, in which third parties that are allowed to execute JavaScript on the publisher web page can collect user personally identifiable information (PII) associated with the publisher; and data leakage, in which arbitrary third party code incorporated into a publisher web page externally invokes a fourth party or another entity that has no valid contract with the publisher. In the last case, the publisher may be unaware that user data and business data associated with the publisher web page are leaking to the fourth party or other entity.
As a standard practice, the publisher regularly tests the third party tags to ensure that the data being collected is limited to the agreed business purpose and that no fourth party is piggybacking on the tags, and enforces terms and conditions declaring what a third party tag can do on the publisher web page. However, because third party tags are very rarely hosted on the publisher server, owing to maintenance and operational costs, the JavaScript or HTML code associated with these tags is not administered by the publisher. Therefore, changes that are not approved by the publisher can easily be introduced to the tags. To guard against unauthorized changes to the third party tags, the publisher implements a monitoring scheme that triggers an alert once a change in the third party tags is detected. However, actions to protect the data are usually taken only after the monitoring scheme sends the alert, by which time important business data and/or sensitive user data associated with the publisher web page may already have been exposed to fourth parties or other entities via the tampered third party tags.
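By way of illustration only, the monitoring scheme described above may be sketched as a comparison of script fingerprints. The sketch assumes that the publisher keeps a SHA-256 fingerprint of each approved tag script; the function names, the tag name "tag1", and the example script contents and URL are hypothetical and are not taken from any particular publisher system.

```python
import hashlib
from typing import Dict, List


def fingerprint(script_source: str) -> str:
    """Return a SHA-256 hex digest of a tag's script text."""
    return hashlib.sha256(script_source.encode("utf-8")).hexdigest()


def detect_changed_tags(approved: Dict[str, str], observed: Dict[str, str]) -> List[str]:
    """Return the names of tags whose observed script no longer matches
    the fingerprint the publisher approved (hypothetical helper)."""
    alerts = []
    for tag_name, source in observed.items():
        if approved.get(tag_name) != fingerprint(source):
            alerts.append(tag_name)
    return alerts


# Hypothetical approved script and a tampered variant that pulls in a fourth party.
approved_source = "track('pageview');"
approved = {"tag1": fingerprint(approved_source)}
unchanged = {"tag1": approved_source}
tampered = {"tag1": "track('pageview'); load('https://fourth-party.example/t.js');"}

print(detect_changed_tags(approved, unchanged))
print(detect_changed_tags(approved, tampered))
```

Such a check is reactive: the alert is raised only once the modified script has been observed, so the script may already have executed on the publisher web page before any protective action is taken.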
FIG. 1 shows an exemplary tag loading process in the prior art. A web page 102 hosts a plurality of tags that are associated with tag sources. When a request to load Tag 1 is sent from the web page 102, because the Tag 1 source 104 is a third party associated with the web page, the Tag 1 source 104 returns the content to be loaded in response to the request. However, the JavaScript of Tag 1 is hosted and administered on the third party domain, and the web page 102 has no control over that JavaScript. When the JavaScript of Tag 1 includes domains of other succeeding tags that are not associated with the web page, the web page administrator has no effective way to prevent these succeeding tags from being loaded on the web page. For example, Tag 1 also refers to succeeding Tag 1-1, . . . , Tag 1-n. When the Tag 1 source 104 is called, the Tag 1-1 source 106, . . . , Tag 1-n source 108 are also called to load their contents. The contents of the succeeding tags are returned and loaded on the web page without any scrutiny of whether the Tag 1-1 source 106, . . . , Tag 1-n source 108 are harmless to the web page. In an example in which the web page 102 is Yahoo! mail and the tag source 104 is a tracker instrumented on Yahoo! mail for analyzing audience behavior, Yahoo! mail has no effective way to ensure that, when Tag 1 is loaded on Yahoo! mail, it does not piggyback any other tags or sources, for example, Google Analytics for analytic enhancement, which could lead to Yahoo! losing critical business data and sensitive user data to one of its main competitors.
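By way of illustration only, the prior art loading behavior of FIG. 1 may be modeled as follows. The sketch assumes a simplified model in which each tag source is represented by the list of succeeding tag sources its script references; the domain names and the function name are hypothetical placeholders and do not correspond to any real tag.

```python
from typing import Dict, List

# Hypothetical model: each tag source maps to the succeeding tag sources
# that its script references (cf. Tag 1 referring to Tag 1-1, ..., Tag 1-n).
TAG_REFERENCES: Dict[str, List[str]] = {
    "tag1.third-party.example": [
        "tag1-1.fourth-party.example",
        "tag1-n.fourth-party.example",
    ],
    "tag1-1.fourth-party.example": [],
    "tag1-n.fourth-party.example": [],
}


def load_tag(source: str, references: Dict[str, List[str]]) -> List[str]:
    """Return every source whose content ends up on the page, in load order.

    No check is made on whether a succeeding source is associated with
    the web page, which mirrors the prior art behavior.
    """
    loaded: List[str] = []
    pending = [source]
    while pending:
        current = pending.pop(0)
        if current in loaded:
            continue  # skip sources already loaded (guards against cycles)
        loaded.append(current)
        pending.extend(references.get(current, []))
    return loaded


# The page requests only Tag 1, yet all succeeding sources are loaded too.
loaded_sources = load_tag("tag1.third-party.example", TAG_REFERENCES)
print(loaded_sources)
```

In this model the web page requests a single source, but the returned list also contains each succeeding source, none of which was examined by the web page before its content was loaded.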
Accordingly, it is crucial for the publisher to restrict instrumented tags to only those third parties whose JavaScript or HTML code has been vetted and approved by the publisher. The third parties may collect data from the publisher web page only for certain business purposes, according to the terms and conditions agreed with the publisher as to how the collected data can be used and stored, and must ensure that the data is not shared with unauthorized entities and that there is a process to delete the data. The current approach to identifying violating third party site tags is regular auditing. However, there has not been an effective way of whitelisting the resources from which content can be loaded on the publisher web page.
Therefore, there is a need to provide an improved solution for preventing malicious third party site tagging to solve the above-mentioned problems.