1. Statement of the Technical Field
The present invention relates to the field of Internet security and more particularly to content filtering.
2. Description of the Related Art
The global Internet represents the most substantial development in advertising since the advent of television. Prior to the widespread adoption of the World Wide Web over the global Internet, media outlets had been restricted to print, radio and television advertising. In most cases, targeted individuals had no choice but to view the advertisements. As an example, in television broadcasting, advertisements can be included in a broadcast stream as additional content which can be indistinguishable from broadcast programming. Still, it is notable that in the context of television broadcasting, advanced digital recording devices have proven successful in partially or entirely removing advertisements from recorded television programs despite the indistinguishable nature of the advertisements.
Unlike the television broadcasting medium, in the Internet medium advertisements are often delivered as embedded elements of markup defining an electronic document. Markup languages are well-known in the art and include not only the venerable hypertext markup language (HTML), but also extensible markup language (XML), wireless markup language (WML), and numerous variants of the standard generalized markup language (SGML), to name a few. As will be recognized by the skilled artisan, such electronic documents include Web pages, among other forms of displayable content. A markup language document can define not only that content which can be viewed through a content browser such as a Web browser (herein referred to as a “browser”), but also supplemental content which can be presented in association with the content.
Content which is supplemental to the markup can include not only advertisements (typically in the form of embedded image references), but also embedded references to media, references to other markup language documents, markup language fragments, other types of documents, programs, scripts, and the like. In this regard, supplemental content is any content which can be loaded by or with the assistance of a browser based upon the markup in which a reference to the supplemental content has been embedded. Examples can include an image reference which can be loaded automatically, or a script or page which can be activated responsive to a user event such as when a user passes a mouse pointer over a specified portion of the markup as rendered in the browser.
In operation, when a browser retrieves markup, the browser can parse the markup to render the document. In particular, for each reference to content within the markup, the browser can retrieve the referenced content and can subsequently render the content within or in association with the markup as specified by the reference. Importantly, it is well-known to interpose a reverse proxy server (surrogate) between sources of content and markup and the browser for both security and efficiency reasons. In particular, the reverse proxy server can serve the supplemental content referenced within markup. Thus, as will be apparent to the skilled artisan, the surrogate is positioned to know the true location of supplemental content requested by individual browsers in the course of rendering markup.
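By way of illustration only, the parsing step described above can be sketched in Python using the standard library. The class name ReferenceCollector, the function collect_references, the tag-to-attribute table, and the example addresses are hypothetical and are introduced solely for this sketch; an actual browser resolves many more kinds of references than the four shown.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ReferenceCollector(HTMLParser):
    """Collects references to supplemental content embedded in markup."""

    # Hypothetical, deliberately small table of tags and the attribute
    # that carries the embedded reference for each tag.
    REFERENCE_ATTRS = {"img": "src", "script": "src", "iframe": "src", "link": "href"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.references = []

    def handle_starttag(self, tag, attrs):
        wanted = self.REFERENCE_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative references against the address of the markup.
                self.references.append(urljoin(self.base_url, value))


def collect_references(markup, base_url):
    """Return, in document order, the absolute URIs referenced by the markup."""
    collector = ReferenceCollector(base_url)
    collector.feed(markup)
    collector.close()
    return collector.references
```

In this sketch, each collected URI corresponds to one retrieval that the browser would perform before rendering the referenced content within or in association with the markup.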
As is the case in television broadcasting, many in the targeted audience would prefer not to be inundated with advertisements while browsing. Moreover, many would prefer to restrict the rendering of supplemental content. The market has responded to this known preference through the development of content blocking technology. Filtering represents one embodiment of such content blocking technology. In particular, filtering involves the association of portions of a uniform resource identifier (URI) with known sources of undesirable content. Consequently, each time the browser (or a forward proxy server acting on behalf of the browser, as the case may be) identifies a known source string within a URI referring to content, the request for the content can be ignored and the content can be omitted when rendering the markup. As a result, the blocked content will never be fetched, cached, displayed, or seen by the user. In fact, the blocked content simply fails to reach its target audience.
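For purposes of illustration, filtering of this kind can be sketched as a substring test applied to each URI before any request is issued. The list BLOCKED_SOURCE_STRINGS, its entries, and the function names below are assumptions made for the sketch and do not describe any particular content blocking product.

```python
# Hypothetical source strings associated with undesirable content.
BLOCKED_SOURCE_STRINGS = ("ads.example.net", "/banners/")


def is_blocked(uri, blocked=BLOCKED_SOURCE_STRINGS):
    """Return True when any known source string appears within the URI."""
    lowered = uri.lower()
    return any(source in lowered for source in blocked)


def filter_references(uris, blocked=BLOCKED_SOURCE_STRINGS):
    """Keep only the URIs that should still be requested and rendered."""
    return [uri for uri in uris if not is_blocked(uri, blocked)]
```

Because a blocked URI is dropped before a request is made, the corresponding content is never fetched or cached, which is consistent with the behavior described above.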
Initially, content blocking technology was adopted only by technically sophisticated early adopters, representing a relatively small percentage of the on-line population. Most experts recognize that content blocking would be more widely adopted if its functionality were packaged with other Web oriented software. Presently, the integration of content blocking technology within other Web oriented products has begun, as evidenced by the inclusion of some sort of content blocking technology in software from several leading security and virus detection manufacturers.
Proponents of content blocking claim that browsing speed can be improved by not downloading slow-loading advertisement banners and buttons. In this regard, slow-loading supplemental content can inhibit browsing of an entire page until the supplemental content either loads or times out. Opponents of content blocking, on the other hand, contend that content blocking amounts to theft, as ad-free surfers use valuable resources of the principal content providers without indirectly “paying” for the principal content by viewing supplemental content such as advertisements. A few principal content publishers have gone so far as to reject content-blocking visitors.
If widely embraced, content blocking might result in some important unintended consequences for users of the World Wide Web. For instance, blocking anything that can be easily identified as an advertisement actually may encourage more aggressive revenue-generation models. If legitimate advertising is eliminated, Web sites might feel pressured to sacrifice editorial integrity by using sneaky paid “advertorials” (as has already become the case in the real estate and automobile markets) in which paid advertising masquerades as unbiased editorial opinion. Finally, it is conceivable that if primary content providers no longer receive revenue from supplemental content, they might institute new business models involving subscriptions or per-view fees for serving primary content to users via markup.
Presently, technologies exist which unintentionally can be effective in circumventing content blocking. Specifically, uniform resource locator (URL) rewriting has been used to mask the true path to the location of content. Yet, the indiscriminate use of URL rewriting can defeat the effectiveness of caching, the principal mechanism for enhancing the responsiveness of Web pages. Where URL rewriting indiscriminately varies the path to content, the advantages of caching can be lost and network bandwidth can be unnecessarily consumed. Accordingly, URL rewriting in and of itself cannot serve as a solution to the problem of supplemental content being blocked.
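The effect of indiscriminate URL rewriting on caching can be illustrated with a simplified sketch. The cache model, the rewriting scheme (a per-request counter prefixed to the path), and all names below are hypothetical; they are intended only to show why a cache keyed by URL loses its benefit when the path to the same content varies from request to request.

```python
import itertools


class UrlKeyedCache:
    """A minimal cache that, like a typical Web cache, is keyed by URL."""

    def __init__(self):
        self.store = {}
        self.hits = 0
        self.misses = 0

    def get(self, url, fetch):
        if url in self.store:
            self.hits += 1
        else:
            self.misses += 1
            self.store[url] = fetch(url)  # costs network bandwidth
        return self.store[url]


_counter = itertools.count(1)


def rewrite_indiscriminately(path):
    """Hypothetical rewriting: a different path on every call."""
    return f"/r/{next(_counter)}{path}"


def simulate(rewrite, requests=5):
    """Request the same content repeatedly; return (hits, misses)."""
    cache = UrlKeyedCache()
    for _ in range(requests):
        cache.get(rewrite("/ads/banner.gif"), fetch=lambda url: b"image-bytes")
    return cache.hits, cache.misses
```

Under these assumptions, requesting the content through an unchanging path yields one miss followed by hits, whereas requesting it through indiscriminately rewritten paths yields a miss, and therefore a fresh retrieval, on every request.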