As the Internet continues to evolve, an ever greater volume and variety of data is becoming available to computer users. Most of the available data (which includes both renderable text/graphics/audio data and executables) is benign or, at worst, not useful when accessed by a user via the Internet. However, some of the data may be objectionable and/or malicious. Preventing intentional or unintentional access to the malicious data is an important task for administrators and parents alike.
One way to prevent access to potentially malicious data involves the use of web filters and, in particular, URL filtering. Generally speaking, URL filtering works by examining the URL (Uniform Resource Locator) of requested content and blocking access to websites that have been deemed to be prohibited. For example, some URL filtering products are implemented as software in web filter gateways and function by checking the URL of each web page that is sent to the user's browser. If the URL of a web page is found to be associated with a prohibited website, the gateway blocks that web page from reaching the user's browser. For instance, if website xyz131.com is a prohibited website, any web page that contains the string “www.xyz131.com” in its URL would be blocked by the web filter from reaching the user's browser.
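The string-matching check described above can be illustrated with a minimal sketch. It is an illustration only, not the implementation of any particular filtering product; the names `PROHIBITED_STRINGS` and `is_blocked` are hypothetical, and the only prohibited entry is the example string from the text.

```python
# Hypothetical list of prohibited URL substrings; the single entry is the
# example used in the text.
PROHIBITED_STRINGS = ["www.xyz131.com"]


def is_blocked(url: str) -> bool:
    """Return True if the URL contains any prohibited string.

    Matching is case-insensitive, which is an assumption of this sketch.
    """
    lowered = url.lower()
    return any(s in lowered for s in PROHIBITED_STRINGS)


if __name__ == "__main__":
    for url in ("http://www.xyz131.com/index.html", "http://www.example.org/"):
        print(url, "BLOCKED" if is_blocked(url) else "allowed")
```

A plain substring test is used here because that is the behavior the example describes; a practical filter would more likely parse the URL and compare host names.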
Clever users attempt to bypass or confuse web filters by using anonymizers. Commercial websites exist that offer anonymizer services to users. Instead of accessing the prohibited website directly, the user accesses the anonymizer website and employs it to perform the access to the desired destination site (e.g., the aforementioned xyz131.com). With some anonymizer sites, when the web page is returned to the user, the URL of the web page reflects the URL of the anonymizer website and not that of the desired destination site (e.g., xyz131.com). Other anonymizer sites may alternatively or additionally encrypt the data, rendering it impossible for the web filters to ascertain the origin or content of the data.
Current solutions to combat anonymizers include blocking any data that is returned from known anonymizer sites, i.e., treating all data from known anonymizer sites as harmful. This approach is shown in prior art FIG. 1, wherein user 102 attempts via a browser to access destination website 104 via a commercially available anonymizing web server 106. In this case, a gateway 108 that performs URL filtering on the data returned from anonymizing web server 106 would block all traffic from anonymizing web server 106, thus preventing data from prohibited destination website 104 from reaching user 102. In the example of FIG. 1, gateway 108 is implemented separately from a firewall 110, but these two functions may be implemented together in the firewall in some cases if desired.
However, this approach fails to detect data that has been anonymized by web servers that are not known to the web filtering software. For example, if an employee runs anonymizer software on his home computer 112 as a user-hosted anonymizer and accesses that software from work to visit prohibited website 104, the anonymized data appears to the company's firewall at his workplace as if it comes from the employee's home computer 112. As such, the anonymized data is not blocked and web filtering is frustrated.
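The known-anonymizer approach and its blind spot can be illustrated with a minimal sketch. The host names, the address, and the function `gateway_allows` are hypothetical and chosen only for illustration; `anonymizer.example` stands in for a known commercial anonymizer, and `203.0.113.7` (an address reserved for documentation) stands in for the employee's home computer.

```python
from urllib.parse import urlsplit

# Hypothetical set of anonymizer hosts known to the web filtering software.
KNOWN_ANONYMIZER_HOSTS = {"anonymizer.example"}


def gateway_allows(response_url: str) -> bool:
    """Return True if the gateway would let the response through.

    Only the apparent source host is checked, because the anonymized
    response no longer carries the URL of the destination site.
    """
    host = (urlsplit(response_url).hostname or "").lower()
    return host not in KNOWN_ANONYMIZER_HOSTS


if __name__ == "__main__":
    # Traffic from the known anonymizer is blocked.
    print(gateway_allows("http://anonymizer.example/fetch?id=1"))
    # Traffic from an unlisted, user-hosted anonymizer passes the check.
    print(gateway_allows("http://203.0.113.7:8080/fetch?id=1"))
```

Because the check depends entirely on the list of known hosts, any anonymizer absent from that list, such as one hosted on a home computer, is treated as an ordinary web server.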