Content filters exist for a variety of network applications. They are often deployed to enforce a local policy that specifies what types of content local users are permitted to access. For example, content filters are commonly used to filter access to websites (i.e., WWW/HTTP traffic). They may be deployed in organizations such as schools, libraries, hotels, government offices, Internet cafes, and corporations, and even in private homes, to filter out content that the local administration considers undesirable (e.g., political content, hate groups, or pornography).
Existing web content filters are typically based on the site or domain name. Often the provider of a content filtering technology will provide a list of prohibited sites. Better implementations provide a list of categorized sites (e.g., cults, pornography, violence, hate groups, etc.) and allow the local administrator to decide which categories to ban. Some implementations allow or require administrators to create, or add to, a list of sites to be blocked. These lists are typically specified in terms of domain names or uniform resource identifiers (URIs) instead of Internet Protocol (IP) addresses for a variety of reasons, e.g., because an IP address can be reassigned, can change, or can be associated with both prohibited and permitted content.
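By way of illustration only, the category-based policy described above can be sketched as a lookup from domain name to category, checked against the set of categories the local administrator has chosen to ban. The table contents, the second domain name, and the function name `is_banned` are hypothetical and are not taken from any particular filtering product.

```python
# Hypothetical vendor-supplied list: domain name -> category.
# The entries are illustrative assumptions, not real classifications.
CATEGORIZED_SITES = {
    "www.prohibitedsite.com": "hate groups",
    "violent.example": "violence",
}


def is_banned(host, banned_categories, categorized_sites=CATEGORIZED_SITES):
    """Return True if host is listed under a category the administrator bans."""
    # Normalize case and any trailing dot before the exact-match lookup.
    category = categorized_sites.get(host.lower().rstrip("."))
    return category is not None and category in banned_categories


# The administrator bans one category and leaves the other permitted.
policy = {"hate groups"}
print(is_banned("www.prohibitedsite.com", policy))
print(is_banned("violent.example", policy))
```

This sketch matches whole host names only; a practical list would likely also need to handle subdomains and full URIs.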
One of the difficulties in enforcing such a policy, however, is that the enforcement point does not see the domain name that a user is attempting to access. For example, suppose a transmission control protocol (TCP) connection request to 63.215.198.31:80 is detected. A typical content filter notes that this is a web connection (port 80) and validates it against its block list. However, since its list is in domain name form, it must first determine the domain name associated with this IP address. A typical filter queries a domain name system (DNS) server to resolve the IP address into a domain name and receives a response providing the name that the DNS associates with the IP address. However, DNS resolution is not guaranteed to be circular. That is, while www.prohibitedsite.com may resolve to IP address 63.215.198.31, IP address 63.215.198.31 may not resolve back to www.prohibitedsite.com. It may resolve back to something like 1-2-3-4-dsl.pacbell.com. If the “reverse resolution” (IP address to domain name) is not circular, the filter will receive a name that is different from the one the user attempted to access. If this other name is not also in the filter block list, access will be allowed.
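The failure mode described above can be illustrated with a minimal sketch. To keep the example self-contained, the DNS is simulated with two hypothetical in-memory tables (`FORWARD` and `REVERSE`) populated with the names and address from the example above; a deployed filter might instead obtain the reverse name with a call such as `socket.gethostbyaddr(ip)[0]`. The function name `filter_allows` is likewise an assumption.

```python
# Simulated DNS data (hypothetical): the forward and reverse mappings for the
# same address disagree, i.e. resolution is not circular.
FORWARD = {"www.prohibitedsite.com": "63.215.198.31"}
REVERSE = {"63.215.198.31": "1-2-3-4-dsl.pacbell.com"}

BLOCK_LIST = {"www.prohibitedsite.com"}


def filter_allows(ip, port, reverse_lookup, block_list=BLOCK_LIST):
    """Decide on a TCP connection request the way a name-based filter might."""
    if port != 80:
        return True  # this sketch only examines web connections on port 80
    name = reverse_lookup(ip)  # IP address -> domain name
    return name not in block_list


# The user asked for www.prohibitedsite.com, but the filter sees only the IP.
ip = FORWARD["www.prohibitedsite.com"]
print(REVERSE[ip])                          # name the filter actually checks
print(filter_allows(ip, 80, REVERSE.get))   # connection is allowed through
```

Because the reverse name is not in the block list, the connection to the prohibited site is permitted even though the forward name is listed.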
To overcome this potential lack of circular resolution, some existing content filter providers include IP addresses in addition to domain names in their content filter block lists. However, since the IP address associated with a web site may change, lookups need to be performed periodically so that a current IP address is associated with each domain name in the block list. Since it is prohibitive for all clients (or even all gateways) in a network to constantly attempt to resolve the entire list to keep an up-to-date IP address list, such lookups are performed centrally, and frequent updates are provided to the clients. While this improves the situation by shrinking the window during which a client's list may differ from the current DNS data, it does not eliminate that window. Even if lookups and updates of the IP addresses of all the sites in the content filter block list could be performed in zero time, the latency and cost required to send updates to the clients make constant updates impractical.
In some content filters, layer 7 (L7) protocol hints are utilized to determine the URI that a user is attempting to access. Specifically, an HTTP GET request header may include the URI, and this URI may be matched against the block list of the content filter to determine whether it is prohibited. The URI can be extracted and used by the content filter, however, only if the request is unencrypted. In addition, not all protocols support the inclusion of the URI in L7 headers.
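As a rough sketch of the L7 approach described above, the following function attempts to recover a URI from the first bytes of a connection by reading the HTTP GET request line and the Host header. The function name `extract_uri` and the sample payloads are assumptions made for illustration. The sketch handles only a plain-text GET with an origin-form target (a path), and it returns `None` for anything it cannot parse, which is what happens when the payload is encrypted.

```python
def extract_uri(request_bytes):
    """Return host + path from a plain-text HTTP GET request, else None."""
    try:
        text = request_bytes.decode("ascii")
    except UnicodeDecodeError:
        return None  # not readable text, e.g. an encrypted payload

    # Keep only the header section and split it into lines.
    lines = text.split("\r\n\r\n", 1)[0].split("\r\n")

    # Request line: METHOD SP TARGET SP VERSION
    parts = lines[0].split(" ")
    if len(parts) != 3 or parts[0] != "GET":
        return None
    target = parts[1]

    # Find the Host header (header names are case-insensitive).
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "host":
            return value.strip().lower() + target
    return None


plain = b"GET /index.html HTTP/1.1\r\nHost: www.prohibitedsite.com\r\n\r\n"
opaque = b"\x16\x03\x01\x00\xa5\x01\x00\x00\xa1\x03\x03"  # made-up binary bytes

print(extract_uri(plain))
print(extract_uri(opaque))
```

The string returned for the plain-text request can be matched against a block list of URIs; for the opaque payload there is nothing to match, so a filter relying solely on this hint cannot make a decision.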
Thus, there is a need for more robust and efficient content filters.