Normally, when a user wishes to obtain digital content via the Internet using an Internet-enabled computing device (such as a desktop computer, a laptop computer, a “tablet” device, a handheld device such as a “smart” mobile phone or “Personal Digital Assistant (PDA)” device, or another type of Internet-enabled device such as an Internet-enabled television or games console), the user requests the content by typing a Uniform Resource Locator (URL) for the content (e.g. www.website-name.com/requested-content.html) into the device's browser, or by otherwise entering it (by selecting a link, for example). The canonical name part (i.e. www.website-name.com) must be translated into an IP (Internet Protocol) address from which the content may be requested and subsequently received/downloaded. The translation of canonical names into IP addresses is generally done via the Domain Name System (DNS), and involves the user's device submitting a DNS “look-up” request (including the canonical name of the website from which the user wishes to obtain content) to a DNS server, which is usually controlled by the user's Internet Service Provider (ISP). The DNS server responds by providing the IP address of a content server to which a request for the desired content can be routed. The user device then submits its content request to that content server.
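By way of illustration only, the two steps described above (translating the canonical name, then addressing the content request to the resulting IP address) may be sketched as follows. The function names, the dictionary standing in for a DNS server, and the address 192.0.2.10 are assumptions made for this sketch; a real device would typically call a system resolver such as Python's socket.gethostbyname instead.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Return (canonical_name, path), adding a scheme if the user omitted one."""
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    return parts.hostname, parts.path or "/"


def plan_request(url, resolve):
    """Translate the canonical name, then describe the content request to send.

    `resolve` is any callable mapping a canonical name to an IP address; it
    stands in for the DNS look-up step.
    """
    name, path = split_url(url)
    ip_address = resolve(name)  # DNS "look-up" request and response
    return {"connect_to": ip_address, "host": name, "path": path}


# Hypothetical records held by the DNS server (documentation-range address).
EXAMPLE_RECORDS = {"www.website-name.com": "192.0.2.10"}

plan = plan_request("www.website-name.com/requested-content.html",
                    EXAMPLE_RECORDS.get)
```

Here plan["connect_to"] holds the address to which the content request would be routed, while the canonical name and path travel inside the request itself.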
ISPs are able to provide “parental control” (or other such administrator-controlled) services to their customers in order to assist parents (or other such “administrators” of networks) in preventing or hindering access to inappropriate content by children (or other users of the network under the control of the administrator in question). It will be understood that the term “parental control”, while particularly applicable in relation to scenarios involving family members using a home network, is applicable more generally, and that the techniques involved are generally applicable in relation to situations involving schools' networks, corporate networks, Internet cafes, and other networks where administrators wish to be able to exercise some level of control over the content they and/or other users may access. It will also be understood that the techniques involved in “parental control” may be applicable in relation to local networks, Virtual Private Networks (VPNs) and other such networks, which may be wired, wireless, or wired and wireless.
Parental control services may be provided using (amongst other techniques) a technique known as “DNS poisoning”. This involves the DNS server responding to DNS requests in respect of websites whose content is deemed inappropriate (for any of a variety of reasons) by not providing the IP address of a content server that can provide the desired content. The DNS server may ignore the DNS request, may return an alternative IP address (e.g. that of a site from which a warning or explanatory message may then be received), or may otherwise “deny” the DNS request, in each case hindering the device from obtaining the desired content.
Looking at this further, with DNS poisoning, when a user selects or types into a device's browser a URL (e.g. www.bad-website-name.com/bad-content.html), the device submits to its DNS server (as usual) a DNS “look-up” request including the canonical name of the website from which the user wishes to obtain content (i.e. www.bad-website-name.com) in order to have the canonical name translated into an applicable IP address. The DNS server compares the canonical name to a blacklist of websites. If the canonical name is on the blacklist then the DNS server will not return an applicable IP address for the blocked canonical name. As explained above, it may ignore the DNS request entirely, may return an alternative IP address (e.g. to a site with a “warning” or “explanatory” message), or may otherwise “deny” the DNS request. In the first instance (i.e. ignoring the DNS request) this will result in the user's request timing out and the user's browser showing an error message of its own. In the second instance (i.e. alternative IP address being returned to the user's device), this will generally result in the device requesting content from that alternative IP address, which will generally ignore the precise URL and instead return some content other than that desired by the user, such as a page saying that the requested content is blocked.
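The blacklist check and the three possible outcomes described above may be sketched as follows. This is a minimal illustration under stated assumptions, not an implementation of any particular DNS server: the names Policy, answer_lookup and WARNING_PAGE_IP, the example addresses, and the choice of an NXDOMAIN-style reply to represent "otherwise denying" the request are all assumptions of the sketch.

```python
from enum import Enum


class Policy(Enum):
    IGNORE = "ignore"      # send no reply; the look-up eventually times out
    REDIRECT = "redirect"  # reply with the address of a warning page
    DENY = "deny"          # reply with an error (assumed here: NXDOMAIN)


# Hypothetical data held by the filtering DNS server.
BLACKLIST = {"www.bad-website-name.com"}
RECORDS = {
    "www.bad-website-name.com": "203.0.113.5",
    "www.website-name.com": "192.0.2.10",
}
WARNING_PAGE_IP = "198.51.100.1"


def answer_lookup(name, policy=Policy.REDIRECT):
    """Return (record_type, value) for a look-up, or None if no reply is sent."""
    name = name.lower().rstrip(".")
    if name in BLACKLIST:
        if policy is Policy.IGNORE:
            return None
        if policy is Policy.REDIRECT:
            return ("A", WARNING_PAGE_IP)
        return ("NXDOMAIN", None)
    ip_address = RECORDS.get(name)
    return ("A", ip_address) if ip_address else ("NXDOMAIN", None)
```

In this sketch, a look-up for a name that is not on the blacklist is answered normally, so only blacklisted canonical names are affected.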
DNS poisoning is relatively easy to circumvent, however. A user can use a number of approaches to obtain an IP address without using an ISP's DNS server, including running a DNS server within their home network, using a third-party DNS server (e.g. “OpenDNS”), or using an alternative “look-up” site and entering a website's IP address directly into a browser's address bar.
Looking at this in more detail, there are various ways in which an individual user may arrange for a particular device to avoid using DNS servers (chosen by a parent or other administrator, or by an ISP on their behalf) that implement DNS poisoning-based parental control, and thereby undermine such parental control. Users may, for example, undermine such parental control by:
(1) Specifying a different DNS server;
(2) Using a “hosts” file;
(3) Typing an IP address in the address bar of a browser, rather than a URL or a canonical name; or
(4) Clicking a hyperlink in another website (such as a chat forum), in an email or in another messaging service, where the link uses an IP address and not a canonical name.
In relation to (1), the specified DNS server may be inside or outside the user's local network. It would be possible for a parent or administrator, or for an ISP, to ensure that all DNS packets sent from the line used by the user's device to any such “other” DNS server outside the local network are blocked, but such blocking could be circumvented by IP tunneling, for example. Further, such blocking could be circumvented simply by running an “other” DNS server within the user's local network, again successfully circumventing any DNS poisoning.
In relation to (2) and (3), a user could find the mapping from canonical name to IP address via some means other than DNS, such as email contact, chat messages, or bulletin boards, and thus remove their reliance on DNS servers entirely.
It will be noted that for the vast majority of domestic customers, the IP address of the DNS server to use is supplied by their ISP. In this way, only customers who have opted in to content filtering (or not opted out of content filtering, as applicable, depending on what the default situation is) will cause their users' content requests to be subjected to any DNS poisoning or other DNS-based content filtering.
As an alternative to DNS poisoning, it is also known to prevent access to inappropriate content by performing blocking or filtering at the “content request” stage. For this, content requests, which generally contain an indication of an IP address of an applicable content server and the URL of the desired content (as well as an indication of the requester's IP address) can be ignored, blocked or otherwise denied by a filter in dependence on the URL of the desired content or on the IP address of the content server. Both types of filtering have disadvantages, however. For the former (i.e. blocking/filtering based on URL), content must generally be categorised as appropriate (“good”) or inappropriate (“bad”) on a URL-by-URL basis, which is often unworkable. The latter (i.e. filtering based on IP address) is less intensive and can be effective, but fails to deal with the fact that one or more content servers at a particular IP address may well host “good” and “bad” content, and that the same IP address may be used for more than one canonical name (as is often the case in respect of Content Delivery Networks (CDNs), for example), so blocking based on IP address may have the unintended consequence of blocking traffic in respect of websites that should not be subject to any such control. It will also be noted that mappings from canonical name to IP address can change.
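The unintended consequence of filtering on IP address alone may be illustrated with a short sketch. The name-to-address mappings below are hypothetical, as is the function name collateral_blocking; the sketch simply lists the "good" canonical names that would also become unreachable if every address used by a "bad" name were blocked.

```python
from collections import defaultdict

# Hypothetical mappings; two sites share one address, as may occur with a CDN.
RESOLVED = {
    "www.bad-website-name.com": "203.0.113.5",
    "www.good-shop.example": "203.0.113.5",
    "www.website-name.com": "192.0.2.10",
}
BAD_NAMES = {"www.bad-website-name.com"}


def collateral_blocking(resolved, bad_names):
    """Map each blocked IP address to the good names that share it."""
    names_by_ip = defaultdict(set)
    for name, ip_address in resolved.items():
        names_by_ip[ip_address].add(name)

    blocked_ips = {resolved[name] for name in bad_names if name in resolved}

    collateral = {}
    for ip_address in blocked_ips:
        good_names = names_by_ip[ip_address] - bad_names
        if good_names:
            collateral[ip_address] = sorted(good_names)
    return collateral


overblocked = collateral_blocking(RESOLVED, BAD_NAMES)
```

With the example data, the address shared by the two sites is blocked, so the site that should not be subject to any control is lost along with the blacklisted one.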
Blocking based on IP address can be performed by dropping all traffic destined for a particular IP address or, in the case of a TCP (Transmission Control Protocol) connection, by dropping just the “SYN” (i.e. synchronisation) packet. Dropping the TCP SYN packet prevents a TCP connection from being set up.
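The two options may be contrasted in a small sketch of a per-packet drop decision. The Packet type, its fields and the function should_drop are assumptions made for illustration; a real filter would read these values from the IP and TCP headers. The sketch treats a packet with SYN set and ACK clear as the connection-opening packet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Simplified view of the header fields a filter would inspect."""
    dst_ip: str
    protocol: str = "tcp"
    syn: bool = False
    ack: bool = False


# Hypothetical list of blocked destination addresses.
BLOCKED_IPS = {"203.0.113.5"}


def should_drop(packet, syn_only=True):
    """Decide whether the filter drops this packet."""
    if packet.dst_ip not in BLOCKED_IPS:
        return False
    if not syn_only:
        return True  # drop everything destined for the blocked address
    # Drop only the initial SYN, so the TCP handshake never completes.
    return packet.protocol == "tcp" and packet.syn and not packet.ack
```

With syn_only left at its default, the filter needs to act only once per attempted connection, since no later packets of that connection can follow a handshake that was never completed.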
Another way of preventing access to certain content involves performing filtering of content being returned to users following successfully-submitted content requests. Typical ways of doing this involve looking for keywords or analysing images or videos. The former (keyword-based analysis) may be relatively simple but can result in many false positives. The latter (image analysis) can be computationally-intensive, leading to increased delays or expense, and can also result in many false positives.
Referring to prior art literature, an article entitled “BT Puts Block on Child Porn Sites” by Martin Bright, published in The Observer newspaper on 6 Jun. 2004 (and available online at http://www.theguardian.com/technology/2004/jun/06/childrensservices.childprotection) discusses a system known as “CleanFeed” for blocking images of child pornography. This is further discussed online at http://en.wikipedia.org/wiki/Cleanfeed_(content_blocking_system). With this system, given the illegality of such images in many countries, all customers' lines may be checked. The system uses a two-stage process, based on a confidential blacklist provided by the Internet Watch Foundation (IWF) which contains URLs of pages (i.e. not whole sites) to be blocked (and whose production therefore requires significant human intervention), and a less confidential list of IP addresses of sites potentially containing blocked pages, which is made available to ISPs. The routers of an ISP using the system check the destination of traffic against the list of IP addresses. If there is no match, the traffic is directed to the content host. If the IP address is on the list, the traffic is routed to proxy servers that check the specific page against the confidential blacklist.
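The two-stage process described above may be sketched as follows, based only on the description given here. The list contents and the function names route, proxy_allows and handle_request are hypothetical; the sketch shows the coarse check on destination IP address followed, only where that check matches, by the precise check on the page URL.

```python
# Stage 1 data: less confidential list of IP addresses of sites that
# potentially contain blocked pages (hypothetical entry).
SUSPECT_IPS = {"203.0.113.5"}

# Stage 2 data: confidential blacklist of page URLs (hypothetical entry).
BLOCKED_URLS = {"http://www.bad-website-name.com/bad-content.html"}


def route(dst_ip):
    """Router decision: divert traffic for suspect addresses to the proxy."""
    return "proxy" if dst_ip in SUSPECT_IPS else "direct"


def proxy_allows(url):
    """Proxy decision: check the specific page against the URL blacklist."""
    return url not in BLOCKED_URLS


def handle_request(dst_ip, url):
    """Apply both stages and report what happens to the request."""
    if route(dst_ip) == "direct":
        return "forwarded"
    return "forwarded" if proxy_allows(url) else "blocked"
```

Only traffic to listed addresses incurs the cost of the URL check, and other pages hosted at a listed address remain reachable.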
Further discussion of the “CleanFeed” system can be found in an article by Richard Clayton entitled “Failures in a Hybrid Content Blocking System” (Proceedings of the 5th International Conference on Privacy Enhancing Technologies, 2005, PET'05, pages 78-92) (available online at http://www.cl.cam.ac.uk/~rnc1/cleanfeed.pdf).
Referring to prior patent documents, United States application US2012/084423 discloses a method and system for domain based dynamic traffic steering. A domain name is compared to a blacklist and/or whitelist of domain names and if it is on either, the corresponding IP blacklist or whitelist is updated with the IP address for the domain.
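The list-updating step summarised above may be sketched as follows. This is an illustrative reading of the one-sentence summary only, not of the cited application itself; the list contents and the function name observe_resolution are assumptions.

```python
# Hypothetical domain-name lists.
DOMAIN_BLACKLIST = {"www.bad-website-name.com"}
DOMAIN_WHITELIST = {"www.website-name.com"}

# IP lists derived from the domain lists as resolutions are observed.
ip_blacklist = set()
ip_whitelist = set()


def observe_resolution(domain, ip_address):
    """Update the corresponding IP list when a listed domain is resolved."""
    domain = domain.lower()
    if domain in DOMAIN_BLACKLIST:
        ip_blacklist.add(ip_address)
    if domain in DOMAIN_WHITELIST:
        ip_whitelist.add(ip_address)


observe_resolution("www.bad-website-name.com", "203.0.113.5")
observe_resolution("www.website-name.com", "192.0.2.10")
observe_resolution("unlisted.example", "198.51.100.7")
```

A domain that appears on neither list leaves both IP lists unchanged.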
United States application US2011/0078309 describes a data processing apparatus comprising a traffic monitor which can create and manage lists of IP addresses to track, those IP addresses having been triggered by DNS look-ups for domain names. As a result, it is claimed that actions can be taken for domains or portions of domains instead of taking action only based on IP addresses, so that not all traffic from the corresponding IP address is affected or acted upon in the same way as with a typical firewall that acts based upon IP addresses alone. It is noted that multiple names may resolve to the same IP address, and that a host in a “bad” domain could map to an otherwise normally “good” IP address.
U.S. Pat. No. 7,792,994 relates to correlating network DNS data to filter content. A DNS request made by an internal host in a network to obtain an IP address and the corresponding response from a DNS server are intercepted and cached. By caching the DNS request and the corresponding response, the IP address the host thinks is associated with the domain name, URI, or other identifier for which the corresponding IP address was requested from the DNS server is known. When the host subsequently uses the IP address to open a TCP (or TCP/IP) connection, the IP address is mapped to the corresponding domain name in the cache and it is determined whether the domain name is in a block list.
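The correlation described above may be sketched as follows. The class DnsCorrelator and its method names are hypothetical, and the sketch reflects only the summary given here: intercepted DNS responses are cached per host, and a later connection attempt is mapped back to the domain name that the host looked up.

```python
class DnsCorrelator:
    """Correlate intercepted DNS answers with later TCP connection attempts."""

    def __init__(self, block_list):
        self.block_list = set(block_list)
        self.cache = {}  # (host, ip_address) -> domain name the host looked up

    def on_dns_response(self, host, domain, ip_address):
        """Record the address that `host` now associates with `domain`."""
        self.cache[(host, ip_address)] = domain.lower()

    def on_tcp_connect(self, host, ip_address):
        """Return True if the connection maps to a domain on the block list."""
        domain = self.cache.get((host, ip_address))
        return domain is not None and domain in self.block_list


correlator = DnsCorrelator(block_list={"www.bad-website-name.com"})
correlator.on_dns_response("10.0.0.7", "www.bad-website-name.com", "203.0.113.5")
correlator.on_dns_response("10.0.0.8", "www.good-shop.example", "203.0.113.5")
```

Because the cache is keyed per host, two hosts connecting to the same shared address can be treated differently according to the name each one looked up. How a connection with no cached look-up is treated is not stated in the summary above; this sketch simply does not match it.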
U.S. Pat. No. 7,849,502 relates to apparatus for monitoring network traffic, in particular for preventing spyware and other threats from harming computer networks.
International application WO2012/162099 relates to use of DNS communications to filter domain names. A domain name is extracted from a received DNS request. The received DNS request is blocked in response to determining based on a policy that access to the domain name of the DNS request is not permitted. In some cases, such a DNS request is responded to with a spoofed DNS response.
United States application US2007/0204040 relates to techniques for filtering domain names through the DNS. It uses a domain name processing application that generates a filtering domain name based on a filter service to a domain name. A DNS resolver sends the filtering domain name to a filter service through the DNS. The filter service determines if the filtering domain name is approved and returns a DNS record indicating whether the domain name is approved. The DNS resolver receives the DNS record from the filter service and sends a response to the user.