Whilst the Internet represents a huge source of valuable information, much of the content available online is inappropriate, malicious or even illegal. By its very nature, such content is not always easy to track and eliminate. It may, for example, be malware capable of stealing individuals' bank details, or pornographic material. The best way for organisations and individuals to protect themselves against inappropriate content is to install security software on their web servers and/or client devices that filters outgoing content requests and/or downloaded data to remove unsuitable content.
A client device may comprise or represent any device used to access content from a website over one or more communication networks, e.g. wired and/or wireless communication networks. Examples of client devices that may be used in certain embodiments of the invention are wired or wireless devices such as mobile telephones, terminals, smart phones, portable computing devices such as laptops, handheld devices, tablets, netbooks, computers, personal digital assistants and other devices that can access website content and connect over a wired or wireless communication network.
A client gateway device may comprise or represent any device, network node or server used for directing network traffic to client devices from one or more communication networks. Examples of client gateway devices that may be used in certain embodiments of the invention are devices, network nodes or elements, or servers such as residential gateways, gateway servers, gateway nodes, Wi-Fi routers/modems, DSL routers/modems, network address translator devices, protocol converters, cloud gateways, gateway devices connecting different communication networks, a network node configured to interface with another network that uses different protocols to direct/route network traffic to client devices, or any other device or proxy device that directs and/or routes network traffic over communication networks to the client device.
A client proxy device may comprise or represent any device, network node or server that acts as an intermediary for client devices seeking resources over communication networks. Examples of client proxy devices that may be used in certain embodiments of the invention are devices or network elements such as proxy servers, forward proxy servers, reverse proxy servers, cloud storage or cloud servers, web proxies, web proxy servers or any other device or proxy device that acts as a cache for storing and distributing network resources over the communications network to client devices.
A web operator may comprise or represent any entity, person, team or third party that is responsible for and/or manages the content of one or more websites. Examples of web operators that may be used in certain embodiments of the invention are webmasters, web owners, content authors, administration teams, or the person(s) who create and/or manage the web content (or web page content) of a website or of a company, and/or who know exactly what content is published on the website and/or authorise its publication.
Malicious content may be installed onto a website of a web operator by hackers using web page injection attacks. One type of attack is the “watering hole” attack, in which an attacker compromises a website or content provider that a particular type of user will eventually access (e.g. a special interest website) in order to serve malicious content to that user. A specific type of user is thus targeted, much like prey being stalked and attacked by predators at a watering hole. The compromise may simply consist of injecting the website with an exploit kit that loads an exploit onto the user's system when the user accesses the website.
Alternatively, a script (e.g. a JavaScript (RTM) script) may be added that includes content from another domain, which distributes malicious content from seemingly random sites. In all cases, however, the pattern is the same: the website starts to offer unauthorised content to some or all users accessing it.
Unauthorised content may also be inadvertently or illegitimately hosted on a website, or the website may be compromised for use as a drop point or malware command and control (C&C) point for covert communications or for illegal or stolen material. For example, a communication board or simple file drop box for illegal pornography, terrorist material or pirated material may be hosted as a sub-page of a legitimate website so that such material is distributed. Other examples include websites that were compromised and not thoroughly “cleaned”, or whose malware or compromised content remains unidentified, so that the website is still being used as a communication drop box or malware C&C point. In this way, the website can offer content without the knowledge of the web operator or of legitimate users.
Another type of attack is based on unauthorised local modifications to web page content that defraud users into giving away their personal identity or bank information. For example, a banking website may have been injected with malware that compromises web traffic and adds input fields to trick a user into handing over sensitive personal information, or their phone model and number, so that malware can be served to the user's phone or client device and two-factor authentication bypassed. Only users with infected client devices may see the extra input fields, while users with clean devices may not.
Unauthorised content may also be served to users accidentally by advertising content providers whose advertisements are hosted on the website (e.g. pornographic material or information contradicting that presented on the website). For example, some advertising providers are not always stringent about the types of advertising they serve to websites, and a web operator (or website owner) can end up in a situation where the advertising content provider serves unwanted material on an otherwise clean website. This can be harmful to end users.
Internet security software may be installed on a user's client device to scan downloaded data for the presence of malware, or to identify unsafe content, following a request, initiated by the user or the client device, to download content from the Internet. This identification may occur either before a web page is downloaded or before it is displayed or otherwise processed on the client device. The approach relies on a website rating database maintained at a central rating server. For each web page, as represented by a Uniform Resource Locator (URL), the database holds a rating indicating the nature of, and threat posed by, the web page. A rating indicates, for example, whether the content within a web page is suitable for children, suitable for children only under adult supervision, or completely inappropriate for children. The rating may also indicate whether the web page is known or likely to contain malware.
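The per-URL rating record described above can be pictured as a small data structure. The following is a minimal Python sketch, not the rating server's actual schema: the names `ChildSuitability`, `Rating`, `RatingDatabase` and `normalise_url` are hypothetical, and the URL normalisation (lower-casing scheme and host, dropping the fragment) is an illustrative assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional
from urllib.parse import urlsplit, urlunsplit


class ChildSuitability(Enum):
    # Hypothetical categories mirroring the three examples in the text.
    SUITABLE = "suitable"
    SUPERVISED = "suitable_with_adult_supervision"
    INAPPROPRIATE = "inappropriate"


@dataclass(frozen=True)
class Rating:
    suitability: ChildSuitability
    malware: bool  # True if the page is known or likely to contain malware


def normalise_url(url: str) -> str:
    """Canonicalise a URL so equivalent spellings share one database key.

    Assumption: lower-casing scheme and host, defaulting an empty path to "/"
    and dropping the fragment is enough for this sketch.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


class RatingDatabase:
    """Hypothetical central rating store keyed by normalised URL."""

    def __init__(self) -> None:
        self._ratings: Dict[str, Rating] = {}

    def set_rating(self, url: str, rating: Rating) -> None:
        self._ratings[normalise_url(url)] = rating

    def get_rating(self, url: str) -> Optional[Rating]:
        # None means the URL has not been rated yet.
        return self._ratings.get(normalise_url(url))


db = RatingDatabase()
db.set_rating("HTTP://Example.COM/page#top", Rating(ChildSuitability.SUPERVISED, False))
print(db.get_rating("http://example.com/page"))
```

Keying the store on a normalised URL means that trivially different spellings of the same page share one rating; how aggressively a real rating service canonicalises URLs is not specified here.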
Whenever a web browser (or other application capable of accessing content at a website, for example an email client) sends a request to obtain content from a website, or sends a DNS look-up request to obtain an IP address for a URL, the request is intercepted by the Internet security software and the URL associated with the request is simultaneously sent to the rating server where the rating database is maintained. The rating server obtains the rating information for the web page (URL) in question and returns it to the security application on the user's client device. The security application buffers any content received from the website in response to the request until the rating has been received from the rating server. Depending upon the rating information, the security software may block (further) downloading of the content, or processing (e.g. display) of already downloaded content, and provide a warning to the user. If a rating indicates that the website may contain malware such as a computer virus, downloading and processing are blocked.
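The intercept-and-buffer flow above can be sketched with `asyncio`, running the rating lookup and the content download concurrently and releasing the buffered content only once the rating has arrived. This is a minimal sketch under stated assumptions: `fetch_rating` and `fetch_content` are hypothetical stand-ins for the rating server query and the download, the simulated rating is derived from the URL string purely for demonstration, and the `allow`/`warn`/`block` policy is illustrative.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rating:
    malware: bool      # known or likely to contain malware
    adult_only: bool   # hypothetical policy flag: inappropriate for this user


@dataclass(frozen=True)
class Verdict:
    action: str                # "allow", "warn" or "block"
    content: Optional[bytes]   # released content, or None if withheld


async def fetch_rating(url: str) -> Rating:
    """Hypothetical stand-in for the query to the rating server."""
    await asyncio.sleep(0.01)  # simulate network latency
    # Demonstration only: derive a fake rating from the URL text.
    return Rating(malware="malware" in url, adult_only="adult" in url)


async def fetch_content(url: str) -> bytes:
    """Hypothetical stand-in for downloading the requested page."""
    await asyncio.sleep(0.005)
    return b"<html>...</html>"


async def handle_request(url: str) -> Verdict:
    # The rating lookup and the download run concurrently; the downloaded
    # bytes stay buffered here until the rating has also arrived.
    rating, buffered = await asyncio.gather(fetch_rating(url), fetch_content(url))
    if rating.malware:
        return Verdict("block", None)   # never release possible malware
    if rating.adult_only:
        return Verdict("warn", None)    # withhold the content and warn the user
    return Verdict("allow", buffered)   # rating acceptable: release the buffer


print(asyncio.run(handle_request("http://example.com/")))
```

For simplicity the sketch waits for both tasks to finish; software that blocks further downloading, as described above, would instead cancel the download task as soon as a blocking rating arrives.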
The rating-based approach described above works well for websites having relatively static content, or at least content which does not change greatly in nature over time. However, the dynamic nature of many websites, and the many types of attack to which a website may be exposed, present a problem for security providers attempting to identify and categorise web content. Due to limited resources, providers of Internet security services are unable to access (e.g. using web spidering techniques) and re-rate websites on a regular basis. Changes in the content available at a particular website can therefore remain undetected for several months or even longer. Thus, a web page of a website that is rated as benign may in fact harbour malicious, inappropriate or unauthorised content. This is not only dangerous from the user's point of view, but also reduces the user's trust in the security service, which may subsequently be turned off or uninstalled, exposing the user to further malicious or unauthorised content.
Another approach is described in U.S. patent application Ser. No. 12/932,015, filed on 16 Feb. 2011 by the same applicant as the present invention, which reduces the risk of users and their client devices being exposed to inappropriate or malicious web content as a result of rating checks on such web content being carried out only periodically, as described above. U.S. patent application Ser. No. 12/932,015 describes performing a security check at a user's client device on web page content downloaded to the client device over the Internet. Rating information for the web page is retrieved from a web service over the Internet; the rating information includes one or more content ratings and a first signature generated from the content, using a specified algorithm, at substantially the same time as the or each content rating was determined. The downloaded web page content is then processed on the client device using the specified algorithm to generate a second signature. The first and second signatures are compared and the difference between them is quantified. The client device then determines whether the quantified difference exceeds a threshold value. If it does not, the received content rating(s) are trusted. If it does, the result is reported to the security service.
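The signature comparison in that approach can be illustrated as follows. The referenced application only calls for a “specified algorithm” whose outputs can be compared and their difference quantified; this minimal sketch assumes SimHash over word tokens, with Hamming distance as the measure of difference, and the threshold of 3 bits is a hypothetical value. The function names are illustrative.

```python
import hashlib
import re


def simhash(text: str, bits: int = 64) -> int:
    """Compute a SimHash fingerprint over the words of *text*.

    SimHash is an assumption: any algorithm whose signatures can be compared
    and whose difference can be quantified would fit the description.
    """
    weights = [0] * bits
    for token in re.findall(r"\w+", text.lower()):
        # Stable 64-bit token hash (Python's built-in hash() is salted per run).
        h = int.from_bytes(hashlib.sha256(token.encode()).digest()[:8], "big")
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    # Each output bit is set where the weighted vote across tokens is positive.
    return sum(1 << i for i in range(bits) if weights[i] > 0)


def signature_difference(first: int, second: int) -> int:
    """Quantify the difference as the Hamming distance between signatures."""
    return bin(first ^ second).count("1")


def check_rating(first_signature: int, downloaded_content: str, threshold: int = 3) -> str:
    # Second signature is computed on the client from the downloaded content.
    second_signature = simhash(downloaded_content)
    if signature_difference(first_signature, second_signature) > threshold:
        return "report"  # content has drifted: report to the security service
    return "trust"       # received content rating(s) can be trusted


page = "Welcome to the example bank. Log in to manage your account."
first = simhash(page)  # would be produced when the rating was determined
print(check_rating(first, page))
```

A similarity hash is chosen here rather than a cryptographic digest because small edits should produce small signature differences; with a cryptographic hash any change, however minor, would exceed the threshold.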
However, while such a system is an improvement over server-generated web ratings, it is still fraught with risk to the end user, as the security service provider still needs to generate numerous ratings for each web page. Delays in updating web ratings lead to inaccurate ratings being provided to users, and hence to inaccurate detection of malicious content. In addition, false alarms are generated whenever a content author or web operator updates the website before the web rating has taken the changes into account. Furthermore, the client device must perform a large amount of processing to download, determine and compare web rating signatures and, owing to an apparent mismatch, may report authorised or legitimate content to the security service as malicious. This can degrade users' experience of a web operator's website.