1. Field of the Invention
The present invention relates to the transformation of data files for the purpose of nullifying malicious code that may be contained within any such file.
Malicious code is a generic term, and includes within its scope code which is assimilable within an entity capable of executing instructions (a ‘host’) and which, upon such assimilation, is capable of causing one of a number of effects, usually both unwanted and unintended by the user of such a host. For example, such code may affect the operation of: a host within which the code is assimilated (e.g. by deleting data or exposing it to external attack); a network of which the host forms a part (e.g. by generating traffic and thus slowing down the network); other hosts in such a network (e.g. by a denial of service attack on such hosts); or a user of the host (e.g. by causing disclosure of secret information in possession of the user).
Malicious code typically has two elements to it. The first element is known in the art as ‘exploit code’, that is to say code which is specifically adapted to exploit a certain characteristic (and usually unintended) response of a host to enable an attacker to gain privileged access to it (privileged access being access of a kind which would typically be reserved for a user or administrator of the host). The second malicious code element is known as a ‘payload’ and this is code which can be loaded onto a host once privileged access has been secured by the exploit code. Typically, execution of the payload code will cause the host to perform certain operations which are intended by the attacker; almost always that will mean the host operating in a way unintended by the user.
Hosts are exposed to malicious code in a number of ways. One way is simply by connection of a host to the Internet. Attackers automatically scan hosts which are connected to the Internet to check for vulnerabilities to attack by exploit code; in the event that such vulnerabilities are detected, an attacker may then decide to use an appropriate exploit code to gain privileged access to such a vulnerable host and load and execute payload code. Exploit code may also exploit user actions in certain cases. For example, a user may receive an email with an attached data file purporting to be legitimate but containing exploit code which executes on opening the attachment and causes the deployment of payload code on the host. In the instance of certain vulnerabilities and hosts the need for exploit code can be circumvented by user actions such as: opening an email attachment; choosing to download a data file from a website or during an instant messaging session with another user; or inadvertently loading malicious code onto a host by connecting a storage medium such as a CD or USB flash memory to the host and copying data from that medium onto the host.
2. Description of Related Art
The current practice for dealing with malicious code is to monitor the production of such code and produce remedial patches which remove the host's vulnerability to it. Monitoring can include relatively covert surveillance of the activity of certain known attackers, as well as tracking the incidence of publicly reported vulnerabilities. This practice is enhanced by the use of known exploit codes carrying benign payloads to scan hosts for vulnerabilities in order to establish which hosts of a network require patching, for example. Another remediating measure is to equip each host in a network with a client that throttles the output of packets from any host whose behaviour starts to indicate infection by malicious code, thereby limiting the rate of propagation of malicious code in the event of infection.
In addition, firewalls of networks scan incoming data files to identify malicious code and quarantine any data file which may be thought to be malicious.
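By way of illustration only, the scanning and quarantining of incoming data files may be sketched as follows. The sketch assumes that malicious code is identified by matching known byte signatures, which is one common approach and is not stated above to be the method used by any particular firewall. The class name Firewall, its methods and the signature values are hypothetical and serve only to make the example self-contained.

```python
from dataclasses import dataclass, field

# Hypothetical byte signatures for illustration only; they do not
# correspond to any real exploit code.
KNOWN_SIGNATURES = {
    "example-exploit-a": b"\x90\x90\x90\x90\xeb\xfe",
    "example-exploit-b": b"EVIL_MARKER",
}


@dataclass
class Firewall:
    """Hypothetical scanner that quarantines files matching known signatures."""
    signatures: dict
    quarantine: list = field(default_factory=list)

    def scan(self, data: bytes) -> list:
        """Return the names of all known signatures found in the data."""
        return [name for name, sig in self.signatures.items() if sig in data]

    def admit(self, filename: str, data: bytes) -> bool:
        """Quarantine the file if any signature matches; otherwise admit it."""
        if self.scan(data):
            self.quarantine.append(filename)
            return False
        return True


if __name__ == "__main__":
    fw = Firewall(KNOWN_SIGNATURES)
    print(fw.admit("report.doc", b"ordinary content"))      # no signature present
    print(fw.admit("attach.bin", b"..EVIL_MARKER.."))       # known signature present
    print(fw.admit("novel.bin", b"previously unseen code")) # no signature on record
    print(fw.quarantine)
```

Under this assumption, a data file is quarantined only where it contains a signature that is already on record; a data file carrying malicious code for which no signature yet exists is admitted.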
Each of these measures, however, suffers from the same flaw, namely that each is a remedial measure. A remedial patch is usually only developed in response to a detected vulnerability and, since vulnerabilities are most often found by attackers, at least some hosts will have been subjected to malicious code attack before anything can be done to protect against that form of attack. In short, current practices are based on a game of ‘catch up’ with the attackers and, by definition, the best that can ever be achieved is a very close second place.