Internet email (SMTP) is an inherently insecure medium, and it is well known as a convenient vehicle for advertising products and services in an unsolicited manner. A considerable amount of effort has been focused on developing methods for filtering such unsolicited emails, many of which are based on searching for and identifying patterns within various fields of an email. In an attempt to bypass these filters, new methods are constantly being developed, some of which rely on the recipient of an email performing additional actions beyond reading the incoming email.
In one such method, a Uniform Resource Locator (URL) is specified within the body of the email which, at first sight, appears to originate from a legitimate source but which is in fact disguised; when the recipient clicks on the URL, a file is downloaded onto the recipient's machine and can cause unexpected behaviour. For example, such URLs may direct the recipient to a site which is named similarly to a popular site, but which is not operated by the organization owning the well-known site, and which attempts to capture user identification and financial details. Alternatively, a URL may be crafted to exploit a vulnerability in the recipient's web browsing software, or to result in the downloading of an executable process that runs autonomously on the recipient's machine without the knowledge of the recipient.
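By way of illustration only, the similarly named site described above can be recognised, in simple cases, by comparing the host part of a URL against a list of well-known hosts. The following is a minimal sketch and does not represent any particular filtering product or the method of any cited document; the host name `example-bank.com`, the function names and the similarity threshold of 0.85 are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher
from urllib.parse import urlsplit

# Hypothetical list of well-known hosts; a real deployment would maintain its own.
TRUSTED_HOSTS = {"example-bank.com"}


def host_of(url):
    """Return the lower-cased host of a URL, without a leading 'www.'."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def lookalike_of(url, trusted_hosts=TRUSTED_HOSTS, threshold=0.85):
    """Return the trusted host that the URL's host imitates, or None.

    A host is treated as a lookalike when it is not itself trusted but its
    character-level similarity to a trusted host reaches the (assumed)
    threshold.
    """
    host = host_of(url)
    if not host or host in trusted_hosts:
        return None
    for trusted in trusted_hosts:
        if SequenceMatcher(None, host, trusted).ratio() >= threshold:
            return trusted
    return None


if __name__ == "__main__":
    for link in (
        "https://www.example-bank.com/login",   # the genuine site
        "http://www.examp1e-bank.com/login",    # digit '1' in place of 'l'
        "https://unrelated.org/",               # unrelated host
    ):
        print(link, "->", lookalike_of(link))
```

A character-level comparison of this kind only catches near-identical spellings; it would not, for instance, detect a URL crafted to exploit a browser vulnerability, which is why it is offered as an illustration of the problem rather than as a solution to it.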
Several workers have developed methods directed towards identifying unsolicited emails on the basis of URLs contained therein. For example, the international patent application having publication number WO2004/114614 describes comparing the attributes of URLs with attributes known to be characteristic of spam, and classifying emails accordingly. U.S. Pat. No. 6,615,242 describes intercepting an email en route to a recipient, accessing the site corresponding to the URL, analyzing the data retrieved therefrom on the basis of various predetermined criteria, classifying the email accordingly, and transmitting or filtering the email on the basis of the classification. The international patent application having publication number WO2004/097676 also describes accessing the site corresponding to the URL but, in this case, if the content is deemed to be acceptable, the URL is replaced with one associated with a trusted site. Thus the recipient of the email can only access the replacement URL, whereupon he is directed to a copy of the original content, which has been saved on the trusted site. In view of the sheer number of emails that contain such URLs, the approaches taken in U.S. Pat. No. 6,615,242 and WO2004/097676 incur a significant amount of processing effort. Moreover, in relation to U.S. Pat. No. 6,615,242, because the content of the site can change between the analysis being performed and the user accessing the site, there will be instances when the analysis is in any event a waste of processing effort.
It is an object of the present invention to provide an improved and more efficient method of detecting maliciously crafted web links.