The present invention relates to computer security, and more specifically, to detecting web exploit kit instance infections using a tree-based structural similarity search.
Today, computers are routinely compromised while performing seemingly innocuous activities like reading articles on a trusted website. Compromises are perpetrated, all too often by fraudsters and other miscreants, through client interactions with the complex advertising networks that monetize these sites. A well-known example is the widely publicized case in which advertising networks from Google and Microsoft were tricked into displaying malicious content by miscreants posing as legitimate advertisers.
Many of these HTTP-based attacks are launched through the use of exploit kits, which are web-based services designed to exploit vulnerabilities in web browsers. An exploit kit causes the browser to download malicious files containing pre-written exploit code in Java, Silverlight, or Flash, and that code then targets users running insecure or outdated software applications on their computers.
For purposes of the following explanation of the present invention, the term “exploit kit”, sometimes called an “exploit pack”, refers to a type of malicious toolkit used, for example, to exploit security holes found in software applications for the purpose of spreading malware. Common exploit targets have been vulnerabilities in Adobe Reader®, Java Runtime Environment®, and Adobe Flash Player®. An exploit kit can be used even by attackers who are not IT or security experts, since the attacker does not need to know how to create exploits in order to benefit from infecting systems. Further, an exploit pack typically provides a user-friendly web interface that helps the attacker track the infection campaign. Some exploit kits offer capabilities for remotely controlling the exploited system, allowing the attacker to create an Internet crimeware platform for further malicious activities. A number of exploit kits, such as Fiesta, Blackhole, Goon, Angler, Nuclear, and Magnitude, are well known in computer security circles and represent an entire software-as-a-service sub-industry for cybercriminals.
Currently there are three main techniques for detecting malicious websites related to exploit kits. The most popular techniques are signature-based or blacklist-based and look for specific domain names or for patterns in URLs or payloads. Unfortunately, blacklists cannot identify domain names that are not in the list, while signature-based techniques are plagued with high false positive and false negative rates. The second technique involves crawling the web for malicious content using known malicious websites as a seed. The crawled websites are verified using dynamic analysis, and flagged sites are added to blacklists. Crawling can be considered complementary to the new techniques described in the present invention, but it requires significant resources to comb the Internet at scale. The third technique uses statistical machine learning to classify web traffic as malicious or benign, but it requires a large training set and has high false positive rates.
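By way of illustration only, the limitations of the blacklist-based and signature-based techniques described above can be shown with a minimal sketch. The host names, the URL pattern, and the function name `is_flagged` are hypothetical and were invented for this example. They are not taken from any actual exploit kit or detection product.

```python
import re
from urllib.parse import urlsplit

# Hypothetical blacklist and URL signature, invented for illustration only.
BLACKLISTED_HOSTS = {"kit-landing.example"}
URL_SIGNATURE = re.compile(r"/gate\.php\?id=\d+$")


def is_flagged(url: str) -> bool:
    """Flag a URL if its host is blacklisted or its path matches the signature."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host in BLACKLISTED_HOSTS:
        return True
    # Rebuild "path?query" so the signature can be matched against both parts.
    path_and_query = parts.path + ("?" + parts.query if parts.query else "")
    return URL_SIGNATURE.search(path_and_query) is not None


if __name__ == "__main__":
    # Known host: caught by the blacklist.
    print(is_flagged("http://kit-landing.example/index.html"))
    # Same kit on a new host with a new URL layout: missed (false negative).
    print(is_flagged("http://new-host.example/landing/index.html"))
    # Benign site whose URL happens to match the signature: false positive.
    print(is_flagged("http://benign.example/gate.php?id=7"))
```

In this sketch, a kit that moves to an unlisted host and changes its URL layout evades both checks, while an unrelated site that happens to share the URL pattern is flagged in error.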
Another characteristic of exploit kits is that they are typically designed to be agile, moving from server to server in order to avoid detection and blacklisting, which makes them challenging to detect using conventional methods.
The present inventors have recognized that conventional techniques for detecting security risks based on exploit kits have a number of problems and that improvements would be beneficial.