On the Internet, browsers send HTTP requests to web servers using the GET or POST method and render the content of the HTTP responses returned by the servers. An HTTP response may contain embedded objects such as JavaScript files, links to CSS stylesheets, images, and embedded frames. Each of these embedded objects may trigger an additional HTTP request, which the browser submits until the complete webpage is rendered. Throughout this disclosure, the terms HTTP request and HTTP response may be shortened to request and response, respectively, unless explicitly stated otherwise. This rendering behavior gives attackers an opportunity to inject malicious script into popular websites. The browser then fetches the malicious script as part of the response content, and the script begins redirecting the user to other servers or hosts controlled by the attacker. The attacker does this for two reasons: to hide the main server hosting the malicious script or binary, and to apply cloaking techniques that decide which clients to attack based on client characteristics. As a result, the malicious status of infected hosts is often hidden, and HTTP flows containing malicious redirections are not immediately detectable.
Existing techniques for detecting these attacks typically rely on actively crawling the Internet to search for potential malware distribution sites. Unfortunately, such strategies are susceptible to cloaking and anti-emulation mechanisms employed by attackers. One key characteristic common to these types of attacks is that they rely on HTTP redirections, in which a user's request is automatically redirected through a series of intermediate websites (e.g., intermediate node (107)) before landing on the final malware distribution site (e.g., 109). Attackers frequently employ a number of different domain names and web servers to obfuscate the malicious websites. The automatic HTTP redirections may be performed either by HTTP 3xx redirection responses issued by the intermediate web servers or by dynamically executed scripts embedded in the webpage content. The purpose of chaining multiple HTTP redirections in sequence is to make it more difficult for security analysts to detect the malicious servers.
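By way of illustration only, the following Python sketch shows how a chain of HTTP 3xx redirections, such as the one described above, might be reconstructed from a set of recorded request/response pairs. It assumes a hypothetical `Flow` record holding the requested URL, the response status code, and the value of the `Location` response header, which is where a 3xx response carries the next URL. The helper names `redirect_chain` and `hosts` are likewise hypothetical. Script-based redirections are not handled in this sketch, and it is not presented as the method of this disclosure.

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urljoin, urlsplit


@dataclass
class Flow:
    """One observed HTTP request/response pair (hypothetical record format)."""
    url: str
    status: int
    location: Optional[str] = None  # value of the Location response header, if any


def redirect_chain(flows: List[Flow], start_url: str, max_hops: int = 20) -> List[str]:
    """Follow HTTP 3xx redirections through recorded flows, starting at start_url.

    Returns the URLs visited, ending at the first URL whose response is not a
    3xx redirection (the landing page) or whose flow was not recorded.
    JavaScript-based redirections are not followed in this sketch.
    """
    by_url = {f.url: f for f in flows}
    chain = [start_url]
    current = start_url
    for _ in range(max_hops):
        flow = by_url.get(current)
        if flow is None or not (300 <= flow.status < 400) or not flow.location:
            break
        # A Location value may be relative, so resolve it against the current URL.
        nxt = urljoin(current, flow.location)
        if nxt in chain:  # stop on a redirection loop
            break
        chain.append(nxt)
        current = nxt
    return chain


def hosts(chain: List[str]) -> List[Optional[str]]:
    """Return the hostname of each URL in a redirection chain."""
    return [urlsplit(u).hostname for u in chain]


if __name__ == "__main__":
    # Hypothetical example: a compromised popular page redirects through two
    # intermediate hosts before landing on a final distribution site.
    flows = [
        Flow("http://popular.example/page", 302, "http://hop1.example/r"),
        Flow("http://hop1.example/r", 301, "http://hop2.example/x"),
        Flow("http://hop2.example/x", 302, "http://malware.example/payload"),
        Flow("http://malware.example/payload", 200),
    ]
    chain = redirect_chain(flows, "http://popular.example/page")
    print(hosts(chain))
```

Counting the distinct hostnames along such a chain gives one simple view of how many domains an attacker has interposed between the compromised page and the landing site.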