FIG. 1 depicts an exemplary Content Delivery Network (CDN) architecture. As shown, the CDN includes several different caching Points-of-Presence (PoPs) 110, traffic management servers 120, and an administrative server 130. The figure also illustrates the interactions that CDN customers, including content providers, have with the CDN and interactions that content consumers or end users have with the CDN.
Each PoP 110 may be representative of a server farm for a geographically proximate set of physically separate servers or a set of virtual servers that execute over partitioned sets of resources of one or more physically separate servers. The PoPs are distributed across different network edges of the Internet. The servers in each respective PoP cache and serve content on behalf of different content providers to end users, thus facilitating the “last mile” delivery of content. Hence, the PoP servers are referred to as “edge servers” or “caching servers”. An edge server may cache the same content as other edge servers in the same PoP or may be configured to cache different content than the other edge servers in the same PoP.
The traffic management servers 120 route end users, and more specifically end-user-issued requests for content, to one or more edge servers that can optimally deliver the requested content back to the end users. In many cases, the optimal edge server is a server caching the requested content in a PoP that is geographically closest to the end user that issued the content request. Different CDN implementations utilize different traffic management schemes to achieve such routing to the optimal edge servers. For example, the traffic management scheme can be conducted according to Anycast routing. However, it should be apparent that other traffic management schemes, such as Domain Name System (DNS) routing, can alternatively be used and that the traffic management servers 120 can include different combinations of DNS servers, load balancers, and routers performing Anycast, DNS, or Border Gateway Protocol (BGP) routing as some examples.
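As a non-limiting illustration of the "geographically closest PoP caching the requested content" criterion, the following minimal Python sketch selects a PoP by great-circle distance. The PoP names, coordinates, cached paths, and the `pick_pop` helper are hypothetical and are not part of any described implementation; an actual CDN would typically reach a similar result through Anycast, DNS, or BGP routing rather than an explicit distance computation.

```python
import math

# Hypothetical PoP catalogue: name -> (latitude, longitude, set of cached paths).
POPS = {
    "pop-lax": (33.94, -118.41, {"/video/a.mp4", "/img/logo.png"}),
    "pop-nyc": (40.71, -74.01, {"/img/logo.png"}),
    "pop-lon": (51.51, -0.13, {"/video/a.mp4"}),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points on Earth, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def pick_pop(user_lat, user_lon, path):
    """Return the closest PoP that caches `path`.

    Assumption for this sketch: if no PoP caches the path, fall back to the
    closest PoP overall, which would then fetch the content on a cache miss.
    """
    caching = {n: p for n, p in POPS.items() if path in p[2]}
    candidates = caching or POPS
    return min(
        candidates,
        key=lambda n: haversine_km(user_lat, user_lon,
                                   candidates[n][0], candidates[n][1]),
    )
```

For example, a request issued near Boston for `/img/logo.png` would resolve to `pop-nyc` under these assumed coordinates, while a request from the same location for `/video/a.mp4` would resolve to `pop-lax`, because the nearer PoP does not cache that content.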
The administrative server 130 may include a central server of the CDN or a distributed set of interoperating servers that perform the configuration control and reporting functionality of the CDN. Content providers register with the administrative server 130 in order to access services and functionality of the CDN. Accordingly, content providers are also referred to as customers of the CDN. Once registered, content providers can interface with the administrative server 130 to specify a configuration, upload content, and set security parameters. The administrative server 130 also aggregates statistics data from each server of the set of edge servers and processes the statistics to produce usage and performance reports for the customers. From these reports, the content provider can better understand the demand for its content, the performance provided by the CDN in delivering the content provider's content, and the need for capacity reallocation, among other uses.
CDNs, like any online entity, can be a target for cyber-attacks. Cyber-attacks can take many forms. Some examples include masking and passing of virus-embedded code or content, Distributed Denial of Service (DDoS) attacks, account hacking attacks, cross-site scripting attacks, and SQL injection attacks.
The ramifications of a successful cyber-attack can be more severe on the CDN because such an attack can have trickle-down consequences. Specifically, any attack that is intended for one of the CDN content provider customers can degrade the CDN performance for other content provider customers. This is because an attack that is intended for one CDN content provider customer will usually find its way to the CDN's servers that deliver other customer content. Consequently, if the attack is successful, it will not only take down or degrade the performance of the intended content provider site, but also the sites of other content providers that rely on the same CDN resources under attack.
One common counter-measure to cyber-attacks is the firewall. Firewalls typically operate by way of a set of rules. These rules can be expressed as regular expressions or through other syntax. The function of the rules is to identify malicious data. The function of the firewall is to use the rules to prevent such malicious data from passing through the firewall, thereby preventing the malicious data from affecting the systems that would execute or otherwise process it. The malicious data can be in the form of code, text, scripts, files, or multi-media content (e.g., audio, video, images) as some examples. Accordingly, a firewall typically operates to identify and restrict “black-listed” data.
However, rule-based black-list firewalls remain vulnerable, especially in their application towards a CDN. A rule has to be configured for each attack that is to be thwarted. New attacks are invented every day, and each new attack will be successful until it is identified and a rule is configured to combat it. Accordingly, the attackers are typically one step ahead of the firewall. Also, attackers can modify their attacks to work around firewall rules. For example, a firewall rule may be configured for a specific variant or permutation of a known attack. Once the attacker becomes aware of how the firewall operates, the attacker can use a different variant or permutation of the attack that does not fall within the rule definition and is therefore not restricted by the firewall. In other words, unless a firewall rule is explicitly set to combat a particular attack, that particular attack will pass through the firewall undetected.
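The weakness described above can be illustrated with a minimal Python sketch of a regular-expression black-list. The two rules and the `is_blocked` helper are hypothetical examples chosen for illustration only; they are not drawn from any particular firewall product.

```python
import re

# Hypothetical black-list: each rule targets one specific form of a known attack.
BLACKLIST = [
    re.compile(r"<script>", re.IGNORECASE),        # literal script tag
    re.compile(r"union\s+select", re.IGNORECASE),  # SQL injection keyword pair
]


def is_blocked(payload):
    """Return True if any black-list rule matches the payload."""
    return any(rule.search(payload) for rule in BLACKLIST)


# Payloads in the exact form the rules anticipate are stopped.
known_xss = "q=<script>alert(1)</script>"
known_sqli = "id=1 UNION SELECT password FROM users"

# Variants of the same attacks that fall outside the rule definitions pass:
# an attribute inside the tag, and a SQL comment in place of whitespace.
variant_xss = "q=<script src=x></script>"
variant_sqli = "id=1 UNION/**/SELECT password FROM users"
```

In this sketch, `is_blocked` returns true for the two known payloads and false for both variants, even though each variant carries the same attack. Closing each gap requires yet another rule, which is the cat-and-mouse pattern at issue.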
This cat-and-mouse dynamic is made worse for the CDN, because the CDN is responsible for the content of numerous content provider customers and must defend all such customers from attacks. In other words, the CDN is subject to a higher volume of attacks simply because it must defend a greater number of resources that are attack targets.
Another shortcoming of black-list firewalls is the sheer number of rules that are needed to account for all known types of malicious data. The list of rules is thus an ever-growing list. Consequently, it is not uncommon for a firewall to be configured with hundreds, if not thousands, of rules. Each piece of data passing through the firewall is thus subject to each of the defined rules. Every new rule that is configured adds processing overhead at the firewall, and each unit of overhead degrades performance. This issue is again exacerbated for the CDN. A CDN firewall must be able to implement and process many different rule sets. Each rule set can be defined by a different content provider customer, with each content provider customer rule set specifying hundreds or thousands of custom rules for attacks to which the particular customer is at risk. Accordingly, data passing through the CDN can be subject to all customer rule sets. Alternatively, data passing through the CDN can be subject to a specific customer's rule set that is selected from all customer rule sets, wherein selecting the specific customer's rule set involves inspecting the data to identify the customer to which it pertains and then retrieving that specific customer's rule set. Another alternative is to specify a default rule set for all CDN customers. However, such a rule set needs to be extensive and comprehensive, and, in many cases, can impose unnecessary overhead for various content provider customers. For example, the ModSecurity module is an open source web application firewall providing over 15,000 configurable rules.
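The rule-set alternatives described above can be sketched in Python as follows. The customer hostnames, the rules, and the helper names are hypothetical, and identifying the customer by a host name is an assumption made for this sketch; the count of evaluated rules stands in for processing overhead.

```python
import re

# Hypothetical per-customer rule sets, keyed by the customer's hostname.
RULE_SETS = {
    "shop.example": [
        re.compile(r"union\s+select", re.IGNORECASE),
        re.compile(r"drop\s+table", re.IGNORECASE),
    ],
    "blog.example": [
        re.compile(r"<script>", re.IGNORECASE),
    ],
}


def inspect(payload, rules):
    """Apply rules in order; return (blocked, number_of_rules_evaluated)."""
    evaluated = 0
    for rule in rules:
        evaluated += 1
        if rule.search(payload):
            return True, evaluated
    return False, evaluated


def inspect_all_customers(payload):
    """Subject the data to every customer's rule set."""
    all_rules = [rule for rules in RULE_SETS.values() for rule in rules]
    return inspect(payload, all_rules)


def inspect_for_customer(host, payload):
    """Identify the customer from the data, then apply only that rule set.

    Assumption for this sketch: an unrecognized host has no rules applied.
    """
    return inspect(payload, RULE_SETS.get(host, []))
```

With these assumed rule sets, a benign payload costs three rule evaluations when checked against all customers but only one when checked against the rule set for `blog.example`. In both cases, benign data is evaluated against every applicable rule before it is allowed through, so the cost of passing legitimate traffic grows with each rule that is added.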
Accordingly, there is a need for a scalable CDN firewall solution that combats some new threats without the need for a specific rule definition for those threats. There is also a need for a firewall solution that does not increase in overhead or complexity as new threats are identified and combated, but one that requires a constant amount of resources to operate and is therefore adapted for the large volume of data and attacks that a CDN experiences. In summary, there is a need to restrict malicious data via means other than configured black-lists.