Cloud services are growing rapidly; the market for cloud services is expected to reach several hundred billion dollars in the relatively near future. Cloud services are hosted in a data center or group of data centers, wherein a data center includes numerous computing devices and network infrastructure devices that support compute, networking, and storage services. Devices in data centers, and services hosted on those devices, however, are increasingly becoming targets for cyber-attacks. Data centers have become targets of cyber-attackers for at least two reasons: 1) a data center or network of data centers may host thousands to tens of thousands of different services, so attacking a data center can cause significant and sometimes spectacular collateral damage; and 2) attackers can utilize compromised devices in a data center to launch outbound attacks, in addition to hosting malware, stealing confidential data, disrupting a competitor's service, and selling compromised virtual machines (VMs) in an underground economy. In particular, attackers have been known to use VMs executing in data center devices to deploy botnets and exploit kits, detect vulnerabilities, send spam, launch Denial of Service (DoS) attacks against other sites, etc.
Conventionally, a variety of approaches have been employed to detect attacks on infrastructure of data centers. For example, to detect incoming attacks, data center operators have adopted a defense-in-depth approach by deploying: 1) commercial hardware devices (e.g., firewalls, Intrusion Detection Systems (IDS), distributed DoS (DDoS) protection appliances, etc.) at the network level; and 2) proprietary software (e.g., host-based IDS, anti-malware) at the host level. The above-mentioned hardware devices analyze inbound traffic to protect against a variety of attacks, such as Transmission Control Protocol (TCP) SYN flood attacks, TCP null attacks, User Datagram Protocol (UDP) flood attacks, and UDP fragment misuse. To block unwanted traffic, data center operators utilize a combination of mitigation mechanisms, such as Access Control Lists (ACLs), blacklists and whitelists, rate limiters, and/or traffic redirection to scrubbers for deep packet inspection (DPI) (e.g., malware detection). Other hardware devices, such as load balancers, aid detection by dropping traffic destined for blocked ports and IP addresses. To protect against application-level attacks, tenants (e.g., computer-executable applications hosted by host servers or VMs in the data center) typically install end host-based solutions for attack detection on their respective VMs. These software solutions periodically download the latest threat signatures and scan applications executing in the VMs for compromises. Diagnostic information, such as logs and anti-malware events, is also typically recorded for post-mortem analysis.
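By way of illustration only, the combination of blacklists, ACLs, whitelists, and rate limiters described above can be sketched as follows. This is a minimal sketch and not the logic of any particular commercial device: the class names `TokenBucket` and `InboundFilter`, the order in which the checks are applied, the use of a per-source token bucket as the rate limiter, and all addresses and thresholds are assumptions introduced for the example.

```python
class TokenBucket:
    """Hypothetical rate limiter: `rate` packets per second, bursts up to `burst`."""

    def __init__(self, rate, burst, now):
        self.rate = rate
        self.burst = burst
        self.tokens = burst   # start with a full bucket
        self.last = now       # time of the most recent refill

    def allow(self, now):
        # Refill in proportion to elapsed time, never exceeding the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class InboundFilter:
    """Hypothetical inbound policy: blacklist, then ACL, then whitelist, then rate limit."""

    def __init__(self, whitelist, blacklist, blocked_ports, rate, burst):
        self.whitelist = set(whitelist)
        self.blacklist = set(blacklist)
        self.blocked_ports = set(blocked_ports)
        self.rate = rate
        self.burst = burst
        self.buckets = {}     # source IP -> TokenBucket

    def decide(self, src_ip, dst_port, now):
        if src_ip in self.blacklist:
            return "drop:blacklist"
        if dst_port in self.blocked_ports:
            return "drop:acl"
        if src_ip in self.whitelist:
            return "allow:whitelist"   # trusted sources bypass rate limiting
        bucket = self.buckets.get(src_ip)
        if bucket is None:
            bucket = self.buckets[src_ip] = TokenBucket(self.rate, self.burst, now)
        return "allow" if bucket.allow(now) else "drop:rate-limit"


# Example policy; every value below is illustrative only.
inbound = InboundFilter(
    whitelist={"192.0.2.10"},
    blacklist={"203.0.113.66"},
    blocked_ports={23},
    rate=1.0,
    burst=2.0,
)
```

In this sketch, each call to `inbound.decide(src_ip, dst_port, now)` returns a short string naming the decision and the mechanism that produced it, so the contribution of each mechanism can be observed separately.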
To prevent outbound attacks, the hypervisor layer in the host servers is configured to prevent spoofing of a source address (e.g., a source Internet Protocol (IP) address) in outbound traffic, and is further typically configured to cap outbound bandwidth per VM instantiated in the host servers. Similarly, access control rules can be set up to rate limit or block ports that VMs are not supposed to use. Finally, (relatively expensive) hardware devices can be configured to mitigate outbound anomalies, similar to prevention of inbound anomalies described above.
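The outbound controls described above (source address validation, port blocking, and a per-VM bandwidth cap) can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the behavior of any actual hypervisor: the class name `EgressGuard`, the one-second accounting window, and the VM identifiers, addresses, and limits are all hypothetical.

```python
class EgressGuard:
    """Hypothetical per-host egress policy applied to traffic leaving each VM."""

    def __init__(self, vm_ips, cap_bytes_per_sec, blocked_ports=()):
        self.vm_ips = dict(vm_ips)            # VM id -> IP address assigned to that VM
        self.cap = cap_bytes_per_sec          # outbound cap applied per VM
        self.blocked_ports = frozenset(blocked_ports)
        self.window = {}                      # VM id -> (window start second, bytes sent)

    def permit(self, vm_id, src_ip, dst_port, size, now):
        # Anti-spoofing: the source address must be the one assigned to the VM.
        if self.vm_ips.get(vm_id) != src_ip:
            return False
        # Access control: ports the VM is not supposed to use are blocked.
        if dst_port in self.blocked_ports:
            return False
        # Bandwidth cap: count bytes within the current one-second window.
        second = int(now)
        start, used = self.window.get(vm_id, (second, 0))
        if start != second:
            start, used = second, 0           # a new window begins
        if used + size > self.cap:
            return False
        self.window[vm_id] = (start, used + size)
        return True


# Illustrative configuration: one VM, a 1000-byte-per-second cap, port 25 blocked.
guard = EgressGuard({"vm1": "10.0.0.5"}, cap_bytes_per_sec=1000, blocked_ports={25})
```

A fixed one-second window is used here only for brevity; a token bucket or sliding window would smooth enforcement at window boundaries.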
While many of these approaches are relevant to data center defense (such as end host filtering and hypervisor controls), the hardware devices are inadequate for deployment at cloud scale (e.g., over a data center or multiple data centers in communication with one another) for at least three reasons. First, the above-referenced hardware devices introduce unfavorable cost versus capacity trade-offs. In particular, these hardware devices can cost anywhere from hundreds of thousands to millions of dollars per device, but the amount of data that can be handled per hardware device is relatively limited. These hardware devices have been found to fail under both network layer and application layer DDoS attacks. Accordingly, to handle the traffic volume flowing through and across data centers, and to handle increasingly high-volume DoS attacks, utilization of the hardware devices described above would incur significant costs. Further, these devices must be deployed in a redundant manner, further increasing procurement and operational costs.
Second, the hardware devices are relatively inflexible. This is because the devices run proprietary software, thereby limiting how operators can configure them to handle the increasing diversity of cyber-attacks. Given a lack of rich programming interfaces, operators are forced to specify and manage a large number of policies themselves for controlling traffic (e.g., set thresholds for different protocols, ports, and cluster virtual IP addresses (VIPs) at different time granularities, etc.). Also, the hardware devices have limited effectiveness against increasingly sophisticated attacks, such as zero-day attacks. Finally, the hardware devices may not be kept up-to-date with operating system (OS) firmware and builds, which risks reducing their effectiveness against attacks.
Third, collateral damage may be associated with such hardware devices. Since many attacks can ramp up in tens of seconds to a few minutes, latency in detecting an anomaly or attack risks overloading target VMs, as well as the infrastructure of data centers (e.g., firewalls, load balancers, and core links), which may cause collateral damage to co-hosted tenants. Still further, if the hardware devices are unable to quickly identify when an attack has subsided, legitimate traffic may be mistakenly blocked. Accordingly, given that many security solutions apply traffic profiling and smoothing techniques to reduce false positives for attack detection, such solutions may not be able to act fast enough to avoid collateral damage.
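The tension between smoothing and detection latency can be illustrated with a short sketch. It assumes, purely for illustration, that a detector smooths the observed traffic rate with an exponentially weighted moving average (EWMA) and raises an alarm when the smoothed value exceeds a fixed threshold; the function name `ewma_detection_delay` and all numeric values are hypothetical and are not drawn from any particular security solution.

```python
def ewma_detection_delay(baseline, attack_rate, threshold, alpha, max_steps=10_000):
    """Count the samples needed for an EWMA-smoothed rate to cross `threshold`.

    Traffic is modeled as a step: it sits at `baseline`, then jumps to
    `attack_rate` and stays there. `alpha` is the EWMA weight given to each
    new sample (smaller alpha means heavier smoothing). Returns None if the
    threshold is not crossed within `max_steps` samples.
    """
    smoothed = baseline
    for step in range(1, max_steps + 1):
        smoothed = alpha * attack_rate + (1 - alpha) * smoothed
        if smoothed > threshold:
            return step
    return None


# Illustrative values only: traffic jumps from 100 to 1000 units per sample,
# and the alarm threshold is 500.
no_smoothing = ewma_detection_delay(100, 1000, 500, alpha=1.0)    # 1 sample
moderate = ewma_detection_delay(100, 1000, 500, alpha=0.1)        # 6 samples
heavy = ewma_detection_delay(100, 1000, 500, alpha=0.01)
```

With these illustrative values, the unsmoothed detector fires on the first sample after the jump, while the detector with `alpha=0.1` needs six samples, and heavier smoothing takes longer still. If samples were taken, for example, every ten seconds, six samples would correspond to a full minute, which is comparable to the ramp-up time of the attacks described above.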