With the rapid growth of cloud-based services, the complexity and monetary importance of managing the security of large-scale data centers have grown to the point that the task can no longer rely solely on human inspection and trial and error. To help illustrate this point, a conventional platform that provides large-scale public cloud-based services will now be described. The platform provides on-demand computing, storage, and networking resources to mutually distrusting customers. Infrastructure services and customer services provided by the platform are hosted in custom isolation boundaries that are defined by network connectivity restrictions. For example, platform management service interfaces are walled off from the Internet and from arbitrary customer access. In addition, customer services are isolated from one another. These restrictions are enforced in network devices such as routers and top-of-rack switches, in hypervisor packet filters, and in firewalls.
Errors in the enforcement of these restrictions may compromise the security and availability of the platform. For example, an error in the ingress filtering access control list (ACL) for traffic coming from the Internet can cause a connectivity outage for customers. Another potential problem is that management ports on routers and other critical infrastructure may be exposed when overly permissive rules are configured in an attempt to open the ports only for selected services. If a management service is exposed to the Internet, then it becomes an attractive target for zero-day exploits and distributed denial-of-service (DDoS) attacks. The opposite problem of accidentally blocking useful ports, such as the User Datagram Protocol (UDP) port for Domain Name System (DNS), is equally possible when blanket policies that block traffic are added.
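To make the two failure modes described above concrete, the following is a minimal sketch, assuming a simplified ACL model with first-match semantics and an implicit deny. The `Rule` class, the `evaluate` function, and all addresses and rule sets are hypothetical and are not taken from any particular device or from the platform described here.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    """One hypothetical ACL entry in a simplified first-match model."""
    action: str    # "permit" or "deny"
    protocol: str  # "tcp", "udp", or "any"
    dst: str       # destination prefix in CIDR notation
    ports: range   # destination ports covered by the rule


def evaluate(acl, protocol, dst_ip, port):
    """Return the action of the first matching rule, or "deny" if none match."""
    for rule in acl:
        if (rule.protocol in ("any", protocol)
                and ip_address(dst_ip) in ip_network(rule.dst)
                and port in rule.ports):
            return rule.action
    return "deny"  # assumed implicit deny at the end of the list


ALL_PORTS = range(0, 65536)

# Failure mode 1: a rule meant to open HTTPS for a service subnet is written
# too broadly and also exposes the SSH management port of a router there.
acl_too_permissive = [
    Rule("permit", "tcp", "198.51.100.0/24", ALL_PORTS),
]
acl_intended = [
    Rule("permit", "tcp", "198.51.100.0/24", range(443, 444)),
]

# Failure mode 2: a blanket deny placed ahead of the DNS permit shadows it.
dns_permit = Rule("permit", "udp", "203.0.113.53/32", range(53, 54))
blanket_deny = Rule("deny", "udp", "0.0.0.0/0", ALL_PORTS)
acl_dns_blocked = [blanket_deny, dns_permit]
acl_dns_ok = [dns_permit, blanket_deny]

print(evaluate(acl_too_permissive, "tcp", "198.51.100.1", 22))  # permit
print(evaluate(acl_intended, "tcp", "198.51.100.1", 22))        # deny
print(evaluate(acl_dns_blocked, "udp", "203.0.113.53", 53))     # deny
print(evaluate(acl_dns_ok, "udp", "203.0.113.53", 53))          # permit
```

In this sketch each misconfiguration differs from the intended rule set by only a port range or by the order of two entries, which suggests why such errors are easy to introduce and hard to spot by reading the rules.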
The frequency with which the network connectivity restrictions are changed adds an important dimension to the problem. Some of these settings may be updated dynamically. For example, whenever a virtual machine (VM) is instantiated or moved from one host to another, a hypervisor or virtual machine manager (VMM) may dynamically configure the appropriate packet filters for the VM. Additionally, administrators may frequently perform out-of-band updates to the network connectivity restrictions to debug connectivity issues or in situations such as disaster recovery. Managing the many different address ranges that represent different services adds a further dimension to the problem. In summary, these network connectivity restrictions are subject to change in many ways, and the access requirements can interact in several ways. As a result, manual inspection and maintenance of network connectivity restrictions are not viable. Instead, automated validation methods are needed to ensure that the intended network connectivity model is always preserved with respect to both security and availability.
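As one small illustration of a check that lends itself to automation, the following sketch flags services whose address ranges overlap, which is one way the many address ranges mentioned above can interact unexpectedly. The service names, the prefixes, and the `overlapping_services` helper are assumptions made for illustration only, and the sketch does not represent a complete validation method.

```python
from ipaddress import ip_network
from itertools import combinations

# Hypothetical mapping of services to the address ranges that represent them.
service_ranges = {
    "management": ip_network("10.0.0.0/24"),
    "customer_a": ip_network("10.0.1.0/24"),
    # Assumed mistake: this range was carved out of the management range.
    "customer_b": ip_network("10.0.0.128/25"),
}


def overlapping_services(ranges):
    """Return sorted (name, name) pairs whose address ranges overlap."""
    return sorted(
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b)
        in combinations(sorted(ranges.items()), 2)
        if net_a.overlaps(net_b)
    )


for name_a, name_b in overlapping_services(service_ranges):
    print(f"overlap: {name_a} and {name_b}")
```

A check of this kind could be run whenever a range is added or changed, so that an overlap is reported before any rule that depends on the ranges is deployed.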