Recently there has been an increase in the use of data centers as a venue for vendors to provide computing resources to customers. Also referred to as clouds, compute clusters, grids, and other terms, these types of data centers typically include large numbers of server computers (nodes) hosting data and applications (typically, network services or virtual machines) for the customers. Often a server may simultaneously host applications of different customers. While technologies such as virtualization exist to manage sharing of computation resources in a cloud, previous technologies for managing use of the underlying data network have shortcomings. In particular, there has been a lack of effective regulation of network use that can prevent a tenant from obtaining disproportionate use of the network and yet provide reasonable network performance. In the presence of greedy or malicious tenants, other tenants may be subject to unpredictable performance and denial-of-service attacks. Even when clients are well-behaved, natural variations in their workloads cause the network to be divided arbitrarily rather than as the cluster provider intends.
Some approaches have relied on the congestion control of TCP (transmission control protocol). However, a customer application can achieve unbounded utilization of the network by using many TCP flows, using variations of TCP, using protocols such as UDP (user datagram protocol) that do not respond to congestion, or bypassing congestion control in guest VMs (virtual machines). Another approach imposes static limits on the traffic sent to and from each VM. In spite of this, a malicious user can render a target service, VM, or rack of servers unreachable by placing a trojan receiver on the target and using a few other VMs to transmit full-rate UDP flows to the trojan VM, thereby overflowing the host server's bandwidth, the downlinks of the server's rack, etc. In either case, victim VMs that happen to be co-located on the server or rack may become compromised.
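The effect of opening many TCP flows can be illustrated with a small calculation. The sketch below assumes an idealized model in which congestion control divides a bottleneck link equally among all active flows; real TCP only approximates this, and the function name, tenant names, and link capacity are hypothetical values chosen for illustration.

```python
from fractions import Fraction


def per_flow_shares(link_capacity_mbps, flows_per_tenant):
    """Split a bottleneck link equally among flows (idealized assumption).

    Returns each tenant's aggregate bandwidth, which under per-flow
    fairness is proportional to the number of flows the tenant opens.
    """
    total_flows = sum(flows_per_tenant.values())
    if total_flows == 0:
        return {tenant: Fraction(0) for tenant in flows_per_tenant}
    per_flow = Fraction(link_capacity_mbps, total_flows)
    return {tenant: per_flow * n for tenant, n in flows_per_tenant.items()}


# Hypothetical 1000 Mbps link shared by two tenants.
modest = per_flow_shares(1000, {"tenant_a": 1, "tenant_b": 9})
greedy = per_flow_shares(1000, {"tenant_a": 1, "tenant_b": 99})

for label, shares in (("9 flows", modest), ("99 flows", greedy)):
    for tenant, mbps in shares.items():
        print(f"tenant_b uses {label}: {tenant} gets {float(mbps):.0f} Mbps")
```

Under this assumption, a tenant that keeps a single flow sees its share of the link fall as another tenant adds flows, even though every individual flow obeys congestion control. Fairness at the level of flows therefore does not yield fairness at the level of tenants.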
Generally, there has not been any way by which network use in a cloud can be allocated and regulated above the network level (e.g., at the granularity of traffic-sourcing entities such as tenants, applications, services, etc.) in a way that reliably prevents disproportionate bandwidth consumption. Scalable techniques for network performance isolation that remain robust in the presence of churn without degrading performance are discussed below.