A data communication network, or simply a data network, facilitates data transfers between two or more data processing systems. For example, an application executing in one data processing system acts as the sender of the data, and another application executing in another data processing system acts as the receiver of the data. Between the sender system and the receiver system, the data follows a data path that comprises one or more links between networking components, such as routers and switches.
In a data processing environment, such as in a datacenter, many data processing systems are connected via a data network. At any given time, several systems may be transmitting data of various sizes to several other systems. Many of these data transmissions can use a common link in the network to get from their respective sender systems to their respective receiver systems.
A data communication link in a network can become congested when more than a threshold amount of data traffic tries to use the link during a given period. The data traffic of some data flows (hereinafter, “flow”, or “flows”) appears in bursts, causing the data traffic on a link to spike. A link can also be over-subscribed, i.e., too many flows may try to use the link at the same time. Packet loss, increased network latency, and timeouts are examples of problems that arise when the utilization of a link exceeds a threshold and congestion occurs.
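The threshold test described above can be illustrated with a minimal sketch. It assumes that link utilization is measured as the fraction of link capacity consumed by the traffic offered during one measurement period, and that the link is treated as congested when that fraction exceeds a configurable threshold. The names LinkSample, utilization, congested_periods, and the default threshold of 0.9 are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LinkSample:
    """Traffic offered to one link during one measurement period (hypothetical record)."""
    bytes_offered: int   # total bytes that flows tried to send over the link
    period_s: float      # length of the measurement period, in seconds


def utilization(sample: LinkSample, capacity_bps: float) -> float:
    """Fraction of link capacity requested in the period; above 1.0 means over-subscribed."""
    return (sample.bytes_offered * 8) / (capacity_bps * sample.period_s)


def congested_periods(samples: List[LinkSample], capacity_bps: float,
                      threshold: float = 0.9) -> List[int]:
    """Return the indices of periods whose utilization exceeds the assumed threshold."""
    return [i for i, s in enumerate(samples)
            if utilization(s, capacity_bps) > threshold]
```

For a 1 Gbit/s link sampled once per second, a burst that offers 125,000,000 bytes in one second drives utilization to 1.0 and is flagged, while a quieter second carrying 50,000,000 bytes (utilization 0.4) is not. Using a fraction rather than a raw byte count lets the same threshold apply to links of different capacities.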
Some flows in a network are small flows and some are large flows. A flow that transmits less than a threshold amount of data in a given period is a small flow. A flow that transmits the threshold amount of data or more in a given period is a large flow. The data of a flow comprises packets of data. Generally, the larger the flow, the more packets it contains. The packets of the various flows that want to use a link are queued for transmission over that link.
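The small-flow/large-flow distinction above can be sketched as a per-period byte count compared against a threshold. This sketch assumes that packets are attributed to flows by a five-tuple (source address, source port, destination address, destination port, protocol); the FlowKey type, the classify_flows function, and the example threshold are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical flow identifier: (src_ip, src_port, dst_ip, dst_port, protocol).
FlowKey = Tuple[str, int, str, int, str]


def classify_flows(packets: Iterable[Tuple[FlowKey, int]],
                   threshold_bytes: int) -> Dict[FlowKey, str]:
    """Label each flow seen in one period as 'small' or 'large'.

    packets: (flow key, packet size in bytes) pairs observed during the period.
    A flow sending threshold_bytes or more is large; anything less is small.
    """
    totals: Dict[FlowKey, int] = defaultdict(int)
    for key, size in packets:
        totals[key] += size  # accumulate bytes per flow for this period
    return {key: ("large" if total >= threshold_bytes else "small")
            for key, total in totals.items()}
```

With a threshold of 1,000 bytes, a flow that sends three 1,500-byte packets in the period is labeled large, and a flow that sends a single 100-byte packet is labeled small. Note that the comparison uses >= so that a flow sending exactly the threshold amount counts as large, matching the definition in the text.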
In many datacenters, a sending system, a receiving system, or both can be virtual machines. A virtual machine (VM) comprises virtualized representations of real hardware, software, and firmware components available in a host data processing system. The host data processing system can have any number of VMs configured thereon, each utilizing any number of virtualized components therein.
For example, the host may include a processor component. One virtual representation of the processor can be assigned to one VM, and another virtual representation of the same processor can be assigned to another VM, both VMs executing on the host. Furthermore, the second VM may also have access to a virtual representation of a reserve processor in the host and certain other resources, either exclusively or in a shared manner with the first VM.
Certain data processing systems are configured to process several workloads simultaneously. For example, separate virtual data processing systems, such as separate VMs, configured on a single host data processing system often process separate workloads for different clients or applications.
In large scale data processing environments, such as in a datacenter, thousands of VMs can be operating on a host at any given time, and hundreds if not thousands of such hosts may be operational in the datacenter at the time. A virtualized data processing environment such as the described datacenter is often referred to as a “cloud” that provides computing resources and computing services to several clients on an as-needed basis.
Congestion control is a process of limiting or reducing data congestion in a section of a network, such as at a networking device or in a link. Presently, congestion control is a function of the Transmission Control Protocol/Internet Protocol (TCP/IP) stack. The TCP/IP stack is implemented by an operating system, and different operating systems implement congestion control differently. For example, one operating system might use one algorithm for performing congestion control whereas a different operating system might implement a different algorithm for the same purpose. Even a single operating system can implement different congestion control algorithms, and the ones that are implemented can be configurable to exhibit different behaviors.
Generally, different congestion control algorithms can produce different congestion control effects. Often, different congestion control algorithms are designed to achieve different objectives. For example, one congestion control algorithm might be configured to produce an optimal user experience from a server-based service for a client application that is operating on a client system across a wide area network (WAN). Such an algorithm is geared toward reducing congestion in the WAN traffic, but not necessarily in the traffic that flows between two servers of the service provider on a local area network (LAN). Another congestion control algorithm might be configured to perform congestion control on the LAN traffic and not on the WAN traffic. Yet another congestion control algorithm might be configured to maximize the data transmission from a particular network interface card (NIC) for a particular application using that NIC. Many different configurations of congestion control algorithms exist, and many more are possible depending upon the circumstances.
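To illustrate how differently configured congestion control can behave differently on the same traffic, the following sketch uses a simple additive-increase/multiplicative-decrease (AIMD) congestion window rule, a well-known scheme chosen here only as an example; it is not presented as the algorithm of any particular operating system. The function name aimd_trace, the event labels, and the parameter values are hypothetical.

```python
from typing import List, Sequence


def aimd_trace(events: Sequence[str], initial_cwnd: float = 10.0,
               increase: float = 1.0, decrease_factor: float = 0.5,
               min_cwnd: float = 1.0) -> List[float]:
    """Return the congestion window (in segments) after each event.

    events: 'ack_round' for a round trip with no loss, 'loss' for a detected loss.
    """
    cwnd = initial_cwnd
    trace: List[float] = []
    for event in events:
        if event == "ack_round":
            cwnd += increase  # additive increase per loss-free round trip
        elif event == "loss":
            # multiplicative decrease on loss, never below the minimum window
            cwnd = max(min_cwnd, cwnd * decrease_factor)
        else:
            raise ValueError(f"unknown event: {event!r}")
        trace.append(cwnd)
    return trace
```

Feeding the same sequence of four loss-free round trips, one loss, and two more loss-free round trips to two configurations shows the divergence: with decrease_factor=0.5 the window goes 11, 12, 13, 14, 7, 8, 9, while with decrease_factor=0.75 it goes 11, 12, 13, 14, 10.5, 11.5, 12.5. The second configuration keeps more data in flight after a loss, which may favor one sender's throughput while adding more load to a shared, congested link.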
When a tenant in a datacenter operates a VM on a server, the VM may be operating on the server with other VMs, the tenant may be collocated on the server with other tenants, or a combination thereof. The illustrative embodiments recognize that because congestion control is implemented by the operating system of each VM individually, potentially each VM can be configured to perform congestion control in a manner that is most suitable for that VM.
The illustrative embodiments further recognize that the congestion control needed to operate a datacenter's data network can be different from the type and/or amount of congestion control performed by a VM executing therein. Furthermore, because of the localized nature of the presently available congestion control, a datacenter operator may not even know the type or amount of congestion control performed by the VMs operating in the datacenter.
The illustrative embodiments further recognize that changes, updates, patches, and other modifications to the TCP/IP stack can affect the congestion control function implemented therein. Not every VM applies every patch, performs every update, or makes every change to its TCP/IP stack. In some cases, the expected lifetime of a VM may not warrant the change, whereas in other cases, an administrator of the VM may be unaware of the change or may ignore it.
As relates to congestion control, many tenants are concerned with the user experience of data traffic that travels on the datacenter network and crosses the datacenter boundary between servers inside the datacenter and client machines outside the datacenter (also known as North-South traffic). However, the illustrative embodiments recognize that the majority of data traffic flowing over the datacenter network is actually between data processing systems within the datacenter (also known as East-West traffic). This difference is one example of why a datacenter's congestion control concerns might differ from a tenant's congestion control concerns, warranting different approaches to congestion control. Many other reasons and cases exist where a datacenter's congestion control concerns might differ from a tenant's, requiring a different congestion control methodology at the datacenter level than the methodology implemented in one or more VMs operating in the datacenter.
Given the present method of congestion control, where the congestion control function is performed and controlled by the VMs, performing congestion control at a datacenter-level to achieve a datacenter's congestion control objectives is very difficult, and in many cases impossible.
Thus, the illustrative embodiments recognize that a problem exists in performing datacenter-level congestion control. The illustrative embodiments recognize that a solution to this problem is needed that operates in conjunction with a VM's congestion control mechanism; operates by observing the VM's congestion control operation, flow, or a combination thereof; operates with or without the VM's knowledge that a networking device or system in the datacenter is also performing a congestion control function; or possesses some combination of these and other features as described herein.