The present invention relates to data center infrastructure and operation, and more particularly, this invention relates to arbitrating many thousands of flows at speeds of 100G and faster. Current data centers include many thousands of digital appliances, each digital appliance being capable of processing and storing massive amounts of data. When seen in isolation, these appliances are not always superior to what users may have at home. However, the confinement of many of these digital appliances within a small physical area, and the large-data applications in which they can collectively engage, make data centers particularly interesting.
Data center networks play a critical, compounding role in data centers. In a field that is still somewhat turbulent, there have been many recent proposals to reshape current data center networks so that they are more capable of successfully contending with stringent, and in some cases even divergent, requirements. Many of these proposals focus on management, transport, or network level protocols, targeting better exploitation of the existing infrastructure by the applications of data center tenants.
At the same time, the hardware of a typical data center network has also changed in ways that may radically modify the landscape. Intelligent network interfaces attached to (or coexisting with) processing cores, which are capable of providing low-latency and/or high-bandwidth pathways to remote processes, have been a long sought goal in data center infrastructure development. FIG. 3 shows an illustration of scheduling in a network interface, based on urgency, a connection's window, a tenant's subscription, and/or other applicable criteria.
In addition, large switching fabrics that utilize convergence enhanced Ethernet and are capable of providing homogeneous quality-of-service guarantees are able to seamlessly unify large numbers of distributed resources. The scheduling issues which arise due to the complexity of the network are shown in FIG. 4. These types of large, complex switching fabrics are another anticipated step forward in addressing the issues with conventional data center networks.
Switches and network interfaces with 40G Ethernet ports are now becoming available, while the industry is preparing for 100G Ethernet capability. Lessons learned from years of data center usage and construction indicate that bandwidth is rarely in excess. Although there are probably only a few processes today that are capable of saturating a 100G port or link, this may not be the case in the near future. In addition, in a multi-virtual machine (VM), multi-tenant data center environment, any link may easily become congested. Thus, it is generally agreed that the network should be able to slice its capacity in order to provide isolated, well-secured services to users, a notion that is even more applicable in current data centers and those of the future.
However, scheduling becomes extremely challenging with increased network size and line speed. For example, the time available to process a 64B Ethernet frame on a 100G line is just approximately 6.7 ns once the preamble and inter-frame gap are counted, which means that decisions on where and how to process flows need to be made extremely quickly in order to manage the bandwidth effectively. At the same time, in data center and warehouse scale computers, the number of requestors among which a scheduler may be required to arbitrate may range in magnitude from a few tens (e.g., in a small scale switch) up to several tens of thousands or more (e.g., in a large switching fabric or in large network interfacing).
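The per-frame time budget quoted above can be checked with simple arithmetic. The following is a minimal sketch, assuming standard Ethernet framing overhead (an 8-byte preamble with start-of-frame delimiter and a 12-byte inter-frame gap), which the text does not state explicitly; the function and constant names are illustrative only. Under these assumptions, a 64B frame occupies a 100G line for about 5.12 ns by itself and about 6.72 ns with overhead, which is close to the figure quoted above.

```python
# Assumed framing constants (standard Ethernet values, not taken from the text above).
MIN_FRAME_BYTES = 64        # minimum Ethernet frame size
PREAMBLE_SFD_BYTES = 8      # preamble plus start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12   # minimum idle gap between frames


def wire_time_ns(frame_bytes, line_rate_gbps, overhead_bytes=0):
    """Return the time, in nanoseconds, that a frame occupies the line.

    Bits divided by Gbit/s yields nanoseconds directly.
    """
    bits = (frame_bytes + overhead_bytes) * 8
    return bits / line_rate_gbps


# Frame bits only, without preamble or inter-frame gap.
frame_only_ns = wire_time_ns(MIN_FRAME_BYTES, 100)

# Frame plus the assumed per-frame overhead on the wire.
with_overhead_ns = wire_time_ns(
    MIN_FRAME_BYTES, 100, PREAMBLE_SFD_BYTES + INTERFRAME_GAP_BYTES
)

print(f"frame only:    {frame_only_ns:.2f} ns")
print(f"with overhead: {with_overhead_ns:.2f} ns")
```

Either way, a scheduler serving a fully loaded 100G line with minimum-size frames has only a handful of nanoseconds per decision.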
To accommodate many different flows, conventional systems have made use of algorithms which are adapted to determine which flow has priority over other flows provided to the system based on service-weights of the individual flows. The priority flows are processed first, while the lower prioritized flows wait in a queue. One of the problems with this conventional scheduling is that the resulting allocation of bandwidth is bursty: a burst of the priority flow is processed, followed by a burst of another flow, and then another, so that the overall allocation consists of a series of bursts of the various flows, which is undesirable.
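The burstiness described above can be illustrated with a small sketch. The text does not name a specific conventional algorithm, so the example assumes a naive weighted round-robin in which each flow is served a number of consecutive times equal to its service weight; the flow names, weights, and function names are hypothetical and chosen only for illustration.

```python
def bursty_weighted_schedule(weights):
    """Build one scheduling round of a naive weighted round-robin.

    Each flow is served `weight` times in a row, so the bandwidth shares
    are correct over the round, but service arrives in bursts.
    """
    order = []
    for flow, weight in weights.items():
        order.extend([flow] * weight)
    return order


def longest_burst(order):
    """Return the length of the longest run of consecutive identical flows."""
    longest = 0
    run = 0
    previous = None
    for flow in order:
        run = run + 1 if flow == previous else 1
        longest = max(longest, run)
        previous = flow
    return longest


# Hypothetical service weights for three flows.
weights = {"A": 4, "B": 2, "C": 1}
round_order = bursty_weighted_schedule(weights)

print("".join(round_order))        # AAAABBC
print(longest_burst(round_order))  # 4
```

Over the round, flow A receives its 4/7 share, but all of it arrives back to back while flows B and C wait, which is the series-of-bursts pattern the paragraph describes.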