Data centers may include several hundred or several thousand servers interconnected by high-speed switches. Cloud data centers host diverse applications, mixing in the same network many workflows that require small, predictable latency with others that require large, sustained throughput. In recent years, data centers have transformed computing, with large-scale consolidation of enterprise IT into data center hubs and with the emergence of cloud computing service providers. A consistent theme in data center design has been to build highly available, high-performance computing and storage infrastructure using low-cost commodity components. In particular, low-cost switches are common, providing up to 48 ports at 1 Gbps at a price under $2,000. Several recent research proposals envision creating economical, easy-to-manage data centers using novel architectures built on such commodity switches.
Whether these proposals are realistic depends in large part on how well the commodity switches handle the traffic of real data center applications. Soft real-time applications, such as web search, retail, advertising, and recommendation systems, which have driven much of the data center construction, have been found to generate a diverse mix of short flows and long flows. These applications require the following from the data center network: low latency for short flows, high burst tolerance, and high utilization for long flows.
The first two requirements stem from the Partition/Aggregate workflow pattern that many of these applications use. The soft real-time deadlines for end results translate into latency targets for the individual tasks in the workflow. These latency targets vary from about 10 ms to about 100 ms, and tasks not completed before their deadlines are cancelled, thereby adversely affecting the final result. Application requirements for low latency therefore directly impact the quality of the result returned and, in turn, revenue. Reducing network latency allows application developers to shift more cycles to the algorithms that improve relevance and end user experience.
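The effect of a task deadline on the aggregated result can be illustrated with a minimal sketch. The names `WorkerResponse` and `aggregate`, the latency values, and the use of "fraction of responses included" as a stand-in for result quality are all assumptions made for illustration; they do not describe any particular application.

```python
from dataclasses import dataclass


@dataclass
class WorkerResponse:
    worker_id: int
    latency_ms: float  # hypothetical time at which the response reaches the aggregator
    payload: str


def aggregate(responses, deadline_ms):
    """Keep only responses that arrive by the deadline; late ones are dropped."""
    on_time = [r for r in responses if r.latency_ms <= deadline_ms]
    # Illustrative quality proxy: the fraction of workers whose answer was used.
    quality = len(on_time) / len(responses) if responses else 0.0
    return [r.payload for r in on_time], quality


# Illustrative values only: one response is delayed past a 10 ms deadline.
responses = [
    WorkerResponse(0, 4.0, "a"),
    WorkerResponse(1, 9.5, "b"),
    WorkerResponse(2, 37.0, "c"),  # late, so it is excluded from the result
    WorkerResponse(3, 6.2, "d"),
]
payloads, quality = aggregate(responses, deadline_ms=10.0)
print(payloads, quality)
```

In this sketch a single delayed response lowers the quality proxy from 1.0 to 0.75, which mirrors how a missed latency target degrades the final result.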
The third requirement, high utilization for long flows, stems from the need to continuously update internal data structures of these applications, as the freshness of this data also affects the quality of results. High throughput for long flows that update data is thus as essential as low latency and burst tolerance.
In this environment, today's state-of-the-art TCP falls short. Accordingly, there is a need for improved methods and apparatus for efficient packet transport in computer networks, such as data centers.