Central processing units (CPUs) have gained performance exponentially over the past 40 years, roughly in accordance with Moore's Law. CPUs are not only growing faster, but are also being applied to an increasing number of workloads, such as logic-based computation, integer and floating-point arithmetic, string processing, multimedia processing, encryption, and error correction. CPUs also devote a large number of transistors to alleviating common performance bottlenecks, such as slow memory fetches and frequent code branches.
Consequently, modern CPUs are quite adequate for a diverse set of workloads. But this generality comes at a cost: the amount of silicon that can be devoted to any one function in a CPU is limited by thermal and economic constraints. Rather than requiring CPUs to handle all workloads, some workloads are better served by less general, more specialized processors.
One example is the graphics processing unit (GPU). GPUs were popularized by the commoditization of discrete graphics cards, which deliver higher graphics performance in workloads such as computer games, media creation, and computer-aided design. GPUs are specialized processors designed to perform the relatively few tasks involved in computer graphics very efficiently. More recently, however, new applications for GPUs have been discovered and expanded: there is an entire class of non-graphics computation that can exploit these specialized functions. In particular, GPUs can now handle the highly parallel numeric codes found in many scientific programs. Programming frameworks such as CUDA and OpenCL emerged to facilitate the use of this specialized hardware for codes that were originally designed for CPUs. The success of this model is evident in the fact that some of the fastest computers in the world now use these so-called general-purpose GPUs (GPGPUs) as numerical accelerators.
However, general-purpose GPUs still have limitations for general-purpose computing. GPU hardware is generally designed and optimized for floating-point calculation, so GPUs offer little advantage over CPUs for computing tasks centered on integer arithmetic. In addition, owing to their architecture, GPUs gain their performance advantage by parallelizing computation. Yet not all computing tasks can be parallelized efficiently, which severely limits the applicability of general-purpose GPUs. Further, GPUs are typically fabricated on discrete graphics expansion cards. It is therefore inefficient to use GPUs for certain tasks, such as high-speed network data processing, because large amounts of data must be transferred among network interface controllers (NICs), CPUs, and GPUs.
As Internet transmission speeds have grown, it is now common for a datacenter server or a desktop computer to process network data at speeds above 1 gigabit per second. Processing the incoming network segments, such as TCP or UDP, imposes a further overhead on CPUs. As a result, a significant share of a CPU's processing power is devoted to handling network transmission instead of running the intended applications.
Some modern NICs can offload part of this burden from the CPU by performing limited processing of network data in dedicated hardware fabricated on the NIC itself. For example, some NICs include predetermined features such as TCP/IP offloading, encryption, and error correction that relieve the CPU of the burden of computing these common (albeit limited) tasks. Nevertheless, the functionality of such NICs is fixed by the predetermined dedicated hardware; there is no mechanism for these NICs to perform general-purpose computing tasks beyond the pre-supplied functions.