Demands by individuals, researchers, and enterprises for increased compute performance and storage capacity of computing devices have driven the development of various computing technologies to address those demands. For example, compute-intensive applications, such as enterprise cloud-based applications (e.g., software as a service (SaaS) applications), data mining applications, data-driven modeling applications, scientific computation problem-solving applications, etc., typically rely on complex, large-scale computing environments, such as high-performance computing (HPC) environments and cloud computing environments, to execute the applications and to store voluminous amounts of data. Such large-scale computing environments can include tens of thousands of multi-processor/multi-core computing devices connected via high-speed interconnects.
To carry out such processor-intensive computations, various computing technologies have been implemented to distribute the workload, such as parallel computing, distributed computing, etc. Advancements in hardware have been introduced to support these computing technologies as well. For example, multiprocessor hardware architectures (e.g., multiple central processing units (CPUs) that share memory) have been developed to allow multiprocessing (e.g., coordinated, simultaneous processing by more than one processor, or CPU). In such multiprocessor hardware architectures, different parallel computer memory design architectures may be deployed: shared memory architectures (e.g., uniform memory access (UMA) and non-uniform memory access (NUMA)) and distributed memory architectures. However, present technologies are generally optimized from a CPU perspective (e.g., to increase processor speed without increasing the load on the processor bus), rather than for shared, high-speed I/O devices, such as network I/O devices.
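By way of illustration only, the distinction between UMA and NUMA shared memory architectures may be sketched with a toy cost model. The names `Topology` and `access_cost`, and the relative cost values, are hypothetical and chosen solely for this example; they are not measurements and do not describe any particular hardware.

```python
from dataclasses import dataclass

# Hypothetical relative access costs (arbitrary units, not measurements).
LOCAL_COST = 1
REMOTE_COST = 2


@dataclass(frozen=True)
class Topology:
    """Toy description of a shared-memory multiprocessor."""
    cpu_to_node: dict  # maps a CPU id to the memory node it is attached to
    uniform: bool      # True models UMA, False models NUMA


def access_cost(topology, cpu, memory_node):
    """Return the relative cost for `cpu` to access `memory_node`."""
    if topology.uniform:
        # UMA: every CPU sees the same cost for every memory location.
        return LOCAL_COST
    # NUMA: memory on the CPU's own node is cheaper than remote memory.
    if topology.cpu_to_node[cpu] == memory_node:
        return LOCAL_COST
    return REMOTE_COST


# Two CPUs sharing one memory node (UMA) versus one CPU per node (NUMA).
uma = Topology(cpu_to_node={0: 0, 1: 0}, uniform=True)
numa = Topology(cpu_to_node={0: 0, 1: 1}, uniform=False)
```

In this sketch, every access under the UMA topology has the same cost regardless of which CPU issues it, whereas under the NUMA topology the cost depends on whether the accessed memory node is local to the issuing CPU.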