The present invention relates to techniques for optimizing the performance of switch fabrics.
The performance of clusters of devices interconnected by a switch fabric (e.g., 10 Gigabit Ethernet clusters) depends on applications, libraries, processors, the remote direct memory access (RDMA) interconnect, and fabric primitives. Of these, fabric primitives are the least-optimized critical function in multi-core, 10 Gigabit Ethernet clusters. Fabric primitives are point-to-multipoint and multipoint-to-point services used to coordinate parallel processing. While a tremendous amount of work has been invested in the development of parallel applications and libraries, multi-core processors, and RDMA interconnects, the acceleration of fabric primitives has been neglected by most new fabric technologies.