This section is intended to provide a background or context to the invention disclosed below. The description herein may include concepts that could be pursued, but are not necessarily ones that have been previously conceived, implemented or described. Therefore, unless otherwise explicitly indicated herein, what is described in this section is not prior art to the description in this application and is not admitted to be prior art by inclusion in this section.
Increasing numbers of network-connected devices are appearing and supplanting some of the functions that have traditionally been provided by general-purpose processors. For example, individual accelerators that provide parallel data operations as well as other functions are presently available. While such devices typically do not exhibit the full range of hardware and software found on most general-purpose computers, they are often implemented in accordance with the same general architecture: a processing element, a form of memory, an application program, and a communication interface. The processing element is generally optimized for a particular operation, such as vector arithmetic and/or parallel data processing, and includes a specialized instruction processing pipeline. However, many of the performance-enhancing features of the device are underused due to difficulties with the communication interface. That is, the devices may be relatively inexpensive to engineer, but inefficient in terms of resource utilization and performance.
Field-programmable gate arrays (FPGAs) are frequently used in communications, data processing, data storage and other applications. An appealing characteristic of FPGAs is their programmability, which provides design flexibility. As compared to a conventional stored program processor arrangement, however, the re-programmability of an FPGA is less convenient. For example, to upgrade a program in a stored program processor arrangement, the operating system can be used to replace a program file. An FPGA, in contrast, generally requires special hardware to provide a configuration bitstream to the FPGA. Thus, the specialized hardware and software needed to use the reconfigurable nature of the FPGA must be carefully designed.
Heterogeneous workloads are moving to cloud datacenters (DCs). To improve overall DC power efficiency and workload performance, these workloads are increasingly using hardware accelerators, such as FPGAs. These workloads are distributed and run at different scales. Therefore, to comply with the distributed nature of the applications in the DCs, mapping of large distributed applications onto multiple FPGAs is indispensable. A distributed application can be configured to distribute its computational workload to the FPGAs over a network switch. Multiple servers may also connect to the network switch, and the distributed application can take advantage of the computing resources available.
Further, even if an application fits in an expensive high-end FPGA, mapping that application onto multiple cheap low-end FPGAs can reduce the infrastructure cost. By moving the application to use multiple FPGAs, it may be possible to use lower cost components to meet the design goals.
In the building of multi-FPGA systems, almost all of the related work focuses on inter-FPGA communication in a fixed topology, where a number of FPGAs are soldered on a board or connected in a predetermined topology in a network. In addition, there are multiple instances of work related to network-attached FPGAs, such as (i) Net-FPGA and (ii) Network-Attached FPGAs for Data Center Applications. In these network-attached FPGAs, network protocol stacks, mainly Ethernet-based, have been implemented. However, these network protocol stacks are not virtualized and do not provide security on top of the stack.
What is needed is a flexible, user-defined topology that is adaptable to the application requirements.