In some communication networks, a network node processes data received over the network using a local accelerator. Various methods for delivering data to the accelerator are known in the art. For example, PCT International Publication WO 2013/180691, whose disclosure is incorporated herein by reference, describes devices coupled via one or more interconnects. In one embodiment, a Network Interface Card (NIC), such as a Remote Direct Memory Access (RDMA) capable NIC, transfers data directly into or out of the memory of a peer device that is coupled to the NIC via one or more interconnects, bypassing a host computing and processing unit, a main system memory, or both.
PCT International Publication WO 2013/136355, whose disclosure is incorporated herein by reference, describes a network node that performs parallel calculations on a multi-core GPU. The node comprises a host and a host memory on which a calculation application can be installed, a GPU with a GPU memory, a bus and a Network Interface Card (NIC). The NIC comprises means for receiving data from the GPU memory and metadata from the host over the bus, and for routing the data and metadata towards the network. The NIC further comprises means for receiving data from the network and for providing the data to the GPU memory over the bus. The NIC thus realizes a direct data path between the GPU memory and the network, without passing the data through the host memory.