1. Field
This invention relates to network layer optimization through the use of network accelerator devices (NADs), and more particularly to methods, systems, and computer program products for enabling reliable packet transmission in a network using a set of network accelerator devices attached to a switch port.
2. Description
High performance computing (HPC) systems are increasingly being deployed in life-critical and mission-critical usage scenarios in addition to traditional scientific computing applications. Computational steering is known in the art and is widely deployed: the run-time state of an HPC application is measured using software abstractions called ‘sensors’, and the computation state is steered using software ‘actuators’ to achieve a required quality of service or computational convergence. Data input to software ‘actuators’ can come directly from files, sensor inputs, or user input from a graphical visualization screen. Wireless handhelds, appliances and thin clients are increasingly being used, in addition to traditional high-end graphics workstations, to visualize and steer computational workloads. HPC applications also consume data from the environment using hardware sensors and can actuate physical hardware using hardware actuators.
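By way of illustration only, the sensor/actuator pattern of computational steering described above can be sketched as follows. The sketch is not taken from any particular steering framework: the toy solver, the names `Computation`, `sensor`, `actuator` and `run_steered`, and the rule of doubling the gain when convergence is slow are all hypothetical and chosen only to make the pattern concrete.

```python
from dataclasses import dataclass


@dataclass
class Computation:
    """Toy iterative computation (hypothetical): x is driven toward target."""
    x: float = 0.0
    target: float = 10.0
    gain: float = 0.05  # run-time parameter exposed for steering

    def step(self) -> None:
        # One iteration of the computation.
        self.x += self.gain * (self.target - self.x)


def sensor(comp: Computation) -> float:
    """Software 'sensor': reports run-time state (here, the residual)."""
    return abs(comp.target - comp.x)


def actuator(comp: Computation, new_gain: float) -> None:
    """Software 'actuator': changes a parameter of the running computation."""
    comp.gain = new_gain


def run_steered(comp: Computation, tolerance: float = 1e-3,
                check_every: int = 10, max_steps: int = 10_000) -> int:
    """Run until converged; return the number of steps taken.

    Assumed steering rule: every `check_every` steps, if the residual has
    not at least halved since the previous check, double the gain (max 1.0).
    """
    last = sensor(comp)
    for step in range(1, max_steps + 1):
        comp.step()
        residual = sensor(comp)
        if residual < tolerance:
            return step
        if step % check_every == 0:
            if residual > 0.5 * last:
                actuator(comp, min(1.0, comp.gain * 2))
            last = residual
    return max_steps
```

In this sketch the steering input comes from a fixed rule; in the scenarios described above it could equally come from a file, another sensor, or a user at a visualization screen. Setting `check_every` larger than `max_steps` disables steering, which allows the steered and unsteered runs to be compared.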
Reliable packet transmission is necessary for data-critical traffic. High-performance computing systems, distributed database servers, cluster computers and web servers are examples of applications in which lossless flow of traffic from one compute node to another is necessary for application functionality. Additionally, such systems are used in mission-critical and life-critical applications where reliability is of utmost concern. Data loss can happen because of communication link errors or because packets are dropped in switches with congested links. In large-diameter networks, the need for packet retransmissions can significantly increase data transfer time, because in lossless networks the next stage of a computation cannot proceed until all the data in a given dataset have been received in order. Also, because links can become congested during application operation, packet retransmissions can themselves be considerably delayed.
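The effect of network diameter on retransmission cost can be illustrated with a deliberately simplified timing model. The model is an assumption made for illustration and is not part of the disclosure: packets are sent back to back at a fixed interval, each hop adds a fixed latency, a lost packet is resent one round-trip time after its previous attempt, and congestion is ignored. The function name `stage_completion_time` and its parameters are hypothetical.

```python
from typing import Dict, Optional


def stage_completion_time(num_packets: int,
                          hops: int,
                          hop_latency: float,
                          send_interval: float = 1.0,
                          losses: Optional[Dict[int, int]] = None) -> float:
    """Time at which the whole dataset has arrived at the receiver.

    Simplifying assumptions (illustrative only):
      - packet i is first sent at time i * send_interval;
      - one-way delay is hops * hop_latency;
      - each loss of a packet adds one round-trip time before it arrives;
      - the next computation stage starts only when every packet is present.
    `losses` maps a packet index to the number of times it was lost.
    """
    losses = losses or {}
    one_way = hops * hop_latency
    round_trip = 2 * one_way
    # The stage is gated by the slowest packet to arrive.
    return max(
        i * send_interval + one_way + losses.get(i, 0) * round_trip
        for i in range(num_packets)
    )


# 100 packets, 5 time units per hop, last packet lost once.
small = stage_completion_time(100, hops=2, hop_latency=5, losses={99: 1})
large = stage_completion_time(100, hops=10, hop_latency=5, losses={99: 1})
print(small, large)  # 129.0 249.0
```

Under these assumptions a single loss adds 20 time units in the 2-hop network but 100 time units in the 10-hop network, because the penalty scales with the round-trip time. Congested links, which the model omits, would lengthen the delay further.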