The invention relates to a method for operating a memory buffer system for fast data transport over a communication network with a reliable transport protocol working in a point-to-multipoint data transfer mode in a multi-thread environment. Reliable here means that the data packets are positively acknowledged by the receiving station and optionally negatively acknowledged. The invention also relates to a correspondingly adapted apparatus for performing the method and a correspondingly adapted computer program product.
In high-performance network-attached computer systems the implementation of data transport protocols is usually organized in a multi-threaded fashion. Here, the tasks of sending user data and signaling data, receiving data and control information, and communicating with a user application are assigned to different program threads or processes within the system.
Accordingly, when an application starts data communication over a network system, it needs at least one thread for sending data (send handler), one thread for receiving signaling data (receive handler), and an application thread (API-thread) that transports data between the user application and a communication protocol stack. The send handler sends data packets and control information to the network system. As soon as packet loss is detected, the send handler retransmits the lost packets. A send buffer temporarily stores unacknowledged data until an acknowledgment is delivered to the sender thread.
The task of the receive handler is to receive data from the network system and store it in a receive buffer. When packet loss is detected, the receive handler informs the send handler, which then sends a control packet with loss information to the sender thread of the communication pair. Since the different tasks access shared data structures, these accesses must be synchronized. Conventionally, such synchronization is implemented with mutexes or semaphores.
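The send-buffer behavior just described, namely holding data until it is positively acknowledged and handing it back for retransmission when a loss is reported, can be sketched as follows. The class and method names are illustrative assumptions for the sketch, not part of any protocol specification.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Illustrative sketch of a send buffer: packets stay stored until they
// are positively acknowledged; a loss report (NACK) yields the packet
// again so the send handler can retransmit it. All names are hypothetical.
class SendBuffer {
public:
    // API-thread delivers a packet under a new sequence number.
    void store(uint32_t seq, std::string payload) {
        unacked_[seq] = std::move(payload);
    }
    // Cumulative ACK: everything up to and including 'seq' is delivered,
    // so the corresponding entries can be dropped from the buffer.
    void acknowledge(uint32_t seq) {
        unacked_.erase(unacked_.begin(), unacked_.upper_bound(seq));
    }
    // NACK for 'seq': return the lost packet for retransmission,
    // or nullptr if it was already acknowledged.
    const std::string* lost(uint32_t seq) const {
        auto it = unacked_.find(seq);
        return it == unacked_.end() ? nullptr : &it->second;
    }
    std::size_t pending() const { return unacked_.size(); }
private:
    std::map<uint32_t, std::string> unacked_;  // seq -> unacknowledged payload
};
```

For example, after storing packets 1 to 3 and receiving a cumulative acknowledgment for sequence number 2, only packet 3 remains buffered and can still be retransmitted.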
While the send handler and the receive handler are function components, the send buffer and the receive buffer may be referred to as data components.
When the API-thread has to write data into the send buffer, it applies a mutex or a semaphore to block simultaneous read access by the send handler. When the send handler tries to read data from the send buffer for sending, it likewise applies a mutex or semaphore to prevent a simultaneous access attempt by the API-thread. The access of the receive handler to the receive buffer is similar to that of the send handler. Further details on the interplay of the send handler thread, the receive handler thread and the API-thread are presented in
WO 2009/043712 A1, where reference is made to the explanation of the background of the invention and to FIG. 1 in particular.
Since a semaphore serves essentially the same purpose as a mutex in this context, only the term mutex is used in the following, with the understanding that a semaphore may be used as an alternative. What a mutex does is explained, for example, in the corresponding Wikipedia entry, where the following definition is found: In short, it is the requirement of mutual exclusion. In computer science, mutual exclusion refers to the requirement of ensuring that no two concurrent processes are in their critical section at the same time; it is a basic requirement in concurrency control, to prevent race conditions. Here, a critical section refers to a period when the process accesses a shared resource, such as shared memory.
The intensive use of mutexes for thread synchronization rapidly decreases the performance of the communication software stack, since simultaneous access to the data leads to wait states for the process threads. Moreover, it leads to a rapid increase in the number of kernel system calls required for thread synchronization.
The problem of “meta storage” organization is non-trivial for data transport system designers. “Meta storage” here means a structured computer memory which stores important entity-related information; by “entity”, a sent IP (Internet Protocol) packet is understood. The structure of the meta storage used in this invention is designed for reliable multi-Gigabit data transport protocols, preferably working in a point-to-multipoint fashion. One prominent example of a reliable data transport protocol used in the field of high-speed data communication, in particular over the Internet, is the UDT protocol, which is capable of providing 10-Gigabit point-to-point data transmission over Internet links and features a congestion/flow control technique. UDT stands for UDP-based Data Transfer protocol and is primarily designed for implementation in transport protocols suitable for high-speed networks with data rates in the range of 10 Gbit/s. It was a pioneering technique in the high-speed data transport industry. The invention may also be used in connection with the Transmission Control Protocol (TCP), but only with a significant adaptation of the protocol core due to the nature of TCP. Currently, it is not known how to implement a multi-Gigabit data transport protocol without employing a multi-threaded approach. An essential problem of inter-thread communication, however, is the shared access to certain areas of computer memory. Many options exist for arranging thread-safe access to shared memory, but almost all of them rely on locking the regions of memory that are accessed concurrently from different threads. Such locks are typically implemented with mutexes and/or semaphores, as mentioned above. For computer programs, a lock means the same as a traffic light at a crossroad: while one lane is in use, others must wait.
This offers great safety and avoids the traffic jams that would otherwise be probable. It is the same for the computer system: a single thread cannot make full-time use of a buffer and prevent other threads from using it. While this is good for fair buffer management, locks are poison for computer performance, since the other arbitrating threads have to wait until the currently accessing thread unlocks the critical memory region.
For the performance of a data transport protocol in particular this may be critical, since the meta storage is accessed by at least two concurrent threads: a sending thread, which sends user data onto the network, and a receiving thread, which is responsible for receiving and processing control information such as ACKs, in their general form or in the selective form (SACK), and NACKs, coming from one receiving station in the case of point-to-point communication or from more than one receiving station in the case of point-to-multipoint communication.
It is obvious that whenever two threads work on the same memory area, coordination is required to avoid race conditions. Otherwise there is the danger of conflicting memory accesses, such that data which is to be read by the sending thread has already been overwritten by the application thread delivering data into the send buffer, or, vice versa, that data which is to be sent over the network has not yet been delivered to the buffer memory, so that outdated data will be sent instead.
Lock-free data structures for data transport systems have been proposed in the prior art. In particular, WO 2009/043712 A1 describes a memory buffer model for a data transport system which is designed to work in a point-to-point mode.