In large-scale training clusters, communication is a critically important component. Large-scale deep learning training jobs must communicate frequently to obtain better model parameters and accelerate model convergence, and this communication is one of the main bottlenecks limiting training speed. At present, highly integrated training systems and supercomputing centers commonly use the InfiniBand (IB) architecture to accelerate communication. IB employs dedicated hardware to simplify the protocol stack: most of the work that CPUs would otherwise have to perform to share memory between two machines is completed directly by the IB hardware.
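To make the hardware-offload idea concrete, the sketch below outlines the typical flow an application follows with the libibverbs API (the standard user-space interface to IB hardware) to write directly into a remote machine's memory. This is pseudocode, not a runnable program: error handling, queue-pair connection setup, and the out-of-band exchange of the remote address and key are omitted, and the buffer sizes and variable names are illustrative assumptions.

```
/* Pseudocode sketch of a one-sided RDMA write via libibverbs.
 * Once the queue pair is connected, the data transfer itself is
 * executed by the IB NIC without involving the remote CPU. */

ctx = ibv_open_device(device);          /* open the IB device */
pd  = ibv_alloc_pd(ctx);                /* protection domain */

/* Register a local buffer so the NIC can DMA it directly. */
mr = ibv_reg_mr(pd, buf, len,
                IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE);

cq = ibv_create_cq(ctx, depth, ...);    /* completion queue */
qp = ibv_create_qp(pd, ...);            /* queue pair (RC type) */

/* ... exchange qp numbers, remote_addr, and rkey out of band,
 * then transition the qp to the connected (RTS) state ... */

/* Post an RDMA_WRITE work request: the NIC copies buf into
 * the remote memory region; the remote CPU is not interrupted. */
wr.opcode              = IBV_WR_RDMA_WRITE;
wr.wr.rdma.remote_addr = remote_addr;
wr.wr.rdma.rkey        = rkey;
ibv_post_send(qp, &wr, &bad_wr);

/* Poll the completion queue to learn the write has finished. */
ibv_poll_cq(cq, 1, &wc);
```

This offload is what distinguishes IB from a conventional TCP/IP path, where every transfer passes through the kernel protocol stack and consumes CPU cycles on both ends.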