Currently, network access is performed by writing a descriptor to system memory and informing the IO device that work is ready. In a typical implementation, a computer's central processing unit (CPU) writes the descriptor to the computer's local memory and then notifies a locally attached IO device. The IO device then fetches the work descriptor (command buffer), performs the actual IO operation (for example, reading a remote memory location over the network), and reports completion of the command. The CPU must learn that the command has completed, typically either by polling a completion status or by receiving an interrupt message (for example, MSI-X).
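The descriptor, doorbell, and completion flow described above can be illustrated with a small toy model. This is a minimal sketch under stated assumptions, not a real driver or the mechanism of any particular device: all names (`REMOTE_MEMORY`, `Descriptor`, `ToyIODevice`, `cpu_remote_read`) are hypothetical, a Python list stands in for the descriptor queue in system memory, a `threading.Event` stands in for the MMIO doorbell write, a background thread plays the IO device, and completion is detected by polling rather than by an MSI-X interrupt.

```python
import threading
import time
from dataclasses import dataclass

# Hypothetical stand-in for memory on a remote node, reachable only via the device.
REMOTE_MEMORY = {0x1000: b"hello from remote node"}


@dataclass
class Descriptor:
    """Work descriptor (command buffer) the CPU writes into system memory."""
    remote_addr: int
    length: int
    done: bool = False   # completion status, written by the device
    data: bytes = b""    # payload, written by the device before `done`


class ToyIODevice:
    """Illustrative device: waits for the doorbell, then drains pending descriptors."""

    def __init__(self, queue):
        self.queue = queue                   # descriptor queue in "system memory"
        self.tail = 0                        # producer index published via doorbell
        self.head = 0                        # next descriptor the device fetches
        self.doorbell = threading.Event()    # models an MMIO doorbell write
        self.running = True
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def ring_doorbell(self, tail):
        self.tail = tail                     # how far the queue holds valid work
        self.doorbell.set()

    def shutdown(self):
        self.running = False
        self.doorbell.set()
        self.worker.join()

    def _run(self):
        while True:
            self.doorbell.wait()
            self.doorbell.clear()
            if not self.running:
                return
            while self.head < self.tail:
                desc = self.queue[self.head]              # fetch the work descriptor
                chunk = REMOTE_MEMORY.get(desc.remote_addr, b"")
                desc.data = chunk[:desc.length]           # perform the (simulated) IO
                desc.done = True                          # report completion
                self.head += 1


def cpu_remote_read(device, queue, remote_addr, length, timeout=1.0):
    """CPU side: write a descriptor, ring the doorbell, poll for completion."""
    desc = Descriptor(remote_addr, length)
    queue.append(desc)                       # 1. write descriptor to system memory
    device.ring_doorbell(len(queue))         # 2. inform the device that work is ready
    deadline = time.monotonic() + timeout
    while not desc.done:                     # 3. poll on the completion status
        if time.monotonic() > deadline:
            raise TimeoutError("device did not complete the descriptor")
        time.sleep(0)                        # yield; a real CPU would spin or pause
    return desc.data


if __name__ == "__main__":
    queue = []
    dev = ToyIODevice(queue)
    print(cpu_remote_read(dev, queue, 0x1000, 5))  # b'hello'
    dev.shutdown()
```

In this sketch `cpu_remote_read` does nothing useful while it polls; the CPU only gains from the asynchronous design if it has independent work to perform between ringing the doorbell and observing completion.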
This conventional, asynchronous process for network IO access is beneficial where latency can be tolerated: the CPU can continue working while the network access proceeds in the background.
In some cases, however, network IO access (the network phase) cannot overlap the compute phase; that is, the CPU cannot continue working until it has received (or transmitted) the subject data. In these cases, the inherently asynchronous mode of operation adds latency to processing without providing any benefit.