The background description provided herein is for the purpose of generally presenting the context of the disclosure. Unless otherwise indicated herein, the materials described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
In a shared library built as a multi-level software stack, such as a Message Passing Interface (MPI) library, moving from one level to another may cost CPU cycles, especially when the required data is not in the CPU's cache. This may be particularly the case for the small-message transfer path, whose performance is sensitive to memory access latency. Existing prefetching schemes typically operate within one particular level of a multi-level library, prefetching a far part of the data while working on a closer part of the data at the same level. This technique does not function optimally for small messages.
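By way of illustration only, the conventional single-level scheme described above might be sketched as follows. The function name `checksum_with_prefetch`, the 64-byte cache line size, and the 256-byte prefetch distance are hypothetical choices made for this sketch and are not taken from any particular library; `__builtin_prefetch` is the prefetch builtin provided by GCC and Clang.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration; real code would tune these per platform. */
#define CACHE_LINE_BYTES  64
#define PREFETCH_DISTANCE 256

/*
 * Hypothetical routine at a single level of a library: sums the bytes of a
 * buffer while prefetching a far part of the same buffer.
 */
uint32_t checksum_with_prefetch(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < len; i++) {
        /* Once per assumed cache line, request the data PREFETCH_DISTANCE
         * bytes ahead, provided that address is still inside the buffer. */
        if (i % CACHE_LINE_BYTES == 0 && i + PREFETCH_DISTANCE < len) {
            __builtin_prefetch(buf + i + PREFETCH_DISTANCE, 0, 3);
        }
        sum += buf[i]; /* work on the closer part of the data */
    }
    return sum;
}
```

In this sketch, a buffer no longer than the prefetch distance never satisfies the guard, so no prefetch is issued at all. This may be one way to see why a scheme confined to a single level offers little benefit on a small-message path.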