Web, business, and scientific programs have increasingly become data-bound. Many modern programs issue large numbers of long-latency data access requests, long-latency because the data is often served by remote web services or databases. Owing to the disparity between central processing unit (CPU) speeds and network latencies and bandwidths, these programs spend a significant fraction of their execution time waiting for the data access requests to be serviced.
To improve the performance of such programs, programmers expend considerable time and effort scheduling the requests to minimize overall execution time, using schemes such as batching and parallelization. Batching converts several round trips into one, thereby amortizing the round-trip cost over more data. Related remote data access calls are not performed at the point the client requests them, but are instead deferred until the client actually needs the value of a result. By that time, a number of deferred calls may have accumulated, and the calls are sent all at once, in a “batch”. Parallelization exposes independent remote data accesses and overlaps their round-trip latencies. Both mechanisms usually require significant code rewriting, which obscures the functional logic of the program and often results in non-portable performance gains. Ideally, the programmer should only be concerned with expressing the functional logic of the program, and should leave the compiler and runtime to orchestrate the remote data requests efficiently.
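The deferral behavior described for batching can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not a scheme taken from any particular system: the names `BatchingClient`, `Deferred`, and `fake_remote_fetch` are hypothetical, and the remote service is simulated by a local function that counts as one round trip per call.

```python
class BatchingClient:
    """Defers remote reads and sends them together when a value is needed."""

    def __init__(self, fetch_many):
        # fetch_many(keys) -> dict is a hypothetical stand-in for one
        # remote round trip that serves several keys at once.
        self._fetch_many = fetch_many
        self._pending = []    # keys requested but not yet sent
        self._results = {}    # key -> value for requests already serviced
        self.round_trips = 0  # number of simulated round trips so far

    def get(self, key):
        """Record the request and return a handle; no remote call is made."""
        if key not in self._results and key not in self._pending:
            self._pending.append(key)
        return Deferred(self, key)

    def flush(self):
        """Send every accumulated request as a single batch."""
        if self._pending:
            self._results.update(self._fetch_many(self._pending))
            self.round_trips += 1
            self._pending = []

    def lookup(self, key):
        """Return the value for key, flushing the batch first if needed."""
        if key not in self._results:
            self.flush()
        return self._results[key]


class Deferred:
    """Handle for a result that is fetched only when value() is called."""

    def __init__(self, client, key):
        self._client = client
        self._key = key

    def value(self):
        return self._client.lookup(self._key)


def fake_remote_fetch(keys):
    # Assumption: simulates the remote service by upper-casing each key.
    return {k: k.upper() for k in keys}


client = BatchingClient(fake_remote_fetch)
a = client.get("alice")   # deferred
b = client.get("bob")     # deferred
c = client.get("carol")   # deferred
names = [a.value(), b.value(), c.value()]  # first value() sends the batch
```

In this sketch the three logical requests cost one simulated round trip instead of three, but the client code had to be restructured around deferred handles, which is the kind of rewriting burden noted above.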
A conventional way to overcome the problem of data access latency is data prefetching (See, e.g., K. S. Trivedi. On the paging performance of array algorithms. IEEE Trans. Comput., 26(10):938-947, 1977; T. C. Mowry, A. K. Demke, and O. Krieger. Automatic compiler-inserted I/O prefetching for out-of-core applications. In OSDI, 1996). The idea is to issue asynchronous data requests before the data is actually needed so that the data may be available locally when accessed by the program. Prefetching has been studied in the microarchitecture community to hide the latency between the processing core and the memory subsystem (See, e.g., W. Zhang, D. M. Tullsen, and B. Calder. Accelerating and adapting precomputation threads for efficient prefetching. In HPCA, 2007; D. Kim and D. Yeung. Design and evaluation of compiler algorithms for pre-execution. In ASPLOS, 2002; J. D. Collins, D. M. Tullsen, H. Wang, and J. P. Shen. Dynamic speculative precomputation. In MICRO, 2001; C.-K. Luk. Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors. SIGARCH Comput. Archit. News, 29(2):40-51, 2001). Prefetching has also been used to hide the latency of a local filesystem (See, e.g., D. Kotz and C. S. Ellis. Practical prefetching techniques for parallel file systems. In PDIS, 1991; F. Chang and G. A. Gibson. Automatic I/O hint generation through speculative execution. In OSDI, 1999). The inventors have recognized, however, that prefetching is not used for hiding the latency of remote data accesses, including network and remote data storage latencies.
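The basic idea of prefetching, issuing an asynchronous request early and consuming the result later, can be sketched in Python as follows. This is an illustrative sketch only: `remote_read`, `local_work`, and `run_with_prefetch` are hypothetical names, and the long-latency access is simulated with a short sleep rather than a real network call.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def remote_read(key):
    # Hypothetical stand-in for a long-latency remote data access.
    time.sleep(0.05)
    return f"value-of-{key}"


def local_work(x):
    # Computation that does not depend on the remote data.
    return x * x


def run_with_prefetch(key, x):
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Issue the request before the data is needed.
        future = pool.submit(remote_read, key)
        # Overlap independent local computation with the request latency.
        partial = local_work(x)
        # Block only if the data has not yet arrived.
        data = future.result()
    return partial, data


result = run_with_prefetch("k1", 7)
```

The benefit depends on knowing, early enough, which key will be needed; predicting that is the difficult part.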
Most prefetchers are history-based: they analyze data access patterns performed in the past, predict future data accesses to follow similar patterns, and prefetch the corresponding data. While this approach works for programs with regular data access patterns, such as array-based scientific programs, it is not effective for programs whose data accesses depend on the input, are not structured in easily predicted patterns, or do not contain recurrences (that is, frequent reuse of the same remote data).
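The difference between regular and input-dependent access patterns can be made concrete with a small Python sketch of a history-based predictor. A simple stride predictor is used here purely as an assumed example of a history-based scheme; the class `StridePredictor`, the function `count_useful_predictions`, and both access traces are hypothetical.

```python
class StridePredictor:
    """Predicts the next index once the same stride is seen twice in a row."""

    def __init__(self):
        self._last = None        # most recent index accessed
        self._stride = None      # most recent stride observed
        self._confirmed = False  # True if the last two strides matched

    def observe(self, index):
        """Record an access; return the predicted next index, or None."""
        if self._last is not None:
            stride = index - self._last
            self._confirmed = (stride == self._stride)
            self._stride = stride
        self._last = index
        return index + self._stride if self._confirmed else None


def count_useful_predictions(trace):
    """Count accesses that the predictor would have prefetched correctly."""
    predictor = StridePredictor()
    useful = 0
    prediction = None
    for index in trace:
        if prediction is not None and prediction == index:
            useful += 1
        prediction = predictor.observe(index)
    return useful


regular = list(range(0, 40, 4))                    # array-style strided scan
irregular = [17, 3, 42, 8, 29, 1, 36, 12, 25, 5]   # input-dependent accesses

regular_hits = count_useful_predictions(regular)
irregular_hits = count_useful_predictions(irregular)
```

On the strided trace the predictor anticipates every access after its warm-up period, whereas on the irregular trace it never issues a correct prediction, since no stride repeats.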
While speculative execution allows programs to dynamically discover future read accesses to disk, in the presence of dependencies between accesses such an approach often causes misspeculation of future disk accesses and spurious disk accesses. Speculative parallelization schemes offer hope for solving the problem of excessive misspeculation; however, they have some disadvantages in the setting of remote data accesses. In particular, violation of dependencies that do not contribute to generating remote data requests may cause re-execution, which repeats some expensive remote accesses and hinders progress towards exposing other remote requests.