1. Field of the Invention
The invention relates to automatic parallelization and to prefetching.
2. Background Art
Automatic parallelization has been studied and used commercially for a long time. Recent transactional memory hardware makes speculative automatic parallelization possible. Speculative automatic parallelization produces a parallel loop even if the loop cannot be proven at compile time to be free of cross-iteration dependencies. However, transactional memory implementations have hardware limitations; for example, the number of stores in a transaction cannot exceed a certain limit without causing the transaction to fail. As a result, the workload of a speculative thread has to be small. In other words, the granularity of the parallel region cannot be too large.
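By way of illustration only, the following C sketch shows the kind of loop that cannot be proven free of dependencies at compile time, together with a simple model of the store limit. The function names scatter_add and max_chunk_iters, and the assumption that every iteration performs a fixed number of stores, are hypothetical and are not taken from any particular compiler or hardware implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical loop: whether two iterations touch the same element of a
 * depends on the run-time contents of idx, so independence cannot be
 * proven at compile time. */
void scatter_add(double *a, const int *idx, const double *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[idx[i]] += b[i];   /* iterations conflict if idx repeats a value */
}

/* Assumed model of the hardware limit: each iteration performs
 * stores_per_iter stores and one transaction can hold at most max_stores.
 * Returns the largest number of iterations a speculative chunk may cover. */
size_t max_chunk_iters(size_t max_stores, size_t stores_per_iter)
{
    assert(stores_per_iter > 0);
    return max_stores / stores_per_iter;
}
```

Under this assumed model, a small store limit directly bounds the number of iterations per speculative thread, which is why the work granularity stays small.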
At runtime, when a parallelized loop is encountered, a set of threads is created or reused from a previous creation. The original main thread shares the work with all of the non-main threads. The non-main threads, however, often incur a startup cost because their private caches and translation lookaside buffers (TLBs) are not warmed up. This startup cost is significant for speculative automatic parallelization because, as mentioned above, its work granularity is small. The relatively large startup cost in turn degrades overall performance.
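As a minimal sketch of how the main thread might share loop iterations with the non-main threads, the following C function computes the iteration range for each thread. Static block partitioning is an assumption made here for illustration; the name thread_range and the convention that thread 0 is the original main thread are likewise hypothetical.

```c
#include <stddef.h>

/* Half-open iteration range [lo, hi) assigned to one thread. */
struct iter_range { size_t lo, hi; };

/* Assumed static block partitioning: split n iterations across nthreads
 * threads (nthreads must be nonzero). Thread 0 stands for the original
 * main thread; leftover iterations go to the lowest-numbered threads. */
struct iter_range thread_range(size_t n, size_t nthreads, size_t tid)
{
    size_t base = n / nthreads;    /* iterations every thread receives */
    size_t extra = n % nthreads;   /* threads that receive one more    */
    struct iter_range r;
    r.lo = tid * base + (tid < extra ? tid : extra);
    r.hi = r.lo + base + (tid < extra ? 1 : 0);
    return r;
}
```

With ranges this small, the time a non-main thread spends missing in a cold cache or TLB can be a large fraction of the time it spends on its share of the loop.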