1. Field
The present embodiments relate to techniques for parallelizing computer code. More specifically, the present embodiments relate to a method and system for parallelizing loops with read-after-write (RAW) dependencies.
2. Related Art
Computer system designers are presently developing mechanisms to support multi-threading within the latest generation of Chip-Multiprocessors (CMPs) as well as more traditional Symmetric Multiprocessors (SMPs). With proper hardware support, multi-threading can dramatically increase computational performance. In particular, faster execution times may be achieved by concurrently executing portions of computer programs on multiple processors of a computer system. Furthermore, concurrent execution of sequential computer programs may be enabled using automatic parallelization techniques that convert the sequential computer programs into multi-threaded code.
Automatic parallelization of sequential computer programs may be accomplished in a number of ways. First, a parallelizing compiler may convert high-level sequential code for a computer program into multi-threaded binary code. For example, the parallelizing compiler may split up a loop in the computer program so that the loop's iterations may be concurrently executed on separate processors. Second, parallelization may be provided by a virtual machine that parses bytecode and spawns multiple threads to execute portions of the bytecode in parallel. For example, a Java Virtual Machine (Java™ is a registered trademark of Sun Microsystems, Inc.) may concurrently execute portions of a Java program on multiple processors in a computer system. Finally, parallelizing mechanisms may exist in software that analyzes compiled (e.g., binary or machine) code and identifies portions of the code that may be executed in parallel.
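As a rough illustration of loop splitting, the sketch below divides the iterations of a loop with independent iterations into contiguous chunks and hands each chunk to a separate worker thread. The function names (`square_chunk`, `parallel_squares`), the chunking scheme, and the use of Python's `concurrent.futures` module are assumptions made for illustration only; they do not represent the output of any particular parallelizing compiler or virtual machine.

```python
from concurrent.futures import ThreadPoolExecutor


def square_chunk(values, start, end):
    """Run iterations start..end-1 of the loop body; no iteration depends on another."""
    return [values[i] * values[i] for i in range(start, end)]


def parallel_squares(values, num_workers=4):
    """Split the iteration space into chunks, one task per chunk (hypothetical scheme)."""
    n = len(values)
    chunk = max(1, (n + num_workers - 1) // num_workers)  # ceiling division
    bounds = [(s, min(s + chunk, n)) for s in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        parts = pool.map(lambda b: square_chunk(values, b[0], b[1]), bounds)
        # map() yields results in submission order, so the chunks concatenate correctly.
        return [x for part in parts for x in part]
```

The split is safe here only because no iteration reads a value written by another iteration. In standard CPython builds the threads illustrate the partitioning of the loop rather than a real speedup.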
However, portions of computer programs with certain types of read-after-write (RAW) dependencies may not be parallelizable using current parallelization techniques. In general, a loop that calculates a reduction of each prefix of an ordered set and uses each reduction in other calculations may not be parallelized, because each iteration reads the reduction variable after the preceding iteration has written it, which creates a RAW hazard. For example, a loop that calculates a running sum and uses the intermediate partial sums in other calculations may not be parallelized due to the RAW hazard associated with the running sum variable. Consequently, loops containing such reduction operations may not fully utilize the parallel execution capabilities of CMPs or SMPs.
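The running-sum case can be sketched as follows. The names `running_sum_scaled`, `data`, and `scale` are hypothetical and chosen only for illustration; the point is that each iteration reads `total` as written by the previous iteration, so a given iteration cannot begin until the one before it has finished.

```python
def running_sum_scaled(data, scale):
    """Compute out[i] = scale * (data[0] + ... + data[i]) with a sequential loop."""
    total = 0  # reduction variable carried from one iteration to the next
    out = []
    for x in data:
        total = total + x          # reads the total written by the previous iteration (RAW)
        out.append(scale * total)  # the intermediate partial sum is consumed immediately
    return out
```

For example, `running_sum_scaled([1, 2, 3, 4], 2)` returns `[2, 6, 12, 20]`. Because every partial sum is used inside the loop, the iterations cannot simply be divided among processors the way independent iterations can.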
Hence, what is needed is a mechanism for increasing parallelization in computer programs with RAW dependencies.