The present invention relates to the execution of computer programs in parallel on multiple processors and in particular to a system controlling parallelization of computer programs.
Improvements in software performance have been realized by improved processor designs, for example, faster clock speeds, multiple instruction issue, and speculative execution techniques. Such performance improvements have the advantage of being completely transparent to the program generator (for example, a human programmer, compiler, or other program translator). However, achieving these benefits depends on the continuing availability of improved processors.
Parallelization offers another avenue for software performance improvement by dividing the execution of a software program amongst multiple processors that can run simultaneously. As more performance is required, more processors may be added to the system, ideally resulting in attendant performance improvement. Computer manufacturers have turned to designing processors composed of multiple cores, each core comprising circuitry (e.g., a CPU) necessary to independently perform arithmetic and logical operations. In many cases, the cores also support multiple execution contexts, allowing more than one program to run simultaneously on a single core (these cores are often referred to as multi-threaded cores and should not be confused with the software programming technique of multi-threading). The term “processor” as used herein will generally refer to an execution context of a core.
A core is typically associated with a cache and an interconnection network allowing the sharing of common memory among the cores; however, other “shared memory” architectures may be used, for example those providing exclusive memories for each processor with a communication structure. These multi-core processors often implement a multiprocessor on a single chip and multiple chips of multi-core processors are typically used to build a larger multiprocessor computer. Due to the shift toward multi-core processors, parallelization is supplanting improved single processor performance as the primary method for improving software performance.
Improved execution speed of a program using a multiprocessor computer depends on the ability to divide the program into portions that may be executed in parallel on the different processors. Parallel execution in this context requires identifying portions of the program that are independent, such that they do not simultaneously operate on the same data. Of principal concern are portions of the program that may write to the same data (a “write-write” dependency), portions that may read data subsequent to a writing of that data (a “read-write” dependency), and portions that may write data subsequent to a reading of that data (a “write-read” dependency). Errors can result if the order of any of these reads and writes changes as a result of parallel execution.
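The three kinds of dependency described above can be illustrated with a minimal sketch. The `Portion` class, the `dependencies` and `independent` functions, and the variable names are hypothetical and introduced only for illustration; each portion is assumed to declare in advance the data it reads and writes, which a real program generally cannot do so simply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Portion:
    """Hypothetical program portion with declared read and write sets."""
    name: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()


def dependencies(earlier: Portion, later: Portion) -> set:
    """Return the dependency kinds that tie `later` to `earlier` in program order."""
    found = set()
    if earlier.writes & later.writes:
        found.add("write-write")  # both portions write the same data
    if earlier.writes & later.reads:
        found.add("read-write")   # a read subsequent to a write of that data
    if earlier.reads & later.writes:
        found.add("write-read")   # a write subsequent to a read of that data
    return found


def independent(earlier: Portion, later: Portion) -> bool:
    """Two portions may run in parallel only if no dependency is found."""
    return not dependencies(earlier, later)


a = Portion("a", reads=frozenset({"x"}), writes=frozenset({"y"}))
b = Portion("b", reads=frozenset({"y"}), writes=frozenset({"z"}))
c = Portion("c", reads=frozenset({"w"}), writes=frozenset({"v"}))
```

In this sketch, portion `b` reads the data `y` that portion `a` writes, so the pair must keep its original order, whereas `a` and `c` touch disjoint data and could be executed on different processors.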
Some computer programs are relatively simple to execute in parallel, for example those having portions that can be ensured to always operate on completely disjoint data sets, as occurs in some server applications and some types of scientific computation. During execution, these different portions may be assigned to different queues for different processors by a master thread that evaluates the relative workload of each processor and the pending program threads.
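The assignment of independent portions to processor queues according to relative workload may be sketched as follows. This is an illustrative assumption only: the function `assign_to_queues`, the per-portion cost estimates, and the least-loaded-first rule are hypothetical and are not drawn from any particular system.

```python
import heapq


def assign_to_queues(portions, num_processors):
    """Assign each (name, estimated_cost) portion to the least-loaded queue.

    Assumes the portions operate on disjoint data, so any assignment is safe
    and only load balance matters.
    """
    # Min-heap of (accumulated_load, processor_index).
    loads = [(0, index) for index in range(num_processors)]
    heapq.heapify(loads)
    queues = [[] for _ in range(num_processors)]
    for name, cost in portions:
        load, index = heapq.heappop(loads)      # currently least-loaded processor
        queues[index].append(name)
        heapq.heappush(loads, (load + cost, index))
    return queues


example = assign_to_queues([("p0", 5), ("p1", 3), ("p2", 2), ("p3", 4)], 2)
# example == [["p0", "p3"], ["p1", "p2"]]
```

Because the portions are known to be disjoint, the master in this sketch needs no conflict checks; it balances only the estimated work.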
A broader class of programs cannot be divided into portions statically known to operate on disjoint data. Many current programs are written using a sequential programming model, expressed as a series of steps operating on data. This model provides a simple, intuitive programming interface because, at each step, the generator of the program (for example, the programmer, compiler, and/or some other form of translator) can assume the previous steps have been completed and the results are available for use. However, the implicit dependence between each step obscures possible independence among instructions needed for parallel execution. To statically parallelize a program written using the sequential programming model, the program generator must analyze all possible inputs to different portions of the program to establish their independence. Such automatic static parallelization works for programs which operate on regularly structured data, but has proven difficult for general programs. In addition, such static analysis cannot identify opportunities for parallelization that can be determined only at the time of execution when the data being read from or written to can be positively identified.
U.S. patent application Ser. No. 12/543,354 filed Aug. 18, 2009; U.S. patent application Ser. No. 12/858,907 filed Aug. 18, 2010; and U.S. patent application Ser. No. 12/882,892 filed Sep. 15, 2010 (henceforth the “Serialization” patents), all assigned to the same assignee as the present invention and all hereby incorporated by reference, describe systems for parallelizing programs, written using a sequential program model, during an execution of that program.
In these inventions, a master thread takes each computational operation and assigns it to a different processor queue according to a set of rules intended to prevent data access conflicts. By performing the parallelization during execution of the program, many additional opportunities for parallelization may be exploited beyond those which may be identified statically.
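By way of illustration only, one plausible rule of this kind routes every operation on the same data to the same processor queue, so that conflicting operations retain their program order while operations on different data may proceed in parallel. The sketch below is an assumption and does not reproduce the rules of the incorporated applications; the function `run_parallel`, the data identifiers, and the hash-based routing are hypothetical.

```python
import queue
import threading


def run_parallel(operations, num_processors):
    """Run (data_id, function) operations, given in program order.

    Assumed rule: all operations on one data_id go to one queue, and each
    queue is drained in FIFO order by a single worker thread.
    """
    queues = [queue.Queue() for _ in range(num_processors)]
    store = {}  # data_id -> current value; each key is touched by one worker only

    def worker(q):
        while True:
            item = q.get()
            if item is None:          # sentinel: no more work for this queue
                break
            data_id, fn = item
            store[data_id] = fn(store.get(data_id, 0))

    threads = [threading.Thread(target=worker, args=(q,)) for q in queues]
    for t in threads:
        t.start()
    # Master: route each operation at run time, when its data is known.
    for data_id, fn in operations:
        queues[hash(data_id) % num_processors].put((data_id, fn))
    for q in queues:
        q.put(None)
    for t in threads:
        t.join()
    return store


result = run_parallel(
    [("a", lambda v: v + 1), ("b", lambda v: v + 10),
     ("a", lambda v: v * 2), ("b", lambda v: v - 3)],
    num_processors=2,
)
# result == {"a": 2, "b": 7}
```

The routing decision in this sketch is made during execution, when the identity of the data being operated on is known, which is the kind of opportunity that static analysis cannot exploit.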