The present invention relates to the translation of source code into a machine language for computer systems. More particularly, the present invention relates to a compiler technique that identifies parallel regions of a sequential program, thus resulting in a subsequently compiled machine language that allows parallel execution among a group of processors or a single multithread processor.
Improvements in microprocessor designs has lead to microprocessors with a high operating frequency. Current microprocessor designs exceed operating frequencies of 100 megahertz (xe2x80x9cMHzxe2x80x9d). However, the increase in operating frequency has not lead to excepted performance gains. One of the main components affecting performance gains is created by the microprocessor execution units idling during delays in external memory access. The delays in external memory access are typically caused by either the inductive losses associated with off chip transmissions or due to the conventional design characteristics of secondary memory devices coupled to the microprocessor.
To counteract the performance losses associated with external memory access conventional microprocessor designs developed cache systems. The cache systems store copies of external data internal to the microprocessor, thus avoiding the performance loss created by accessing external memory. One disadvantage of the conventional cache system is that modifications of data within the cache have to be reflected in external memory. Similarly, modifications of data in the external memory have to be reflected in the cache. Accordingly, the correlation of data between cache and external memory requires extraneous circuitry while still introducing the delay associated with external memory access.
Access to external memory can also be reduced by providing the microprocessor with a sequence of instructions that operate solely within a contained area of main memory. Accordingly, the frequency of cache updates would be reduced because the microprocessor operates on data that is self-contained within main memory. Furthermore, identifying independencies within the instruction sequence would allow parallel execution (by one or more microprocessors) on data stored in main memory. Thus, minimizing external memory access. Accordingly, parallel execution greatly increases the throughput of instructions while reducing the latency of main memory access. The sequencing of instructions for parallel execution is referred to as multithreaded organization.
To provide multithreaded organization some software applications include multithreaded coding or use compilers. Multithreaded coding describes explicit parallelisms introduced into the source code by the programmer. Typically, these explicit parallelisms require that the programmer carefully define the boundaries of loops, threads, and amount of loop iterations so that a compiled source code will result in a machine language that promotes parallel execution by an operating system and/or microprocessor(s). One disadvantage, of this approach is that the programmer must be aware of all the possible translations from source code to machine language. A second disadvantage, of this approach results from the standard format used in current source code. Source code typically follows a syntactic organization geared towards ease of use for a programmer. In contrast, multithreaded coding follows an organization scheme orientated towards the processor or operating system executing the complied source code.
Alternatively, compilers are used to generate machine language that promotes parallel execution by an operating system and/or microprocessor(s). Compilers attempt to increase throughput of instructions by finding or guessing the boundaries of instructions that can be executed in parallel. Control-flow and data-flow analysis is used to predict the boundaries of instructions that can be executed in parallel. Control-flow analysis reveals the flow of controls in programs. This includes loop structures, branches, and procedure calls. Data-flow analysis, on the other hand, discovers the properties, values, and inter-relations of program variables.
Using the predicted boundaries, the compiler can create machine language organized for parallel processing. This speculative compiling of source code, however, may result in a costly performance loss because miscalculated boundary requires reversal of assumed parallel execution paths and a sequential execution of all remaining paths. Further, this speculative compiling is not applicable to pointer based applications because in pointer based applications a group of pointer are possibly pointing to the same memory location, thus resulting in data corruption. Accordingly, during parallel execution of pointer based applications different process can overwrite the same memory location.
A computer-implemented method for recognizing parallelisms within a source code is described. The method includes locating a loop within a sequence of instructions. Additionally, the method includes finding an induction variable within the loop and identifying a field controlled by the induction variable. The method also includes determining that the field is set in every iteration of the loop.