A compiler is a program containing multiple routines for translating source code into machine (object) code. In general, compilers take a high-level source language (e.g., C, Fortran, etc.) and translate it into a sequential, intermediate format code. A dependency analysis is performed on the intermediate code statements. That analysis determines which operands are required to produce a given result and allows those operands to be available in the correct sequence and at the correct time during the processing operation. Subsequently, the compiler goes through a general optimizing routine which transforms the intermediate code statements into a subsidiary intermediate form characterized by a more compact format. For instance, "dead" code is removed, common subexpressions are eliminated, and other compaction techniques are performed. These optimization actions are essentially open loop, in that the code is subjected to a procedure and is then passed on to a next optimization procedure without any intermediate testing to determine the effectiveness of the optimization. Subsequently, the optimized code statements are converted into machine language (object code). In general, such compiled code is run directly and is not subjected to a performance metric to determine the efficiency of the resulting object code.
In summary, compiler optimization procedures are basically open loop, in that they select individual statements in the intermediate code string and pass those statements through a list of optimization procedures. Once the procedures have been completed, the code is converted to object code and is not subjected to a further performance measure.
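As an illustration only, the open-loop flow summarized above can be sketched in a few lines of Python. The tuple-based intermediate format `(dest, op, arg1, arg2)`, the function names, and the assumption that every destination is assigned exactly once are hypothetical choices made for this sketch and are not taken from any particular compiler. Note that each pass hands its output to the next pass without any measurement of whether the code actually improved.

```python
# Hypothetical three-address intermediate code: (dest, op, arg1, arg2).
# Assumption for this sketch: every dest is assigned exactly once.

def eliminate_common_subexpressions(code):
    """Replace a recomputed (op, arg1, arg2) with a copy of the earlier result."""
    seen = {}
    out = []
    for dest, op, a, b in code:
        key = (op, a, b)
        if op != "copy" and key in seen:
            out.append((dest, "copy", seen[key], None))
        else:
            if op != "copy":
                seen[key] = dest
            out.append((dest, op, a, b))
    return out


def remove_dead_code(code, live_outputs):
    """Drop statements whose results never reach a live output."""
    live = set(live_outputs)
    kept = []
    for dest, op, a, b in reversed(code):
        if dest in live:
            kept.append((dest, op, a, b))
            live.discard(dest)
            live.update(x for x in (a, b) if x is not None)
    kept.reverse()
    return kept


def open_loop_optimize(code, live_outputs):
    """Run each pass once, in a fixed order, with no effectiveness test between passes."""
    code = eliminate_common_subexpressions(code)
    code = remove_dead_code(code, live_outputs)
    return code


example = [
    ("t1", "add", "a", "b"),
    ("t2", "add", "a", "b"),    # same subexpression as t1
    ("t3", "mul", "t1", "t2"),
    ("t4", "sub", "a", "b"),    # result is never used: dead code
]
optimized = open_loop_optimize(example, live_outputs=["t3"])
```

In this sketch the recomputation of `a + b` becomes a copy of `t1` and the unused `t4` is removed, but nothing in the pipeline checks whether the resulting code runs any faster.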
Recently, with the advent of highly parallel computers, compilation tasks have become more complex. Today, the compiler must assure both efficient storage of data in memory and subsequent availability of that data from memory, on a nearly conflict-free basis, to the parallel processing hardware. Compilers must therefore address the fundamental problem of data-structure storage and retrieval from the memory subsystems with the same degree of care associated with identification and formation of vector/parallel code constructs.
Vector processors and systolic arrays are of little use if the data becomes enmeshed in traffic jams at both ends of the units. In order to achieve nearly conflict-free access, it is not sufficient to run intermediate code through an optimization procedure and "hope" that its performance characteristics have been improved. Furthermore, it is inefficient to fully compile/optimize a complex source code listing and then be required to compare the resulting object code's performance against performance metrics, before determining whether additional code transformations are required to achieve a desired performance level.
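The memory "traffic jam" described above can be made concrete with a small model. The following sketch assumes, purely for illustration, a memory of interleaved banks in which an address maps to bank `address % num_banks` and in which every access beyond the first to a given bank in one parallel step counts as a conflict. The function names and the row-padding remedy are illustrative assumptions and are not drawn from the cited art.

```python
from collections import Counter


def bank_conflicts(addresses, num_banks):
    """Count accesses that collide with another access to the same bank."""
    per_bank = Counter(addr % num_banks for addr in addresses)
    return sum(n - 1 for n in per_bank.values())


def column_addresses(col, num_rows, row_length):
    """Addresses of one column of a row-major array with the given row length."""
    return [row * row_length + col for row in range(num_rows)]


NUM_BANKS = 8

# Reading a column of an 8 x 8 array: the stride equals the bank count,
# so every element falls into the same bank.
unpadded = bank_conflicts(column_addresses(0, 8, 8), NUM_BANKS)

# Same column when each row is stored with one padding element (stride 9):
# the eight elements land in eight different banks.
padded = bank_conflicts(column_addresses(0, 8, 9), NUM_BANKS)
```

Under these assumptions the unpadded layout produces seven conflicts for a single column read while the padded layout produces none, which illustrates why the storage layout chosen by the compiler matters as much as the parallel code it generates.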
The prior art regarding compiler optimization is characterized by the following articles, which have appeared over the years. Schneck et al., in "Fortran to Fortran Optimizing Compiler", The Computer Journal, Vol. 16, No. 4, pp. 322-330 (1972), describe an early optimizer directed at improving program performance at the source code level, rather than at the machine code level. Kuck et al., in "Measurements of Parallelism in Ordinary Fortran Programs", Computer, January 1974, pp. 37-46, describe early efforts at extracting from a single program as many simultaneously executable operations as possible. The purpose of that action was to improve the performance of a Fortran program by enabling certain of its operations to run in parallel.
An optimization procedure for conversion of sequential microcode to parallel or horizontal microcode is described by Fisher in "Trace Scheduling: A Technique for Global Microcode Compaction", IEEE Transactions on Computers, Vol. C-30, No. 7, July 1981, pp. 478-490.
Heavily parallel multiprocessors and compilation techniques therefor are considered by Fisher, in "The VLIW Machine: A Multiprocessor for Compiling Scientific Code", Computer, July 1984, pp. 45-53, and by Gupta et al., in "Compilation Techniques for a Reconfigurable LIW Architecture", The Journal of Supercomputing, Vol. 3, pp. 271-304 (1989). Both Fisher and Gupta et al. treat the problems of optimization in highly parallel architectures wherein very long instruction words are employed. Gupta et al. describe compilation techniques such as region scheduling, generational code for reconfiguration of the system, and memory allocation techniques to achieve improved performance. In that regard, Lee et al., in "Mapping Nested Loop Algorithms into Multidimensional Systolic Arrays", IEEE Transactions on Parallel and Distributed Systems, Vol. 1, No. 1, January 1990, pp. 64-76, describe how, as part of a compilation procedure, loop algorithms can be mapped onto systolic VLSI arrays.
The above-cited prior art describes open-loop optimization procedures. Specifically, once the code is "optimized", it is converted into object code and then output for machine execution.
Accordingly, it is an object of this invention to provide an improved system for compiling source code, wherein optimization procedures are employed.
It is another object of this invention to provide an improved compiler wherein code transformed during an optimization procedure is immediately tested to determine if the conversion has improved its performance.
It is another object of this invention to provide a compiler that effectively enables allocation of data structures to one or more independent memory spaces (domain decomposition) to permit parallel computation with minimum subsequent memory conflicts.
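By way of illustration, the object of testing transformed code immediately after each optimization step can be sketched as a closed loop in which a candidate transformation is kept only if a performance estimate improves. Everything in this sketch is a hypothetical stand-in: the latency table, the tuple-based intermediate format `(dest, op, arg1, arg2)`, the two example transformations, and the acceptance rule are assumptions made for the example, not the specific procedure of the invention.

```python
# Hypothetical per-operation latencies used as the performance metric.
LATENCY = {"add": 1, "mul": 4, "copy": 1}


def estimated_cost(code):
    """Sum the assumed latency of every statement (dest, op, arg1, arg2)."""
    return sum(LATENCY[op] for _, op, _, _ in code)


def strength_reduce(code):
    """Rewrite x * 2 as x + x."""
    return [(d, "add", a, a) if op == "mul" and b == "2" else (d, op, a, b)
            for d, op, a, b in code]


def unhelpful_rewrite(code):
    """Rewrite x + x as x * 2 (deliberately counterproductive here)."""
    return [(d, "mul", a, "2") if op == "add" and a == b else (d, op, a, b)
            for d, op, a, b in code]


def closed_loop_optimize(code, transforms, cost):
    """Apply each transform, test the result at once, and keep it only if cost drops."""
    best, best_cost = code, cost(code)
    for transform in transforms:
        candidate = transform(best)
        candidate_cost = cost(candidate)
        if candidate_cost < best_cost:
            best, best_cost = candidate, candidate_cost
    return best, best_cost


program = [("t1", "mul", "x", "2"), ("t2", "add", "t1", "y")]
result, result_cost = closed_loop_optimize(
    program, [strength_reduce, unhelpful_rewrite], estimated_cost
)
```

In this sketch the first transformation lowers the estimated cost and is accepted, while the second raises it and is discarded. An open-loop pipeline would have applied both without checking either.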