Distributing the compilation, simulation and execution of computer programs and hardware models among two or more processing nodes has two primary advantages: increased program/model capacity and decreased simulation/execution time. The size and complexity of the program/model which can be compiled and simulated/executed increases because of the additional memory and processing resources available. Simulation/execution time decreases for two reasons. Optimizations can access the partially compiled intermediate representation of the program/model, and multiple processing nodes can compile, simulate and execute concurrently.
A processing node consists of one or more general-purpose processors sharing a common memory. Optional components of a processing node include processor-specific memory, various levels of memory caching specific to a single processor or shared among two or more processors, and re-configurable logic specific to a single processor or common to two or more processors. Processing nodes may support one or more distinct virtual address spaces mapped onto physical memory devices through conventional address translation hardware and software. Processing nodes may thus be considered shared memory multiprocessors to which re-configurable logic arrays have been added.
Processing nodes (and shared memory multiprocessors) are readily constructed in configurations containing up to approximately a dozen processors. However, as additional processors are connected to a common shared memory, the efficiency of each processor degrades due to contention for that memory. Therefore, larger and more powerful computing systems are often created by connecting two or more such processing nodes using point-to-point or multi-cast message protocols. Point-to-point message protocols communicate a unit of information (a message) from an agent on one processing node to a single agent on the same processing node or on another processing node. Multi-cast message protocols communicate from an agent on one processing node to one or more agents on the same or other processing nodes. Agent functionality is embodied either as software running on processors or as hardware embedded in or associated with re-configurable logic arrays. Such agents embody components of compilation, simulation or execution.
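The difference between the two message protocols can be illustrated with a small in-process sketch. It assumes that each agent is named globally by a processing-node number plus an agent index on that node. The names `AgentId`, `Message` and `Fabric` are hypothetical. A real system would carry messages over an interconnect rather than through Python queues.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Iterable, Optional


@dataclass(frozen=True)
class AgentId:
    """Hypothetical global agent name: (processing node, agent index on that node)."""
    node: int
    local: int


@dataclass
class Message:
    src: AgentId
    payload: object


class Fabric:
    """Toy message fabric connecting agents on one or more processing nodes."""

    def __init__(self) -> None:
        self.inboxes: Dict[AgentId, Deque[Message]] = {}

    def register(self, agent: AgentId) -> None:
        self.inboxes[agent] = deque()

    def send(self, src: AgentId, dst: AgentId, payload: object) -> None:
        # Point-to-point: exactly one destination, on the sender's node or another.
        self.inboxes[dst].append(Message(src, payload))

    def multicast(self, src: AgentId, dsts: Iterable[AgentId], payload: object) -> None:
        # Multi-cast: one operation delivers a copy to each listed agent.
        for dst in dsts:
            self.inboxes[dst].append(Message(src, payload))

    def receive(self, agent: AgentId) -> Optional[Message]:
        # Return the oldest pending message, or None if the inbox is empty.
        box = self.inboxes[agent]
        return box.popleft() if box else None


# Usage: agents a and b share node 0; agent c is on node 1.
fabric = Fabric()
a, b, c = AgentId(0, 0), AgentId(0, 1), AgentId(1, 0)
for agent in (a, b, c):
    fabric.register(agent)
fabric.send(a, c, "compile unit 7")        # point-to-point, node 0 -> node 1
fabric.multicast(a, [b, c], "recompile")   # one local and one remote recipient
```

The sender does not need to know whether a recipient shares its node. Only the node field of `AgentId` distinguishes local from remote delivery.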
Compilation, simulation and execution are productively viewed as tightly inter-related modes of operation embodied in processor executables (manifest in caches and memory) and logic configuration (manifest in re-configurable logic elements). Compilation translates one or more computer programs and/or hardware models into processor executables and logic configuration information. The behavior represented by the executables and logic configuration may then be evaluated as simulation and/or execution. In general use, simulation often refers to the evaluation of hardware models, whereas execution often refers to the evaluation of a computer program. With the increasing use of hardware description languages (such as VHDL and Verilog) as well as hardware/software co-design, simulation and execution have become almost indistinguishable operating modes and are treated as such in the following.
Incremental modes of operation on programs and models call for incremental modification of the program/model under execution/simulation. Examples include symbolic debug, profiling, fault insertion, selective event tracing, dynamic linking of libraries, incremental optimization of executables (based on available resources or new information) and programming interfaces. To accommodate these modes, it is useful for the compilation and execution/simulation modes to be tightly coupled. Such tight coupling reduces simulation/execution time for a fixed set of execution resources.
Compilation is typically arranged as a unidirectional pipeline using two or more intermediate files (actual, or simulated in memory via pipes) before reaching the execution/simulation operating mode. Common intermediate files include intermediate optimization representations, textual assembly code, relocatable binaries and executable files. Many simulators even introduce a programming-language intermediate. In these simulators, compilation of a hardware model produces a program which is then compiled by a language-specific compiler. Some optimizing compilers use as many as a dozen intermediate files.
Using mechanisms such as files to communicate unidirectionally between compilation phases inhibits the rapid and efficient flow of information backward from later stages of compilation to earlier ones. For example, back-end compiler functionality that positions executable processor instructions in shared memory, or logic functionality within re-configurable logic arrays, can detect false sharing or re-configurable logic pin contention. Such problems are most efficiently addressed by partially re-executing earlier compilation functionality (mapping and scheduling, in this case) to produce a more efficient simulation/execution load.
Files are also a very coarse communication mechanism between stages of compilation. A file intermediate generally contains substantial information that is irrelevant to a localized change to the simulation. Compilation or recompilation must therefore handle substantially more information than the desired operation requires. This additional work consumes time, lengthening the delay before the execution/simulation stage is reached.
In the few cases in the research literature where the compilation operating mode retains the entire intermediate in memory, rather than in a sequence of intermediate files, it has been in the memory of a single processor. Although global access to the entire intermediate throughout compiler operation has demonstrated substantial execution/simulation performance gains, any single processor generally has a limited range of addressable as well as physically present memory. Such approaches therefore limit two things. The first is the ease with which new agents may be introduced to alter compiler operation or target new simulation/execution apparatus. The second is the size of the program or model which may be compiled on a single processor.
Within the existing compiler literature and production compiler environments, compilation is handled in one of two ways. Either compilation is run in parallel on shared memory multiprocessors to accelerate a single phase of compilation, or source files are compiled independently into associated binaries, which are then linked sequentially into a single executable. Accelerating a single compilation phase on a shared memory multiprocessor is well suited to research purposes, but is not directly applicable to decreasing the total compilation or incremental recompilation delay. Compiling each file of a multi-file program or model in isolation does not allow information to flow between files to yield a better-optimized executable. For example, the body of a function defined in one file is not available for incorporation at a call site in another file (often known as in-lining) unless the body is textually included as part of the second file's compilation. As more information is textually included into a single file, the file size increases. This eventually limits the total program or model size which can be compiled and increases the total amount of work required for compilation, since the same information is analyzed more than once.
In 1990, research was published describing the representation of an analyzed hardware description language model using intermediate representation instances of abstract data types (classes). Memory addresses (pointers) describe the relationships between instances. For example, a sequence of intermediate representation instances may each hold a pointer to the next, forming a linked list. This work neither addressed the partitioning of an intermediate representation across more than one node (virtual address space) nor integrated more than the representation of the compiler's analysis phase.
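A minimal Python sketch of this style of representation follows. It assumes a hypothetical `IRNode` class, and Python object references stand in for the memory addresses described above. Such references are meaningful only within a single address space, which is why partitioning across nodes needs a different kind of pointer.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple


@dataclass
class IRNode:
    """Hypothetical IR instance; a real compiler would use one class per construct."""
    kind: str                        # e.g. "signal_decl" or "process"
    name: str
    next: Optional["IRNode"] = None  # object reference standing in for a pointer


def build_list(items: Iterable[Tuple[str, str]]) -> Optional[IRNode]:
    """Link (kind, name) pairs into a singly linked list and return its head."""
    head: Optional[IRNode] = None
    for kind, name in reversed(list(items)):
        head = IRNode(kind, name, head)
    return head


def walk(head: Optional[IRNode]) -> Iterator[IRNode]:
    """Follow next pointers from the head to the end of the list."""
    node = head
    while node is not None:
        yield node
        node = node.next


head = build_list([("signal_decl", "clk"), ("process", "p1"), ("process", "p2")])
print([n.name for n in walk(head)])  # ['clk', 'p1', 'p2']
```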
In 1991, further research was published exploring the feasibility of compiling, simulating and executing hardware models on shared memory or message-based parallel processors using a parallel intermediate representation. This publication suggested distributing an intermediate compiler representation by replacing each pointer in the intermediate representation of the analyzed form with a tuple (record). The tuple consists of a field denoting the node and a field denoting the intermediate representation address on that node. This work also explored the complexities of, and possible approaches to, incremental compilation.
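The tuple-based scheme can be sketched as follows. `GlobalPtr`, `DistIRNode`, `NodeHeap` and `deref` are hypothetical names chosen for illustration. Each node's address space is modeled as a dictionary from integer addresses to instances, and all nodes live in one process. A real implementation would have to fetch instances on other nodes by message.

```python
from dataclasses import dataclass, field
from typing import Dict, NamedTuple, Optional


class GlobalPtr(NamedTuple):
    """Tuple replacing a raw pointer: (processing node, address on that node)."""
    node: int
    addr: int


@dataclass
class DistIRNode:
    kind: str
    name: str
    next: Optional[GlobalPtr] = None  # may refer to an instance on another node


@dataclass
class NodeHeap:
    """Toy per-node address space mapping integer addresses to IR instances."""
    node_id: int
    cells: Dict[int, DistIRNode] = field(default_factory=dict)
    next_addr: int = 0

    def allocate(self, obj: DistIRNode) -> GlobalPtr:
        # Store the instance at the next free address and return a global pointer.
        addr = self.next_addr
        self.cells[addr] = obj
        self.next_addr += 1
        return GlobalPtr(self.node_id, addr)


def deref(heaps: Dict[int, NodeHeap], ptr: GlobalPtr) -> DistIRNode:
    # In a real multi-node system, a pointer whose node field names a remote
    # node would turn into a request message; here all heaps are local.
    return heaps[ptr.node].cells[ptr.addr]


# A two-element list whose second element lives on node 1.
heaps = {0: NodeHeap(0), 1: NodeHeap(1)}
tail = heaps[1].allocate(DistIRNode("process", "p1"))
head = heaps[0].allocate(DistIRNode("signal_decl", "clk", tail))
print(head, tail)  # GlobalPtr(node=0, addr=0) GlobalPtr(node=1, addr=0)
```

The list structure is unchanged from the single-address-space case. Only the pointer type differs, so traversal code must go through `deref` instead of following object references directly.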
A 1993 publication reported an evolution of the 1991 work. It described a distributed, post-analysis intermediate representation (without further implementation detail) and a post-elaboration, post-optimization (in-lining) redistribution of processes within the intermediate compilation. This work did not discuss a single, compiler-oriented database spanning multiple compilation phases, simulation or execution, nor did it discuss a parallel database representation.
In summary, what is desirable is an apparatus with compiler and simulation/execution operating modes that efficiently provides global access to the specific information required for compilation as well as simulation/execution among the processors, memory and optional re-configurable logic of one or more processing nodes as such nodes become available for use. Such an apparatus and operating modes would allow compilation and simulation/execution of larger designs than can be accommodated by compilation on a single node. They would also provide opportunities for global optimization and incremental recompilation, which reduce the time required to compile as well as simulate/execute.
Further work in October of 1996 disclosed a distributed, compiler-oriented database with clients including:
source analyzers (compiler component),
elaborator (compiler component),
optimizer (compiler component),
code generator (compiler component),
assembler (compiler component),
linker (compiler component),
runtime system (simulation/execution component),
debugger (simulation/execution component),
profilers (simulation/execution component),
event log (simulation/execution component) and
graphical tools (components of various phases).
The work introduced the concept of a single, compiler-oriented database spanning compilation and simulation/execution on a computer with multiple nodes.
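A single-process sketch of a compiler-oriented database shared by several clients follows. It assumes fine-grained records keyed by design-unit name and a subscribe/notify interface. `CompilerDatabase` and its methods are hypothetical and are not taken from the 1996 work, and distribution of the records across nodes is omitted.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Callback = Callable[[str, object], None]


class CompilerDatabase:
    """Toy shared database for compiler and simulation/execution clients.

    Records are fine-grained (one per design unit rather than one per file),
    and each client subscribes to the records it depends on, so a localized
    change notifies only the interested clients.
    """

    def __init__(self) -> None:
        self.records: Dict[str, object] = {}
        self.subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, key: str, callback: Callback) -> None:
        self.subscribers[key].append(callback)

    def put(self, key: str, value: object) -> None:
        # Update one record, then notify only the clients subscribed to it.
        self.records[key] = value
        for callback in self.subscribers.get(key, []):
            callback(key, value)

    def get(self, key: str) -> object:
        return self.records[key]


db = CompilerDatabase()
log: List[tuple] = []
db.subscribe("entity:alu", lambda k, v: log.append(("elaborator", k)))
db.subscribe("entity:alu", lambda k, v: log.append(("debugger", k)))
db.put("entity:alu", "analyzed form, revision 2")     # notifies both clients
db.put("entity:regfile", "analyzed form, revision 1")  # no subscribers
print(log)  # [('elaborator', 'entity:alu'), ('debugger', 'entity:alu')]
```

In this sketch, a compilation client (the elaborator) and a simulation/execution client (the debugger) react to the same record update. Keying records by design unit rather than by file lets a localized change reach only the clients that depend on it.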