1. Field of the Invention
The present invention relates generally to software for reconfigurable computers, and more particularly to a compiling system and method for generating executable files for use in a dynamically reconfigurable processing unit having changeable internal hardware organization.
2. Description of Background Art
Related application Ser. No. 08/423,560, entitled "System and Method for Scalable, Parallel, Dynamically Reconfigurable Computing," describes a software-programmed reconfigurable computing architecture employing field-programmable gate arrays (FPGAs). The architecture is scalable, flexible and reconfigurable. A scalable parallel interconnection capability is implemented as a built-in architectural primitive. Thus, a machine implemented according to the described architecture can include any number of processors.
In the prior art, attempts have been made to provide reconfigurable machines. A first such prior art approach is that of downloadable microcode machines, wherein the behavior of fixed, nonreconfigurable hardware execution resources can be selectively altered by using a particular version of microcode loaded into a programmable control store. See, for example, J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, Morgan Kaufmann, 1990. In some such systems, microcode can be written or altered by the user after manufacture. See, for example, W. T. Wilner, "Design of the Burroughs B1700," in AFIPS Fall Joint Computer Conference, AFIPS Press, 1972; W. G. Matheson, "User Microprogrammability in the HP-21MX Minicomputer," in Proceedings of the Seventh Annual Microprogramming Workshop, IEEE Computer Society Press, 1974. Because the fundamental computational hardware in such prior art systems is not itself reconfigurable, such systems do not provide optimized computational performance when considering a wide range of problem types. Specifically, such systems are generally unable to alter the data path, are limited by the size of the execution units, and are only able to provide alternate instruction sets for the same hardware. Such systems do not provide a single compiler that is capable of compiling for two different architectures.
A second prior art approach involves a system in which the hardware that performs a computation is implemented using programmable logic. Examples exist that use off-the-shelf FPGAs (PAM, SPLASH, VCC) and custom programmable logic (TERAMAC). See, for example: P. Bertin et al., Programmable Active Memories: A Performance Assessment, Tech. Rep. 24, Digital Paris Research Laboratory, March 1993; D. A. Buell et al., Splash 2: FPGAs in a Custom Computing Machine, IEEE Computer Society Press, 1996; S. Casselman, "Virtual Computing and The Virtual Computer," in IEEE Symposium on FPGAs for Custom Computing Machines, IEEE Computer Society Press, 1994; R. Amerson et al., "Teramac-Configurable Custom Computing," in IEEE Symposium on FPGAs for Custom Computing Machines, IEEE Computer Society Press, 1995. In general, these technologies require that an application be specified in terms of a hardware description, expressed either as a schematic or using a hardware description language such as VHDL, rather than by writing software for a computer defined by FPGAs. For example, PAM is programmed by writing a C++ program that generates a netlist describing gate configuration and architecture. An application developer specifies a data structure representing a hardware description for implementing the application, rather than compiling a specification of an application algorithm. SPLASH is programmed in one of three ways: 1) a schematic capture package for building a hardware specification based on a schematic diagram; 2) a hardware description language (such as VHDL) coupled with a synthesis package which translates the VHDL into gate primitives; or 3) DBC, a C-language subset that is compiled into gate descriptions. TERAMAC is programmed using a schematic capture package or a hardware description language. None of these programming techniques describes algorithmic steps; rather, they provide mechanisms for specifying hardware architectures.
A third prior art approach involves reconfigurable computers which do execute software. The RISC 4005 and Hokie processor implement standard microprocessors within FPGAs. The RISC 4005 is essentially a demonstration of embedding a central processing unit (CPU) within a small portion of an FPGA whose other resources are dedicated to some coprocessor function. Hokie is used as an educational exercise in computer engineering. In these systems, an instruction set architecture (ISA) is selected before compilation and execution, and that ISA is used throughout. In addition, the bitstream for the processor is stored separately from the software which it executes. Ad hoc methods are used to ensure that a correct bitstream is loaded. See, for example, P. Athanas and R. Hudson, "Using Rapid Prototyping to Teach the Design of Complete Computing Solutions," in IEEE Symposium on FPGAs for Custom Computing Machines, IEEE Computer Society Press, 1996. These systems do not provide for reconfiguration at run-time (during execution).
Another prior art reconfigurable computer is the Dynamic Instruction Set Computer (DISC), which employs a reconfigurable processing unit. See, for example, M. J. Wirthlin and B. L. Hutchings, "A Dynamic Instruction Set Computer," in IEEE Symposium on FPGAs for Custom Computing Machines, IEEE Computer Society Press, 1995; D. A. Clark and B. L. Hutchings, "The DISC Programming Environment," in IEEE Symposium on FPGAs for Custom Computing Machines, IEEE Computer Society Press, 1996. The execution and configuration of the DISC processing unit FPGA are controlled by a microcontroller, also implemented in an FPGA. The microcontroller is programmed from a dialect of the C programming language. The compiler for this C dialect recognizes that certain program statements are to be executed by corresponding hardware configurations of the processing unit, and emits microcontroller code that causes the correct configuration bitstream to be loaded into the processing unit during execution. One skilled in the art will recognize that the microcontroller itself has a fixed instruction set, and that the compiler compiles to this fixed instruction set. There are several disadvantages to the architecture used by DISC. Since the microcontroller is fixed, it cannot be optimized for controlling different types of processing units. The configuration bitstreams are stored in external hardware outside of the memory space of the microcontroller, and thus the system is not self-contained. Additionally, the above-referenced documents do not disclose how DISC could be used for parallel computation, global signaling and clocking, or interrupt handling. Finally, new instructions are specified as atomic entities. The compiler only emits instructions for one instruction set, but allows individual instructions to be added by the programmer. Each processing unit configuration is a single hard-coded instruction provided by the programmer, thus reducing potential flexibility.
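By way of illustration only, the run-time behavior described above, in which fixed control code causes a configuration to be loaded into the processing unit before a custom instruction executes, may be modeled as in the following minimal Python sketch. The sketch is not taken from the above-referenced DISC documents. The names `ProcessingUnit`, `BITSTREAM_STORE`, `BEHAVIOR`, and `run`, the opcode names, and the placeholder bitstream bytes are all hypothetical.

```python
# Illustrative model only: a fixed controller that loads a hardware
# configuration on demand before executing a custom instruction.
# All names and the bitstream contents are hypothetical.

# Stand-in for configuration storage kept outside the program's memory space.
BITSTREAM_STORE = {
    "MULADD": b"\x01\x02",   # placeholder bytes, not real configuration data
    "SHIFTX": b"\x03\x04",
}

# Behavior each configuration provides once it is loaded.
BEHAVIOR = {
    "MULADD": lambda a, b: a * b + a,
    "SHIFTX": lambda a, b: a << b,
}


class ProcessingUnit:
    """Models a reconfigurable unit that holds one configuration at a time."""

    def __init__(self):
        self.loaded = None          # name of the configuration currently loaded
        self.reconfigurations = 0   # number of configuration loads performed

    def configure(self, name):
        # Reload only when the requested configuration is not already present.
        if self.loaded != name:
            _ = BITSTREAM_STORE[name]   # fetch the (placeholder) bitstream
            self.loaded = name
            self.reconfigurations += 1

    def execute(self, name, a, b):
        self.configure(name)
        return BEHAVIOR[name](a, b)


def run(program, unit):
    """Executes (opcode, a, b) tuples in order and returns the results."""
    return [unit.execute(op, a, b) for op, a, b in program]


unit = ProcessingUnit()
results = run([("MULADD", 2, 3), ("MULADD", 4, 5), ("SHIFTX", 1, 4)], unit)
```

In this sketch each opcode corresponds to exactly one configuration, which mirrors the limitation noted above that each processing unit configuration is a single hard-coded instruction.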
A fourth prior art approach involves mixed systems, wherein different parts of the algorithm are mapped to different components of the system. One prior art system maps an algorithm expressed in an extended C dialect to a mixed FPGA/DSP architecture. The user explicitly marks sections of the input program for targeting to the DSP, while the rest of the code is compiled into gates for FPGA implementation. Such systems require specialized tools, since they employ a non-standard syntax for ISA changes. In addition, operation of such systems is cumbersome due to the use of netlists for FPGA specification of portions of the program. Such systems do not provide actual hardware reconfiguration, but merely provide capability for mapping to another piece of hardware.
Similarly, some systems employ a standard microprocessor with some configurable logic resources. These resources are used to implement special instructions which speed execution of particular programs. See, for example, R. Razdan and M. D. Smith, "A High-Performance Microarchitecture with Hardware-Programmable Functional Units," in Proceedings of the Twenty-Seventh Annual Microprogramming Workshop, IEEE Computer Society Press, 1994. Such systems are typically implemented as a central processing unit (CPU) with a portion of the silicon die used to implement an FPGA. The CPU has a fixed data path to which the FPGAs are connected. The compiler combines selected assembly code sequences into single-instruction statements for execution by an FPGA. However, such systems generally operate only on existing assembly language code, and require an adjacent fixed ISA as a starting point. In addition, such systems do not generally provide run-time reconfiguration. Finally, such systems are not broadly applicable and typically do not provide a significant speed improvement over other conventional systems.
Though the above-mentioned systems each provide some level of reconfigurability of hardware, none of them describes a method or apparatus for encapsulating binary machine instructions and data along with the hardware configurations required to execute the machine instructions in the manner claimed herein. In addition, none of the prior art systems discloses either multiple-architecture ISA reconfiguration on a level of granularity comparable to RISC or CISC instructions as claimed herein, or compilation methods within a C-language syntax for execution on dynamically reconfigured ISAs as claimed herein.
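By way of illustration only, the general notion of storing machine instructions together with the hardware configurations required to execute them may be pictured with the following minimal Python sketch. The container layout shown (a three-field header per section followed by the bitstream bytes and the code bytes) is a hypothetical assumption made for this example and is not the format claimed herein. The names `HEADER`, `pack_sections`, and `unpack_sections`, and all byte values, are likewise hypothetical.

```python
import struct

# Hypothetical container layout, for illustration only:
#   per section: 4-byte ISA id, 4-byte bitstream length, 4-byte code length,
#   followed by the bitstream bytes and then the code bytes.
HEADER = struct.Struct("<III")


def pack_sections(sections):
    """Serializes (isa_id, bitstream, code) tuples into one byte string."""
    out = bytearray()
    for isa_id, bitstream, code in sections:
        out += HEADER.pack(isa_id, len(bitstream), len(code))
        out += bitstream
        out += code
    return bytes(out)


def unpack_sections(blob):
    """Recovers the list of (isa_id, bitstream, code) tuples from a blob."""
    sections = []
    offset = 0
    while offset < len(blob):
        isa_id, n_bits, n_code = HEADER.unpack_from(blob, offset)
        offset += HEADER.size
        bitstream = blob[offset:offset + n_bits]
        offset += n_bits
        code = blob[offset:offset + n_code]
        offset += n_code
        sections.append((isa_id, bitstream, code))
    return sections


# Two sections, each pairing a placeholder configuration with placeholder code.
sections = [(0, b"\xaa\xbb", b"\x10\x20\x30"), (1, b"\xcc", b"\x40")]
blob = pack_sections(sections)
```

Because each section carries its own configuration alongside its code, a loader in this sketch would need no separate mechanism to locate the matching bitstream, in contrast to the prior art systems described above in which the bitstream is stored apart from the software.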