The invention relates to a method of executing a computer program with a processor that contains a configurable functional unit, capable of executing reconfigurable instructions, whose effect can be redefined at run-time. The invention also relates to a data processor for using such a method.
A method of executing a computer program with a processor that contains a configurable functional unit is known from an article titled "DISC: The dynamic instruction set computer", by Michael J. Wirthlin and Brad L. Hutchings and published on pages 92 to 103 of the "Proceedings FPGAs for fast board development and reconfigurable computing" (Proceedings SPIE 2607), 1995, edited by John Schewel.
This article describes a data processor with a functional unit that contains a field programmable gate array (FPGA). The FPGA is a circuit that produces output signals as a function of input signals. The FPGA consists of a matrix of rows and columns of configurable circuit elements. The relation between input and output can be configured by loading information into memory cells that control connections between different circuit elements of the FPGA and the functions of those circuit elements.
The use of a configuration program should be distinguished from the use of a microprogram. As is well known, a microprogram defines individual control signals that are used to control functional circuits. Different control signals are defined for different stages of microcode execution and for different instructions. In contrast, the relevant memory cells that store bits of a configuration program have permanent control over the input-output relation, that is, they control circuit elements permanently, irrespective of the instruction that is being executed or of any execution stage. Usually the controlled input-output relation is a time-continuous circuit property.
Configuration programs are derived for effecting different configurable instructions. According to the article by Wirthlin et al., the FPGA matrix is divided into a number of bands of rows of circuit elements. Each configuration program takes up no more than one band and may be placed in any band. At run-time, when a certain configurable instruction is encountered, it is tested whether the configuration program for this instruction has already been loaded into any one of the bands. If so, the instruction is executed using the configuration program. If not, the configuration program for that instruction is loaded and then the instruction is executed using the configuration program.
Only a limited number of configuration programs can be loaded at the same time. If there is no room for loading a new configuration program, the configuration program for another configurable instruction is removed from the band to make room for the new configuration program.
Each time a configuration program is loaded there is a considerable overhead. According to the article, this overhead is minimized by keeping the configuration programs loaded as long as possible before removing them to load other configuration programs. Thus, a kind of caching of configuration programs is realized, which minimizes the overhead when a configurable instruction is used repeatedly. Still, there is considerable overhead for loading configuration programs.
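The run-time caching of configuration programs described above can be illustrated with a minimal sketch. It is not taken from the article: the class name BandCache, the function count_loads and the least-recently-used eviction policy are assumptions made for illustration only, since the text states merely that programs are kept loaded as long as possible.

```python
from collections import OrderedDict


class BandCache:
    """Toy model of an FPGA divided into bands, one configuration program per band."""

    def __init__(self, num_bands):
        self.num_bands = num_bands
        # Keys are the configurable instructions whose programs are loaded,
        # ordered from least recently used to most recently used.
        self.bands = OrderedDict()
        self.loads = 0  # number of (expensive) configuration loads so far

    def execute(self, instruction):
        if instruction in self.bands:
            # Configuration program already loaded: execute without overhead.
            self.bands.move_to_end(instruction)
            return
        if len(self.bands) >= self.num_bands:
            # No free band: remove another program (assumed policy: LRU).
            self.bands.popitem(last=False)
        self.bands[instruction] = True
        self.loads += 1


def count_loads(trace, num_bands):
    """Return how many configuration loads a trace of instructions causes."""
    cache = BandCache(num_bands)
    for instruction in trace:
        cache.execute(instruction)
    return cache.loads
```

With a trace that alternates between more configurable instructions than there are bands, most executions cause a load in this model, which is the remaining overhead that the preceding paragraph refers to.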
Amongst others, it is an object of the invention to reduce the overhead needed to load configuration programs. It is a further object of the invention to increase the number of configuration programs that can be kept loaded together, so that configuration programs need to be loaded less often. It is another object of the invention to reduce the amount of memory needed to store all configuration programs needed for a computer program.
An embodiment of a method of executing a computer program involves defining and loading configurable instructions in combination rather than individually. Before running the program, one or more combinations are selected, each consisting of at least two configurable instructions. Typically, each combination is associated with one or more sequential regions of instructions of the computer program. When a particular region of the program is executed, the configuration program for all configurable instructions of the relevant combination for that region is loaded.
The combinations of instructions and their associated regions can be selected before running the program in such a way that the overhead for loading configuration programs will be minimal. That is, the configurable instructions selected for a combination occur in a sequence without interruption by other configurable instructions that do not belong to the combination and that would cause the need to load another combination. Thus, the work done to minimize the overhead is done at compile-time rather than at run-time.
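The loading of a whole combination on entry into a program region can be sketched as follows. This is only an illustrative model under stated assumptions: the function loads_per_region, the region names and the instruction names in the usage note are hypothetical, and the mapping from regions to combinations is taken as given, as it would be after compile-time selection.

```python
def loads_per_region(regions, combination_of_region):
    """Count configuration loads when a combination is loaded per region.

    regions: list of (region_name, [configurable instructions executed there]),
             in execution order.
    combination_of_region: dict mapping region_name to the frozenset of
             configurable instructions chosen for that region before run-time.
    """
    loads = 0
    current = None  # combination whose configuration program is loaded now
    for name, instructions in regions:
        combination = combination_of_region[name]
        # Every configurable instruction in the region must belong to the
        # combination selected for that region.
        if not set(instructions) <= combination:
            raise ValueError("region %r uses an instruction outside its combination" % name)
        if combination != current:
            # One load configures all instructions of the combination at once.
            loads += 1
            current = combination
    return loads
```

For example, with regions [("loop1", ["add3", "sat", "add3"]), ("loop2", ["sat", "add3"]), ("loop3", ["crc", "perm"])], where loop1 and loop2 share the combination {"add3", "sat"} and loop3 uses {"crc", "perm"}, the model counts two loads for seven executed configurable instructions.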
In addition, for many computer programs the instruction cycle count can be minimized with a combination of instructions that have strong similarities, such as use of bits from the same positions in the operands or the computation of similar but slightly different logic functions. These instructions can be realized with hardware resources that are used in common by all instructions in the combination plus some hardware resources that are particular to individual instructions (or subsets of the instructions). Thus, the number of instructions that can be loaded in the combination is increased.
According to a further embodiment of the method according to the invention, the configuration program for the combination of instructions is selected so that it cross-minimizes reconfigurable hardware resource use of different instructions in the combination in the reconfigurable functional unit. Cross-minimization of resource use for several functions means that the resource use is not minimized for the functions independently, but that a minimum is sought in the design space of all configuration programs that perform all functions. As a result of cross-minimization between the different instructions in the combination, fewer hardware resources are needed than would be needed for the combination if resource use were minimized for each instruction independently.
Examples of hardware resources in a configurable functional unit are circuit elements and programmable connections. A typical configurable functional unit contains a number of identical circuit elements with connections that can be configured to be on or off and that connect the circuit elements to each other, to other types of circuit elements or to inputs or outputs of the functional unit. Typically only a limited number of such connections can be configured: for example, only some circuit elements can be connected directly to inputs or outputs or to a given other circuit element.
If configuration programs are selected for different instructions independently, each configuration program for an instruction in a combination has to leave hardware resources free for use by other configuration programs for other instructions in the combination, even though these hardware resources may not actually be used. By cross-minimization, a configuration program for one instruction can use any hardware that is not used for other instructions.
Even worse, selection of a circuit element for use in the configuration program for one instruction without regard to the other instructions in the combination may cause additional resource use when this selection fixes the connections to the input or output. This eliminates the possibility of minimizing hardware resource use by optimally selecting input/output connections for other instructions.
By cross-minimizing the use of hardware resources of the different instructions in the same combination such waste of hardware resources can be avoided. Furthermore, it is made possible to share common hardware resources between different instructions. By cross-minimizing hardware use it is avoided that the common hardware resources have to be allocated more than once for the combination of instructions.
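The difference between independent minimization and cross-minimization can be made concrete with a small model. It is a sketch under simplifying assumptions that are not part of the text: each instruction has a few alternative implementations, each implementation is a set of named hardware resources, a resource with the same name can be shared, and the cost is simply the number of resources. The function names and the exhaustive search are chosen for illustration only.

```python
from itertools import product


def independent_cost(alternatives):
    """Each instruction gets its own smallest implementation; nothing is shared."""
    return sum(min(len(impl) for impl in impls) for impls in alternatives.values())


def cross_minimized_cost(alternatives):
    """Search all joint choices and count resources shared between instructions once."""
    best = None
    for choice in product(*alternatives.values()):
        used = set().union(*choice)  # common resources are allocated only once
        if best is None or len(used) < best:
            best = len(used)
    return best


# Hypothetical example: every instruction has a small private implementation
# (2 resources) and a larger one (3 resources) that reuses shared parts s1, s2.
example = {
    "insn_a": [{"x1", "x2"}, {"s1", "s2", "a1"}],
    "insn_b": [{"y1", "y2"}, {"s1", "s2", "b1"}],
    "insn_c": [{"z1", "z2"}, {"s1", "s2", "c1"}],
}
```

In this example independent_cost(example) is 6, whereas cross_minimized_cost(example) is 5: the joint minimum selects, for every instruction, an implementation that is not the smallest when considered alone.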
According to a further embodiment of the method according to the invention, hardware resource use for selecting different instructions in the combination and for processing operand data according to said different instructions is cross-minimized. Conventionally, instruction selection involves decoding the opcode into signals that enable operand data processing circuits. In the embodiment, fewer hardware resources are needed for instruction selection and operand data processing together than if hardware resource use of the configuration program for instruction selection and for operand data processing were minimized independently of one another.
Preferably, the processor is pipelined. This means that instruction processing is split into successive stages, such as an instruction decoding plus operand fetch stage, an instruction execution stage and a result write back stage. In a pipelined processor different stages of instruction processing of successive instructions are executed in parallel with each other. The configurable part of instruction processing takes place at the execution stage. According to an embodiment of the invention both operand data processing and use of instruction selection bits to distinguish between different instructions take place at the execution stage of processing the configurable instruction.
According to a further embodiment of the method according to the invention, the reconfigurable functional unit contains a reconfigurable cross-point switch between an input for operand data and connection lines that connect respective outputs of the cross-point switch to different logical combination circuits.
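The role of the cross-point switch can be illustrated with a minimal behavioural sketch. The representation is an assumption made for illustration: the routing list stands for the configured state of the switch, and the two example circuits and all function names are hypothetical.

```python
def crosspoint_switch(operand, routing):
    """Route operand bits to connection lines.

    routing[i] is the index of the operand bit connected to output line i;
    it models the configuration loaded into the switch.
    """
    return [(operand >> source_bit) & 1 for source_bit in routing]


def and_circuit(lines):
    """Logical combination circuit: AND of its input lines."""
    return int(all(lines))


def xor_circuit(lines):
    """Logical combination circuit: XOR (parity) of its input lines."""
    result = 0
    for bit in lines:
        result ^= bit
    return result


def execute(operand, routing, groups):
    """Feed groups of connection lines to their logical combination circuits.

    groups: list of (circuit, [line indices]) pairs.
    """
    lines = crosspoint_switch(operand, routing)
    return [circuit([lines[i] for i in indices]) for circuit, indices in groups]
```

In this model, changing only the routing list changes which operand bits reach each logical combination circuit, so the same circuits can serve different instructions.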