1. Field of the Invention
This invention relates generally to reconfigurable computer architectures for reconfigurable computing using Programmable Logic Devices.
2. Description of the Related Art
A programmable logic device or PLD is a programmable integrated circuit that allows the user of the circuit, using software control, to customize the logic functions the circuit will perform. The logic functions previously performed by small, medium, and large scale integration integrated circuits can instead be performed by programmable logic devices. When a typical programmable logic device is supplied by an integrated circuit manufacturer, it is not yet capable of performing any specific function. The user, in conjunction with software supplied by the manufacturer or created by the user or an affiliated source, can program the PLD to perform the specific function or functions required by the user's application. The PLD then can function in a larger system designed by the user just as though dedicated logic chips were employed. For the purpose of this description, it is to be understood that a programmable logic device refers to once programmable as well as reprogrammable devices.
Current state of the art computers are fixed hardware systems based upon microprocessors. As powerful as the microprocessor is, it must handle far more functions than just the application it is executing. With each new generation of microprocessors, the application's performance increases only incrementally. In many cases the application must be rewritten to achieve this incremental performance enhancement.
Currently, the trend in microprocessor design is to increase the parallelism of execution in order to boost performance. Current generation microprocessors have multiple special function units all operating in parallel on a single chip. These microprocessors are able to exploit the inherent parallelism in existing programs by executing several instructions during each clock cycle. The limitation in the number of concurrent instructions a microprocessor is capable of executing is not hardware related, as microprocessor designers may place many levels of parallelism upon a given die. Instead, the limitation may be the number of instructions in the software program that can be executed in parallel. Even today's software algorithms run into performance bottlenecks due to branch instructions or data dependencies, which result in a flushing of the multiple execution units.
To further improve the performance of applications, designers have resorted to building hardware accelerators for specific applications. Graphics acceleration is an example of this approach. Typically, a graphic command includes a series of lower level commands, which require many cycles to implement. The resulting performance bottleneck can be avoided by use of additional special purpose hardware. For example, display accelerators generally intercept display requests from the operating system that would normally be executed by the CPU and instead execute them directly in hardware. This is much faster than having the CPU itself execute the corresponding instructions for the display command.
Further enhancements to computing performance could be attained with a system offering dynamic reconfiguration such that several applications could be accelerated with the same hardware system. This is the foundation of reconfigurable computer architectures.
Reconfigurable computing systems are those computing platforms whose architecture can be modified by the software to suit the application at hand. To obtain maximum throughput, an algorithm must be placed in hardware (e.g., an ASIC or a DSP). Dramatic performance gains are obtained through the "hardwiring" of the algorithm. In a reconfigurable computing system, this "hardwiring" takes place on a function by function basis as the application executes.
FIG. 1A is an illustration of a prior art routing structure for a reconfigurable computing system architecture known to those skilled in the art as a hypercube. The routing structure illustrated is exemplified by a universal circuit board developed by the Altera Corporation of San Jose, Calif. known as "RIPP10"™. In the illustrated embodiment, there are eight (8) user configurable PLDs 101-108, one located at each vertex of hypercube 100, four (4) local memory devices 110-114 located on four edges of hypercube 100, and a global bus 115 originating at the center of hypercube 100. Global bus 115 electrically interconnects all eight user configurable PLDs, thereby linking them to an external host computer (not shown). In the example shown, each one of the eight user configurable PLDs is electrically connected to each of its three nearest neighbor user configurable PLDs as well as to a fourth user configurable PLD located at the opposite vertex of hypercube 100. For example, PLD 101 is connected to its nearest neighbors PLD 102, PLD 104, and PLD 108 as well as to PLD 106.
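The connectivity just described can be sketched in a few lines of Python. This is an illustrative model only, not the board's actual netlist: the vertices are labeled with 3-bit integers 0 through 7 (an assumed labeling; the mapping from these labels to reference numerals 101-108 is not given in the description), nearest neighbors are taken to be the vertices that differ in exactly one bit, and the opposite vertex is the one that differs in all three bits. The function names are hypothetical.

```python
# Illustrative model of the hypercube routing described above.
# Assumption: vertices carry 3-bit labels 0..7; the correspondence between
# these labels and PLDs 101-108 is NOT specified in the source text.

DIMS = 3  # a cube has 2**3 = 8 vertices


def nearest_neighbors(vertex, dims=DIMS):
    """Vertices that differ from `vertex` in exactly one bit (cube edges)."""
    return [vertex ^ (1 << bit) for bit in range(dims)]


def opposite_vertex(vertex, dims=DIMS):
    """Vertex that differs from `vertex` in every bit (cube diagonal)."""
    return vertex ^ ((1 << dims) - 1)


def links(vertex, dims=DIMS):
    """All four point-to-point connections of one vertex, sorted."""
    return sorted(nearest_neighbors(vertex, dims) + [opposite_vertex(vertex, dims)])


def all_links(dims=DIMS):
    """Set of undirected links (a, b) with a < b over the whole cube."""
    pairs = set()
    for v in range(1 << dims):
        for n in links(v, dims):
            pairs.add((min(v, n), max(v, n)))
    return pairs


if __name__ == "__main__":
    for v in range(1 << DIMS):
        print(format(v, "03b"), "->", [format(n, "03b") for n in links(v)])
    print("total point-to-point links:", len(all_links()))
```

Under these assumptions each vertex has four connections (three edge neighbors plus the diagonal), so the eight vertices share 8 × 4 / 2 = 16 point-to-point links in addition to the global bus.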
FIG. 1B is a board level schematic representation of the physical interconnects of the "RIPP10"™ universal circuit board. As shown, the array of programmable logic devices and associated local memory communicates with an external host computer via a single global bus 115. Unfortunately, the use of single global bus 115 in this manner substantially precludes the user from simultaneously executing an algorithm in a portion of the array of programmable logic devices while concurrently and independently reconfiguring a different portion of the array. Rather, the user may only reconfigure the entire array in order to implement a single application at a time.
FIG. 1C is a board level schematic representation of the local memory hierarchy of the "RIPP10"™ universal circuit board as represented in FIG. 1B. A local group 160 is formed by nearest neighbors PLD 107 and PLD 108 and a shared memory device 114 electrically interconnected by a local bus 162. By way of example, in the RIPP10™ universal circuit board, local memory device 114 takes the form of a commercially available 256K×8 SRAM device and local bus 162 takes the form of a bus structured as a separate address bus and bi-directional data bus totaling 47 active bits.
All local memory is shared with at least one other local PLD and possibly with other non-local PLDs that are requesting use of local memory. In theory, any PLD may access any non-local memory device on the board. However, any non-local memory access request will disadvantageously demand additional system processing, such as querying permission to use non-local memory, conflict arbitration, and restrictions due to address bandwidth limitations, resulting in additional cycle time. Unfortunately, additional system overhead related to conflict arbitration would also have to be implemented.
In view of the foregoing, there is a need for an improved reconfigurable computing system architecture utilizing user configurable PLDs offering dynamic independent partial reconfiguration, advantageous logic to memory ratio, and ease of design.