1. Field of the Invention
This invention generally relates to integrated circuit computing devices, and, more specifically, relates to an integrated circuit computing device comprising a dynamically configurable gate array which has a microprocessor coupled to a reconfigurable instruction execution unit. This device can implement complex, time-consuming operations by reconfiguring the instruction execution unit to perform a specific function very quickly in hardware rather than implementing complex operations in time-consuming software routines.
2. Description of the Related Art
Most modern computers are based on a conventional von Neumann architecture, which executes software instructions in sequential fashion. Many of these computers are built around the microprocessor, which follows the same traditional, sequential approach. In recent years the use of the microprocessor has become more widespread and varied, from special purpose microprocessors with special features suited to automotive and control applications (commonly known as microcontrollers) to the more highly-integrated general purpose microprocessors such as the Intel 80386 and 80486, which are used in IBM-compatible personal computers, and the Motorola 68020 and 68030, which are used in Apple Macintosh-compatible personal computers.
As the microprocessor matured from its infancy, its capabilities were increased by adding more circuitry to handle more complex functions. Many complex functions were added by implementing complex instructions in a sequence of low-level instructions within the microprocessor known as firmware. In this manner a MULTIPLY instruction within a typical microprocessor causes the microprocessor to generate a sequence of ADD and SHIFT instructions to accomplish the desired MULTIPLY function. If this MULTIPLY function could be carried out in hardware, the execution time for the MULTIPLY function could be reduced by orders of magnitude.
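The ADD-and-SHIFT sequence described above can be sketched as follows. This is an illustrative sketch only, not the firmware of any particular microprocessor; the function name `shift_add_multiply` and the 16-bit default for `width` are assumptions made for the example.

```python
def shift_add_multiply(a: int, b: int, width: int = 16) -> int:
    """Multiply two unsigned integers using only ADD and SHIFT steps.

    Hypothetical illustration: `width` is the assumed operand size in bits.
    """
    mask = (1 << width) - 1
    a &= mask  # treat both operands as unsigned `width`-bit values
    b &= mask
    product = 0
    for _ in range(width):   # one pass per multiplier bit
        if b & 1:            # low multiplier bit set?
            product += a     # ADD the shifted multiplicand
        a <<= 1              # SHIFT the multiplicand left
        b >>= 1              # SHIFT the multiplier right
    return product


if __name__ == "__main__":
    print(shift_add_multiply(13, 11))
```

The loop runs once per multiplier bit, so a 16-bit multiply costs on the order of 16 ADD and 32 SHIFT steps when performed sequentially. This is the cost that a dedicated hardware multiplier avoids.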
The microprocessor evolved over many years to become a very complex and powerful general purpose processor, capable of high levels of performance due to the large amount of circuitry and firmware dedicated to complex, high level functions. These high power, complex, general purpose microprocessors are known as Complex Instruction Set Computers (CISC), due to the features that would allow the execution of complex instructions.
In the early 1980s a new philosophy began to gain acceptance in the microprocessor field. This approach stripped the special purpose, complex circuitry and firmware out of the microprocessor and implemented instead a Reduced Instruction Set Computer (RISC). The RISC architecture concentrated on implementing each instruction within a simple instruction set in a single clock cycle. The underlying philosophy of the RISC architecture is to do fewer functions than the CISC architecture, but to do them very fast. As a result of the reduced, simplified instruction set, the amount of circuitry in a RISC is substantially less than that used in a CISC. So for a typical RISC machine, there is no MULTIPLY instruction. The MULTIPLY operation would be accomplished in a RISC machine by a software routine performing a series of ADD and SHIFT instructions. In many applications a RISC-based computer can outperform a CISC-based computer even though it must implement many of the CISC functions in software routines. This is due to the highly efficient instruction set where each instruction can be executed much faster than even the simplest instructions in a CISC-based computer. This improvement in speed usually more than makes up for the overhead in additional software.
Certain applications such as digital signal processing, video image generation, and complex mathematical calculations require functions that are not implemented within the complex hardware and firmware of the general purpose CISC. Some microprocessors, such as digital signal processors, video processors, and math processors, have circuitry dedicated to performing certain of these complex functions in hardware. However, each of these is limited to its specific realm, is not suited to general-purpose use, and cannot be modified to perform a different type of high level function. For a general purpose CISC or RISC to perform these types of special, complex functions, the functions must be implemented in long, complex software routines that take a relatively long time to execute. A computer system that uses a CISC or RISC type microprocessor to perform these complex operations will spend a relatively large amount of time executing them when compared to the time spent performing other, simpler functions.
A well-known rule with regard to problem solving is known as the Amdahl Rule, which states that 10% of the problem generally takes 90% of the time needed to solve it. This rule also applies to computers: 10% of the computer's operations generally take 90% of the computer's time. Assuming this is true, an improvement in the execution time of the 10% of the computer's functions that take 90% of the computer's time will directly and drastically improve the performance of the computer.
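The effect of accelerating the time-consuming portion can be estimated with the standard speedup formula commonly called Amdahl's law, which is not stated in the text above and is used here only as an illustration. The sketch assumes the 90% figure given above; the function name `overall_speedup` and the acceleration factors of 10 and 100 are hypothetical values chosen for the example.

```python
def overall_speedup(fraction_accelerated: float, factor: float) -> float:
    """Overall speedup when `fraction_accelerated` of the run time
    is made `factor` times faster and the remainder is unchanged."""
    remaining = 1.0 - fraction_accelerated
    return 1.0 / (remaining + fraction_accelerated / factor)


if __name__ == "__main__":
    # Assumed: 90% of the time is spent in the operations moved to hardware.
    for factor in (10, 100):
        print(factor, round(overall_speedup(0.9, factor), 2))
```

Under these assumptions, a tenfold acceleration of the dominant operations yields an overall speedup of roughly 5.3, and a hundredfold acceleration roughly 9.2. The overall gain approaches, but cannot exceed, a factor of 10, because the remaining 10% of the time is unaffected.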
This bottleneck in computer speed could be lessened or eliminated by providing a microprocessor which could execute most of these time-consuming functions in hardware. Indeed, this is the precise approach used with special purpose microprocessors that suits them so well to their specific intended tasks. However, it is impossible from a practical standpoint to make a microprocessor with all conceivable high-level functions implemented in hardware and/or firmware. Constraints on semiconductor die size and system architecture make the building of a general purpose microprocessor which directly provides a large variety of high-level, complex functions impossible at this point in time.
Programmable logic devices are well-known in the electronics art, and have progressed from simple AND-OR arrays to very complex Field Programmable Gate Arrays (FPGAs), which have a large number of input/output (I/O) blocks, programmable logic blocks and programmable routing resources to interconnect the logic blocks to each other and to the I/O blocks. Many uses for these FPGAs have been found, with most being used to implement a high number of combinatorial logic functions, which results in lower part count, lower power dissipation, higher speed and greater system flexibility than if discrete components were used. Some FPGAs have been used to implement sequencers and other various forms of state machines which are essentially combinatorial in nature. Thus, the vast majority of the applications for the typical FPGA are for combinatorial logic functions.
In recent years FPGAs based on Random Access Memory (RAM) were introduced by several manufacturers, including XILINX. The basic configuration of the XILINX FPGA is described in U.S. Pat. No. 4,870,302 to Freeman, which is assigned to XILINX, and is incorporated herein by reference. In addition, the technical features of XILINX FPGAs are described in The Programmable Gate Array Data Book, (XILINX 1992). The XILINX RAM-based FPGA has multiple I/O blocks, logic blocks and routing resources. The routing resources are used to interconnect the logic blocks to each other and to the I/O blocks, and to connect the I/O blocks through the I/O pads to the pins of the FPGA. The programming of the FPGA is accomplished by loading configuration data into the Configuration Memory Array of the FPGA. Since the XILINX FPGA is RAM-based, when power is first applied to the FPGA it has not yet been configured. Once the configuration data has been loaded into the Configuration Memory Array, the FPGA is ready for operation.
Dynamic reprogramming of the XILINX FPGA is not a novel concept since XILINX specifically acknowledges this potential use for the FPGA. Yet in most applications known in the prior art, the FPGA is reconfigured only to provide a different combinatorial logic function, and has not been used to implement a general purpose computing device. If a general computing device could be constructed within an FPGA, greater system flexibility would be achieved.
The Supercomputing Research Center in Bowie, Maryland, has succeeded in implementing a computing device within a XILINX RAM-based FPGA. Two computers have been built with this architecture: the SPLASH 1, which is discussed in Maya Gokhale et al., Building and Using a Highly Parallel Programmable Logic Array (Supercomputing Research Center, Jan. 1991), and the SPLASH 2, which is discussed in Jeffrey M. Arnold et al., SPLASH 2 (Supercomputing Research Center, 1992). To achieve high-speed operation, the XILINX FPGAs are placed in a systolic array which distributes the computing among the FPGAs to accomplish a high level of parallel processing. This systolic array configuration results in greatly increased computing speed due to the shared parallel execution of functions, but requires the use of many XILINX FPGAs and a great deal of software overhead to distribute the processing to accomplish this high level of performance.
In summary, general-purpose CISC and RISC machines are not well-suited to fast execution of complex operations. Special-purpose processors execute a limited number of complex operations very quickly, but cannot be configured for operations outside their limited specialty, and are not well-suited as general purpose computing devices. Although some of these limitations are addressed by the SPLASH 1 and SPLASH 2 computers, these are very complex and expensive parallel processing computers that require many FPGAs arranged in a systolic array.
Therefore, there existed a need to provide an integrated circuit computing device which is implemented in a single FPGA which can effectively execute the most complex, time-consuming functions in hardware by dynamically reconfiguring the FPGA so the instruction execution unit is modified to execute the desired operation in hardware. Implementing these time-consuming operations in hardware results in a substantial increase in speed of the computing device when compared to conventional approaches.