1. Field of the Invention
The invention relates to the field of logic devices. More specifically, the invention relates to the field of programmable logic devices.
2. Background Information
One of the core functional units of a computer processor (or CPU) is the arithmetic/logic datapath, or simply, the datapath. The datapath is typically responsible for executing various arithmetic and/or logic operations supported by the instruction set architecture (ISA) of a computer system. As such, the datapath typically includes an arithmetic logic unit (ALU) that performs arithmetic/logic operations, an address generation unit to provide memory addresses, and a control unit to provide the proper control signals for the various devices of the datapath to perform the desired operation(s).
The control signals that control the operations of the datapath may be considered as a vector of bits, which is known as a "direct control vector", since it directly controls the datapath operations. The width of this direct control vector varies greatly among CPU designs, and both the overall width and the meaning of the individual control bits depend on detailed aspects of the design. However, for typical CPU designs, the width of the direct control vector is from about 50 to 150 bits. Typically, the direct control vector is developed from a combination of bits in the instruction, processor state bits (which are sometimes known as "mode bits"), and logic gates. The combination of instruction bits and mode bits, all of which may change on each cycle, can be considered as an "indirect control vector" since it indirectly controls the datapath operations. The indirect control vector is normally much narrower than the direct control vector, about 10 to 30 bits in a typical CPU design. For example, when an ADD instruction is issued in a CPU, an opcode (the indirect control vector) that is contained in the ADD instruction is decoded by the control mechanism to generate appropriate control signals (the direct control vector) to cause the ALU to add the two operands indicated by the ADD instruction. In a similar manner, other relatively simple arithmetic and/or (Boolean) logic operations may be realized by the datapath of the CPU.
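By way of illustration only, the decoding of an indirect control vector into a direct control vector may be sketched as a table lookup. The bit assignments, the 2-bit opcode width, and the names below (ALU_ADD, REG_WRITE, DECODE_TABLE, decode, execute) are assumptions made for this sketch and do not correspond to any particular CPU design.

```python
# Illustrative sketch only: all encodings and names here are hypothetical.

# Direct control vector: one bit per datapath control line.
ALU_ADD = 1 << 0    # ALU performs addition
ALU_AND = 1 << 1    # ALU performs bitwise AND
REG_WRITE = 1 << 2  # write the ALU result to the destination register

# Indirect control vector: here simply a 2-bit opcode.
OP_ADD = 0b00
OP_AND = 0b01

# The control mechanism maps each opcode to a set of control signals.
DECODE_TABLE = {
    OP_ADD: ALU_ADD | REG_WRITE,
    OP_AND: ALU_AND | REG_WRITE,
}


def decode(opcode):
    """Return the direct control vector for an opcode."""
    return DECODE_TABLE[opcode]


def execute(opcode, a, b):
    """Apply the decoded control signals to two operands."""
    control = decode(opcode)
    if control & ALU_ADD:
        return a + b
    if control & ALU_AND:
        return a & b
    raise ValueError("operation not modeled in this sketch")
```

In this sketch, execute(OP_ADD, 2, 3) decodes the ADD opcode into the ALU_ADD and REG_WRITE signals and returns the sum of the two operands.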
Several aspects of a CPU's datapath may be limited by various device and/or design constraints. For example, operands in a CPU datapath are typically limited to those of fixed length to simplify the datapath and its control mechanisms, which in turn may result in improved system performance/efficiency. Similarly, some CPU designs, such as those implemented in reduced instruction set computer (RISC) processors, increase performance by limiting the complexity and the number of types of operations supported by the datapath in order to minimize control signals, reduce the number and complexity of datapath components, etc.
A CPU's ISA cannot create more than 2^X direct control vectors, where X is the width in bits of the indirect control vector. This is because every direct control vector is generated from some indirect control vector, so even though there may be more bits in the direct control vector, the number of states reachable by the datapath is determined by the indirect control vector. For this reason, a CPU design cannot specify in a single instruction all the complex logic operations that may be necessary for some applications. Instead, complex logic operations are broken down into a sequence of simpler ones. In this way, a CPU may perform an arbitrarily complex logic operation, but it may take many instruction cycles to complete.
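The bound on the number of direct control vectors may be checked with a short sketch. The decoder below (toy_decode) and the chosen widths of 3 and 8 bits are hypothetical and serve only to show that a wider output does not increase the number of reachable vectors.

```python
# Illustrative sketch only: toy_decode and the bit widths are assumptions.

def reachable_direct_vectors(decode, indirect_width):
    """Enumerate every indirect control vector and collect the distinct
    direct control vectors the decoder can produce."""
    return {decode(i) for i in range(2 ** indirect_width)}


def toy_decode(indirect):
    """Hypothetical decoder spreading a 3-bit input over an 8-bit output."""
    return (indirect * 37) & 0xFF


INDIRECT_WIDTH = 3  # X, the width of the indirect control vector
DIRECT_WIDTH = 8    # width of the direct control vector

reachable = reachable_direct_vectors(toy_decode, INDIRECT_WIDTH)

# At most 2**X of the 2**DIRECT_WIDTH possible vectors can ever be generated.
assert len(reachable) <= 2 ** INDIRECT_WIDTH
assert 2 ** INDIRECT_WIDTH < 2 ** DIRECT_WIDTH
```

Whatever decoder is substituted for toy_decode, the set returned by reachable_direct_vectors cannot have more than 2^X members, since the enumeration itself visits only 2^X inputs.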
Some applications require relatively complex logic operations to be performed at high speed. For example, an application might require a certain complex logic operation to be performed 1 million times per second. For a CPU to perform these operations in time, it must be able to process instructions at a still higher rate. For example, if an operation required 800 instructions on a certain CPU, it would have to process 800 million instructions per second to meet the requirements of the application. In many cases, this is not an economical way to implement demanding applications, while in others it is not possible at all. In such cases, other devices may be used in place of or in combination with a CPU's ALU. For example, programmable logic arrays (PLAs), field programmable gate arrays (FPGAs), and application specific integrated circuits (ASICs) may be tightly coupled to serve as coprocessors to a CPU. The coprocessor elements, whether ASICs, PLAs, or FPGAs, are configured to perform the complex logic operations required by the application in a much more parallel manner than a CPU, so that the operations can be done at a lower, and more economical, clock rate.
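The arithmetic in the example above may be written out as follows. The function name required_instruction_rate is introduced here only for illustration; the figures are those given in the example.

```python
# Illustrative sketch only: the function name is hypothetical.

def required_instruction_rate(operations_per_second, instructions_per_operation):
    """Instructions per second a CPU must sustain to meet the application's
    operation rate when each operation expands into several instructions."""
    return operations_per_second * instructions_per_operation


# 1 million complex operations per second, 800 instructions per operation.
rate = required_instruction_rate(1_000_000, 800)
assert rate == 800_000_000
```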
While ASICs are specifically designed state machines and datapaths, PLAs and FPGAs typically contain an array/matrix of logic circuits (e.g., logic gates, memory cells, etc.) in which connections between particular logic circuits may be programmed after manufacture (e.g., by a user in the field; hence, the term "field" programmable). As such, PLAs and FPGAs may be configured to perform relatively complex logic operations by making the proper pattern of interconnections (e.g., by burning in fuses or programming individual SRAM cells) in the logic array of such devices. Often, this is analogous to defining a single, highly specialized CPU instruction specifically for the application. In more complex cases, a better analogy might be the definition of a highly specialized datapath that implements several specialized instructions using its own direct and indirect control vectors, which may be supplied by the CPU.
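As a simplified illustration of how a pattern of interconnections determines the logic operation performed, a small PLA may be modeled as an AND plane feeding an OR plane. The data layout and the names pla_eval, XOR_AND_PLANE, and XOR_OR_PLANE are assumptions of this sketch and do not describe any particular device.

```python
# Illustrative sketch only: the representation of the planes is hypothetical.

def pla_eval(inputs, and_plane, or_plane):
    """Evaluate a programmed PLA.

    inputs:    tuple of 0/1 input bits.
    and_plane: list of product terms; each term holds, per input, 1 (input
               connected), 0 (inverted input connected), or None (no connection).
    or_plane:  list of outputs; each output lists the product terms it ORs.
    """
    products = [
        all(bit == want for bit, want in zip(inputs, term) if want is not None)
        for term in and_plane
    ]
    return tuple(int(any(products[i] for i in terms)) for terms in or_plane)


# One interconnection pattern: output = (NOT a AND b) OR (a AND NOT b), i.e. XOR.
XOR_AND_PLANE = [(0, 1), (1, 0)]
XOR_OR_PLANE = [[0, 1]]

# A different pattern on the same structure: output = a AND b.
AND_AND_PLANE = [(1, 1)]
AND_OR_PLANE = [[0]]
```

Here pla_eval((0, 1), XOR_AND_PLANE, XOR_OR_PLANE) yields (1,), whereas the same inputs applied to the second pattern yield (0,); only the programmed connections differ.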
However, PLAs, FPGAs and ASICs suffer from some limitations. For example, ASICs cannot be reprogrammed. As another example, certain PLAs and FPGAs cannot be reprogrammed once configured and installed (often referred to as "one-time programmable"). Thus, such devices may not be suitable for applications in which a variety of different logic operations must be executed. Furthermore, a substantial portion of the circuitry in PLAs and FPGAs may be unused, resulting in power and/or cost inefficiency.
Although some FPGAs may be re-programmed to support various logic operations and numbers of inputs, such devices also suffer from limitations. For example, in an SRAM cell-based FPGA, the interconnection array in which the various configurable logic blocks (CLBs) reside is typically programmed by pass transistors, which may result in relatively large "on" resistance. Furthermore, interconnect delays in SRAM cell-based FPGAs may be relatively large due to certain wires of unpredictably varying, and sometimes relatively long, length. Yet further inefficiency may be caused by the presence of multiple wires in the interconnect array which may be unused, resulting in increased capacitive load and increased device driver power requirements; and by the need for multiple pass transistors and SRAM cells to complete each logical connection. Finally, the number of control/configuration bits typically required to program an FPGA (e.g., produce the appropriate interconnections between the CLBs) may exceed 250,000 bits, making dynamic (e.g., "on the fly"; on a cycle-by-cycle basis) re-configuration/re-programming relatively difficult and commercially impractical.