1. Field of the Invention
The invention relates to the field of logic devices. More specifically, the invention relates to the field of programmable logic devices.
2. Background Information
One of the core functional units of a computer processor (or CPU) is the arithmetic/logic datapath, or simply, the datapath. The datapath is typically responsible for executing various arithmetic and/or logic operations supported by the instruction set architecture (ISA) of a computer system. As such, the datapath typically includes an arithmetic logic unit (ALU) that performs arithmetic/logic operations, an address generation unit to provide memory addresses, and a control unit to provide the proper control signals for the various devices of the datapath to perform the desired operation(s).
The control signals that control the operations of the datapath may be considered as a vector of bits, which is known as a "direct control vector", since it directly controls the datapath operations. The width of this direct control vector varies greatly in CPU designs, and both the overall width and the meaning of the individual control bits depend on detailed aspects of the design. However, for typical CPU designs, the width of the direct control vector is from about 50 to 150 bits. Typically, the direct control vector is developed from a combination of bits in the instruction, processor state bits (which are sometimes known as "mode bits"), and logic gates. The combination of instruction bits and mode bits, all of which may change on each cycle, can be considered as an "indirect control vector" since it indirectly controls the datapath operations. The indirect control vector is normally much narrower than the direct control vector, about 10 to 30 bits in a typical CPU design. For example, when an ADD instruction is issued in a CPU, an opcode (the indirect control vector) that is contained in the ADD instruction is decoded by the control mechanism to generate appropriate control signals (the direct control vector) to cause the ALU to add the two operands indicated by the ADD instruction. In a similar manner, other relatively simple arithmetic and/or (Boolean) logic operations may be realized by the datapath of the CPU.
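The decoding of an indirect control vector into a direct control vector can be illustrated with a minimal sketch. The opcode values, the control-signal names, and the carry mode bit below are all hypothetical and chosen only for illustration; a real design would use far wider vectors, as noted above.

```python
# Hypothetical direct-control bits (a real design has roughly 50 to 150).
ALU_ADD, ALU_SUB, ALU_AND, ALU_OR = 0b0001, 0b0010, 0b0100, 0b1000
REG_WRITE = 0b01_0000
USE_CARRY = 0b10_0000

# Hypothetical 2-bit opcodes (the instruction part of the indirect vector).
OPCODES = {"ADD": 0b00, "SUB": 0b01, "AND": 0b10, "OR": 0b11}

# Lookup table standing in for the decode logic gates.
DECODE_TABLE = {
    0b00: ALU_ADD | REG_WRITE,
    0b01: ALU_SUB | REG_WRITE,
    0b10: ALU_AND | REG_WRITE,
    0b11: ALU_OR | REG_WRITE,
}


def decode(opcode: int, carry_mode: bool) -> int:
    """Combine instruction bits and one mode bit into a direct control vector."""
    direct = DECODE_TABLE[opcode]
    # The mode bit only affects the arithmetic operations in this sketch.
    if carry_mode and opcode in (OPCODES["ADD"], OPCODES["SUB"]):
        direct |= USE_CARRY
    return direct


add_plain = decode(OPCODES["ADD"], carry_mode=False)
add_carry = decode(OPCODES["ADD"], carry_mode=True)
```

Here a 3-bit indirect control vector (two opcode bits plus one mode bit) expands into a 6-bit direct control vector; the same opcode yields a different direct vector when the mode bit changes.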
Several aspects of a CPU's datapath may be limited by various device and/or design constraints. For example, operands in a CPU datapath are typically limited to those of fixed length to simplify the datapath and control mechanisms of the datapath, which in turn, may result in improved system performance/efficiency. Similarly, some CPU designs, such as those implemented in reduced instruction set architecture (RISC) processors, increase performance by limiting the complexity and number of types of operations supported by the datapath to minimize control signals, minimize/simplify the number of datapath components, etc.
A CPU's ISA cannot create more than 2^X distinct direct control vectors, where X is the width in bits of the indirect control vector. This is because every direct control vector is derived from an indirect control vector, so even though there may be more bits in the direct control vector, the number of states reachable by the datapath is determined by the indirect control vector. For this reason, a CPU design cannot specify in a single instruction all the complex logic operations that may be necessary for some applications. Instead, complex logic operations are broken down into a sequence of simpler ones. In this way, a CPU may perform an arbitrarily complex logic operation, but it may take many instruction cycles to complete.
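The counting argument can be checked with a small sketch: enumerating every indirect control vector of width X through any fixed decoder yields at most 2^X (two to the power X) distinct direct control vectors, however wide the direct vector is. The decoder `toy_decode` and the widths used here are hypothetical stand-ins, not taken from any particular CPU.

```python
def reachable_direct_vectors(decode, indirect_width):
    """Return the set of direct vectors produced by all indirect vectors."""
    return {decode(indirect) for indirect in range(2 ** indirect_width)}


def toy_decode(indirect):
    """Hypothetical decoder: expands a 3-bit indirect vector to 8 bits."""
    return ((indirect * 37) ^ (indirect << 5)) & 0xFF


X = 3  # width of the indirect control vector in this sketch
reachable = reachable_direct_vectors(toy_decode, X)

# An 8-bit direct vector has 256 possible values, but no more than
# 2**X of them can ever be produced by the decoder.
assert len(reachable) <= 2 ** X
```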
Some applications require relatively complex logic operations to be performed at high speed. For example, an application might require a certain complex logic operation to be performed 1 million times per second. For a CPU to perform these operations in time, it must be able to process instructions at a still higher rate. For example, if an operation required 800 instructions on a certain CPU, it would have to process 800 million instructions per second to meet the requirements of the application. In many cases, this is not an economical way to implement demanding applications, while in others it is not possible at all. In such cases, other devices may be used in place of or in combination with a CPU's ALU. For example, programmable logic arrays (PLAs), field programmable gate arrays (FPGAs), and application specific integrated circuits (ASICs) may be tightly coupled to serve as coprocessors to a CPU. The coprocessor elements, whether ASICs, PLAs, or FPGAs, are configured to perform the complex logic operations required by the application in a much more parallel manner than a CPU, so that the operations can be done at a lower, and more economical, clock rate.
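The rate requirement in the example above is a simple product, sketched here with the figures from the text. The function name is hypothetical, and the model assumes one instruction per cycle with no stalls.

```python
def required_instruction_rate(operations_per_second, instructions_per_operation):
    """Instructions per second a CPU must sustain to keep up."""
    return operations_per_second * instructions_per_operation


# 1 million complex operations per second, 800 instructions each.
rate = required_instruction_rate(1_000_000, 800)
print(f"{rate:,} instructions per second")
```

A coprocessor that completes the same complex operation in a single cycle would, under the same assumptions, need to run at only 1 million cycles per second.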
While ASICs are specifically designed state machines and datapaths, PLAs and FPGAs typically contain an array/matrix of logic circuits (e.g., logic gates, memory cells, etc.) in which connections between particular logic circuits may be programmed after manufacture (e.g., by a user in the field; hence, the term "field" programmable). As such, PLAs and FPGAs may be configured to perform relatively complex logic operations by making the proper pattern of interconnections (e.g., by burning in fuses or programming individual SRAM cells) in the logic array of such devices. Often, this is analogous to defining a single, highly specialized CPU instruction specifically for the application, or in more complex cases a better analogy might be to defining a highly specialized datapath that implements several specialized instructions using its own direct and indirect control vectors, which may be supplied by the CPU.
However, PLAs, FPGAs and ASICs suffer from some limitations. For example, ASICs cannot be reprogrammed. As another example, certain PLAs and FPGAs cannot be reprogrammed once configured and installed (often referred to as "one-time programmable"). Thus, such devices may not be suitable for applications wherein the execution of various logic operations may be required. Furthermore, a substantial portion of circuitry in PLAs and FPGAs may be unused, resulting in power and/or cost inefficiency.
Although some FPGAs may be re-programmed to support various logic operations and numbers of inputs, such devices also suffer from limitations. For example, in an SRAM cell-based FPGA, the interconnection array in which the various configurable logic blocks (CLBs) reside is typically programmed by pass transistors, which may result in relatively large "on" resistance. Furthermore, interconnect delays in SRAM cell-based FPGAs may be relatively large due to certain wires of unpredictably varying, and sometimes relatively long, length. Yet further inefficiency may be caused by the presence of multiple wires in the interconnect array which may be unused, resulting in increased capacitive load and increased device driver power requirements; and by the need for multiple pass transistors and SRAM cells to complete each logical connection. Finally, the number of control/configuration bits typically required to program an FPGA (e.g., produce the appropriate interconnections between the CLBs) may exceed 250,000 bits, making dynamic (e.g., "on the fly"; on a cycle-by-cycle basis) re-configuration/re-programming relatively difficult and commercially impractical.
A method and apparatus for providing a programmable logic datapath that may be used in a field programmable device is described. According to one aspect of the invention, a programmable logic datapath is provided that includes a plurality of logic elements to perform various (Boolean) logic operations from operand bits that may be furnished from operand register banks, inputs to the field programmable device, results of previous operations, and so forth. The programmable logic datapath further includes circuitry to dynamically select, route and combine operand bits between the plurality of logic elements. In one embodiment, by providing control bits concurrently with operand bits to selecting, routing and combining circuitry, the programmable logic datapath of the invention can provide dynamic programmability on a cycle-by-cycle basis to perform a number of logic operations on inputs of various lengths and outputs.
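The idea of supplying control bits together with operand bits on each cycle can be sketched in software. This is only an illustrative model under stated assumptions, not the circuitry of the invention: the two-input logic elements, the 4-bit truth-table encoding, and the names `mux`, `logic_element`, and `datapath_cycle` are all hypothetical.

```python
def mux(operand_bits, select):
    """Multiplexer: route one operand bit chosen by the select field."""
    return operand_bits[select]


def logic_element(a, b, truth_table):
    """Two-input Boolean function given as a 4-bit truth table.

    Bit number (a << 1) | b of truth_table is the output.
    """
    return (truth_table >> ((a << 1) | b)) & 1


def datapath_cycle(operand_bits, control):
    """Evaluate one cycle.

    control holds one (select_a, select_b, truth_table) tuple per logic
    element and plays the role of the direct control vector.
    """
    return [
        logic_element(mux(operand_bits, sel_a), mux(operand_bits, sel_b), table)
        for sel_a, sel_b, table in control
    ]


AND_TT, OR_TT, XOR_TT = 0b1000, 0b1110, 0b0110

operands = [1, 0, 1, 1]
# The control changes from one cycle to the next; the operands need not.
cycle_1 = datapath_cycle(operands, [(0, 1, AND_TT), (0, 1, OR_TT)])
cycle_2 = datapath_cycle(operands, [(1, 2, XOR_TT), (2, 3, AND_TT)])
```

In this model the same two logic elements compute AND and OR of bits 0 and 1 on the first cycle, then XOR of bits 1 and 2 and AND of bits 2 and 3 on the second, purely by changing the control tuples.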
According to another aspect of the invention, a field programmable device containing the programmable logic datapath, as well as additional circuitry for operating the programmable logic datapath, is provided. In one embodiment, the field programmable device includes circuitry for decoding indirect control vectors into direct control vectors that specify the operation(s) to be performed by the programmable logic datapath on a cycle-by-cycle basis.
According to another aspect of the invention, a field programmable device containing the programmable logic datapath contains additional datapath circuitry specialized for performing arithmetic operations.
According to a still further aspect of the invention, one or more field programmable devices containing some or all of these elements may be integrated onto a single semiconductor chip together with other system elements, including CPUs, specialized I/O circuits, FPGA circuits, and so on.
The programmable logic datapath overcomes many limitations of the prior art. Although a useful embodiment will require a much wider direct control vector than a CPU datapath, the programmable logic datapath can perform many complex logic operations in a single cycle that are well beyond the capability of a CPU datapath. Thus, for certain applications, it is more comparable to an ASIC or FPGA. Unlike the programmable logic datapath, however, an ASIC is not field programmable. Compared to an FPGA, the programmable logic datapath will require significantly fewer control bits for a given complexity of logic operations. Partly for this reason, it becomes practical to change the direct control vector on every cycle, which increases flexibility. Finally, because the programmable logic datapath uses predefined connections, with selecting and routing performed by multiplexers, the speed of interconnection paths is increased while the unpredictability of this speed is greatly decreased.