Continuing advances in semiconductor technology have greatly increased the amount of processing that can be performed by single-chip, general-purpose computing devices. The relatively slow increase in inter-chip communication bandwidth requires that modern high performance devices use as much of the potential on-chip processing power as possible. This results in large, dense integrated circuit devices and a large design space of processing architectures.
One way of viewing this design space is in terms of granularity. Designers have the option of building very large processing units, or many smaller ones, in the same space. Traditional architectures are either very coarse grain, such as microprocessors, or very fine grain, such as field programmable gate arrays (FPGAs). Both architectures have advantages and disadvantages.
Microprocessors incorporate very few large processing units that operate on wide data-words, and each unit is hardwired to perform defined instructions on these data-words. Usually each unit is optimized for a different set of instructions, such as integer and floating point, and the units are generally hardwired to operate in parallel. The hardwired nature of these units allows very rapid instruction execution. In fact, a great deal of area on modern microprocessor chips is dedicated to cache memories in order to support a very high rate of instruction issue. Thus, the devices efficiently handle very dynamic instruction streams.
Very fine grain devices, such as FPGAs, incorporate a large number of very small processing elements. These elements are arranged in a configurable interconnect network. The configuration data used to define the functionality of the processing units and network can be thought of as a very large, semantically powerful, instruction word. Nearly any operation can be described and mapped to hardware.
Unfortunately, because microprocessors are highly optimized for simple, wide-word, dynamic instructions, they are relatively inefficient when performing other kinds of operations. For example, many cycles are required to build up complex operations that are not part of the processor's pre-selected instruction set. Also, when performing short-word operations, much of the processing unit is not being used, and when the instructions being issued are very regular, the large instruction caches are unnecessary. Thus, very coarse-grain microprocessors are not equipped to take the maximum advantage of these cases.
The size of the "instruction word" creates a number of problems with fine-grain FPGA devices, however. Reloading new instructions takes a relatively long time, making dynamic instruction streams very difficult for these devices. Moreover, if the operation being performed is, in fact, a wide word operation, a great deal of this "instruction word" must be dedicated to re-describing the operation for each of the small processing elements. Thus, fine grain processing elements are not well equipped to take advantage of a large number of common computing operations.
The present invention utilizes a large number of intermediate-grain processing elements which are arranged in a configurable mesh. Thus, the regularity and rapid instruction issue features of coarse-grain units are exploited, but a reconfigurable or programmable interconnect allows these units to be connected in an application-specific manner. This means that coarse-grain resources, such as memory and processing, can be deployed in a way that takes advantage of the opportunities for optimization present in any given problem. In addition, configuration memories may be deployed to take advantage of application specific redundancy.
In general according to one aspect, the invention features a programmable integrated circuit that comprises logic units that perform operations on data in response to instructions and memories that store and retrieve addressed data. A configurable or programmable interconnect provides a mode of signal transmission between the logic units and memories. Configuration control data defines data paths through the interconnect, which can be address inputs to memories, data inputs to memories and logic units, and instruction inputs to logic units. Thus, the interconnect is configurable to define an interdependent functionality of the functional units. A programmable configuration storage stores the configuration control data.
Thus the present invention may be configured to operate according to a number of traditionally distinct computing architectures. For example, a centrally located functional unit may be assigned the role of arithmetic logic unit (ALU) with memories of surrounding functional units being configured to act as instruction caches, register files, and program counters. Wider data paths are accommodated by tying near-neighbor ALUs to each other. Wider instructions are achieved by configuring instruction memories of separate functional units as if they were a single memory. For a different problem, the same integrated circuit may be reconfigured to emulate a single instruction multiple data (SIMD) architecture. The logic units of rows of functional units are tied together to create wider data paths, and the rows perform separate serial tasks.
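By way of illustration only, the tying of near-neighbor ALUs to form a wider data path can be sketched as a carry chain between narrow adders. The 8-bit unit width and the names `alu_add` and `wide_add` are assumptions made for this sketch and are not taken from the described device.

```python
UNIT_WIDTH = 8                      # assumed width of one logic unit, in bits
UNIT_MASK = (1 << UNIT_WIDTH) - 1   # keeps a value inside one unit's width


def alu_add(a, b, carry_in=0):
    """Add two words inside one unit; return (sum, carry_out)."""
    total = (a & UNIT_MASK) + (b & UNIT_MASK) + carry_in
    return total & UNIT_MASK, total >> UNIT_WIDTH


def wide_add(a, b, units=2):
    """Chain `units` neighboring ALUs so that they act as one wider adder."""
    result, carry = 0, 0
    for i in range(units):
        shift = i * UNIT_WIDTH
        # Each unit handles one slice and hands its carry to its neighbor.
        part, carry = alu_add(a >> shift, b >> shift, carry)
        result |= part << shift
    return result, carry


wide_sum, wide_carry = wide_add(0x12FF, 0x0001)
```

Here two hypothetical 8-bit units cooperate on a 16-bit addition: the low unit overflows and its carry is consumed by the high unit.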
In specific embodiments, functional units may provide at least part of the instructions to logic units of other functional units. Also, the configuration storage may hold multiple contexts of configuration control data for reconfiguration of the programmable interconnect.
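As a rough sketch of a configuration storage holding multiple contexts, the following model keeps several complete sets of configuration control data and switches between them by index. The class `ConfigStore`, the port names, and the dictionary layout are hypothetical and serve only to illustrate the idea.

```python
class ConfigStore:
    """Holds several contexts; each context maps a port name to its source."""

    def __init__(self, contexts):
        self.contexts = list(contexts)
        self.active = 0  # index of the context currently driving the interconnect

    def select(self, index):
        """Reconfigure the interconnect by switching to another stored context."""
        if not 0 <= index < len(self.contexts):
            raise IndexError("no such context")
        self.active = index

    def source_for(self, port):
        """Return the source that the active context assigns to `port`."""
        return self.contexts[self.active][port]


store = ConfigStore([
    {"alu_a": "unit_0", "alu_b": "unit_1"},   # context 0
    {"alu_a": "unit_2", "alu_b": "unit_1"},   # context 1
])
before = store.source_for("alu_a")
store.select(1)
after = store.source_for("alu_a")
```

Switching the active index changes which unit feeds the `alu_a` port without rewriting the stored configuration data.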
In other embodiments, the interconnect may support three different modes of operation: a static value mode, in which a value set by the configuration data is provided to a functional unit; a static source mode, in which another functional unit serves as the value source; and a dynamic source mode, in which the source is determined by the value from another functional unit.
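The static value, static source, and dynamic source modes can be illustrated with a small model of a single input port. The names `Mode`, `PortConfig`, and `resolve_port`, and the use of a modulo to keep the selector in range, are assumptions of this sketch rather than features of the described interconnect.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Mode(Enum):
    STATIC_VALUE = 0    # the configuration data supplies the value itself
    STATIC_SOURCE = 1   # the configuration data names the supplying unit
    DYNAMIC_SOURCE = 2  # another unit's output names the supplying unit


@dataclass
class PortConfig:
    mode: Mode
    operand: int  # constant, source index, or selector index, per the mode


def resolve_port(cfg: PortConfig, outputs: List[int]) -> int:
    """Return the value seen at a port, given every unit's current output."""
    if cfg.mode is Mode.STATIC_VALUE:
        return cfg.operand
    if cfg.mode is Mode.STATIC_SOURCE:
        return outputs[cfg.operand]
    # Dynamic source: the selector unit's output chooses the actual source.
    selector = outputs[cfg.operand] % len(outputs)
    return outputs[selector]


outputs = [7, 2, 9, 4]  # hypothetical current outputs of four units
static_value = resolve_port(PortConfig(Mode.STATIC_VALUE, 5), outputs)
static_source = resolve_port(PortConfig(Mode.STATIC_SOURCE, 2), outputs)
dynamic_source = resolve_port(PortConfig(Mode.DYNAMIC_SOURCE, 3), outputs)
```

In the dynamic case, unit 3 outputs 4, which wraps to index 0 in this four-unit model, so the port receives the output of unit 0.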
In still other embodiments, each logic unit can also have programmable logic arrays on data paths between functional units which perform bit level logic operations. Additionally, reduction logic can be added that performs logic operations on the output of the logic units and passes a result to other functional units as control information. Network drivers are assigned to each unit to transmit received signals to other functional units. The sources of the signals received by the drivers may also be dynamic so that the sources are programmable by other functional units.
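Reduction logic of the kind mentioned above can be pictured as collapsing a multi-bit logic unit output into a single control bit. The particular reductions shown (OR, AND, and parity) and the 8-bit width are illustrative assumptions; the text does not specify which operations are provided.

```python
WORD_WIDTH = 8                      # assumed logic unit output width
WORD_MASK = (1 << WORD_WIDTH) - 1


def reduce_or(word):
    """1 if any bit of the output is set (a non-zero result)."""
    return int((word & WORD_MASK) != 0)


def reduce_and(word):
    """1 only if every bit of the output is set."""
    return int((word & WORD_MASK) == WORD_MASK)


def reduce_xor(word):
    """Parity of the output: 1 if an odd number of bits are set."""
    return bin(word & WORD_MASK).count("1") & 1


# A single control bit that another functional unit could consume.
is_nonzero = reduce_or(0b00010000)
```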
In general according to another aspect, the invention features an integrated reconfigurable computing device, which has functional units of multi-bit arithmetic logic units and memories. A configurable interconnect that connects the units includes function ports which determine the source of the instructions to the logic units. Network ports of the units are configurable by the functional units and determine the source of addresses to the memories and the source of data to the logic units and memories.
In general according to still another aspect, the invention can also be characterized in the context of a method for organizing signal transmission within an array of functional units. Data read from the memories of functional units may be transmitted as instructions to the logic units of other functional units. Also, data read from logic units may be transmitted as addresses for the memories of other functional units. Finally, the data read from functional units can also be used as data inputs for the logic units of other functional units.
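The three transmissions described in this aspect (memory data used as an instruction, logic output used as an address, and unit data used as a logic input) can be traced in a toy model. The opcode table, the `FunctionalUnit` class, and the memory contents are hypothetical and chosen only to make the data flow visible.

```python
import operator

# Assumed opcode encoding for this sketch only.
OPS = {0: operator.add, 1: operator.sub, 2: operator.and_, 3: operator.or_}


class FunctionalUnit:
    """A unit with a small memory and a logic unit driven by an opcode."""

    def __init__(self, memory):
        self.memory = list(memory)

    def read(self, address):
        # Wrap the address so any logic output is a valid memory address.
        return self.memory[address % len(self.memory)]

    def execute(self, instruction, a, b):
        return OPS[instruction](a, b)


instr_unit = FunctionalUnit([1, 0, 2, 3])      # memory used as instruction store
alu_unit = FunctionalUnit([0, 0, 0, 0])        # only its logic unit is used here
data_unit = FunctionalUnit([10, 20, 30, 40])   # memory used as data store

instruction = instr_unit.read(0)                # memory data becomes an instruction
result = alu_unit.execute(instruction, 5, 3)    # logic unit executes it
fetched = data_unit.read(result)                # logic output becomes an address
final = alu_unit.execute(instr_unit.read(1), fetched, 1)  # fetched data is an input
```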
In specific embodiments, the paths of the data and instructions are dynamic in response to control from the functional units. More specifically, static values, values from other functional units, and values from sources may be transmitted between functional units.
The above and other features of the invention including various novel details of construction and combinations of parts, and other advantages, will now be more particularly described with reference to the accompanying drawings and pointed out in the claims. It will be understood that the particular method and device embodying the invention are shown by way of illustration and not as a limitation of the invention. The principles and features of this invention may be employed in various and numerous embodiments without departing from the scope of the invention.