The notion of cellular computation, including collective computation by an array of regularly interconnected nodes, has been a recurring theme in computer architecture since the early days of computer science. Ranging from cellular automata to systolic arrays, a large body of algorithms and architectures has centered on the notion of a fabric composed of simple automata interacting with immediate neighbors in an n-dimensional mesh. Fabric-based architectures are particularly attractive for efficient layout in VLSI, and specialized chips have been devised for solving specific computational problems, especially in image and array processing. Specialized ASICs designed to solve specific instances of problems, however, are not cost-effective, and relatively few of the scores of designs have actually been fabricated, with virtually none seeing widespread use.
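As an illustration of the fabric model described above, the following is a minimal sketch of one synchronous update of a two-dimensional mesh in which each cell reads only its four immediate neighbors. The function name `step` and the parity rule are assumptions chosen for brevity; they are not taken from any of the architectures discussed here.

```python
from typing import List

Grid = List[List[int]]


def step(grid: Grid) -> Grid:
    """Apply one synchronous update to every cell of a 2-D mesh.

    Illustrative rule (an assumption): each cell becomes the parity of its
    north, south, west and east neighbors. Cells on the boundary simply
    have fewer neighbors; the mesh does not wrap around.
    """
    rows, cols = len(grid), len(grid[0])

    def neighbor_sum(r: int, c: int) -> int:
        total = 0
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                total += grid[nr][nc]
        return total

    # Build a new grid so that every cell sees the previous state only.
    return [[neighbor_sum(r, c) % 2 for c in range(cols)] for r in range(rows)]
```

Because every cell depends only on adjacent cells, the same update could in principle be carried out by all cells at once, which is the property that makes such fabrics attractive for regular VLSI layout.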
Fabric-based architectures have also been an attractive design point within the reconfigurable computing community. One of the first proposals for a fabric-based architecture using commercial FPGAs was the Programmable Active Memory (PAM) from the DEC Paris Research Laboratory. Based on FPGA technology, a PAM is a virtual machine that is controlled by a standard microprocessor and can be dynamically configured into a large number of application-specific circuits. PAM introduced the concept of “active” memory: like most RAM modules, a PAM is generally attached to a high-speed bus of a host computer. Unlike RAM, however, the PAM processes data between write and read instructions.
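The idea of an "active" memory can be sketched in software as follows. This is a toy model under stated assumptions, not the PAM interface: the class `ActiveMemory`, its methods, and the `popcount` circuit are all hypothetical names introduced for illustration.

```python
from typing import Callable, Dict


class ActiveMemory:
    """Toy model of a memory that transforms data between write and read."""

    def __init__(self, circuit: Callable[[int], int]) -> None:
        # `circuit` stands in for the currently configured FPGA circuit.
        self._circuit = circuit
        self._cells: Dict[int, int] = {}

    def configure(self, circuit: Callable[[int], int]) -> None:
        """Swap in a different application-specific circuit at run time."""
        self._circuit = circuit

    def write(self, address: int, value: int) -> None:
        # The host writes as it would to RAM; the circuit processes the value.
        self._cells[address] = self._circuit(value)

    def read(self, address: int) -> int:
        # The host reads back the processed result, not the raw value.
        return self._cells[address]


def popcount(value: int) -> int:
    """Example circuit (an assumption): count the set bits of a word."""
    return bin(value).count("1")
```

A host would use it like ordinary memory: after `pam = ActiveMemory(popcount)` and `pam.write(0x10, 0b1011)`, a later `pam.read(0x10)` yields the bit count rather than the value written.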
Another important concept is that of bi-directional communication links, such as the communication structure proposed by the Remarc project, a fabric-based mesh co-processor. Although the Remarc project offers a number of advantages, this type of architecture does not permit an associated communications network to operate concurrently with cellular computation, a capability that is important for fully utilizing the cells in the fabric.
Programmable data paths have also been proposed, such as linear arrays of function units composed of 16-bit adders, multipliers and registers connectable through a reconfigurable bus. In such architectures, logic blocks are generally optimized for large computations, thereby performing operations more quickly (and consuming less chip area) than a set of smaller cells connected to form the same type of architecture.
Other architectures incorporate a configurable cache to hold recently stored configurations for rapid reloading (e.g., five cycles instead of thousands of processor cycles). Such architectures, however, typically do not adequately hold instruction-set programs for the cells, thereby limiting their usefulness in computationally intensive applications.
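The benefit of a configuration cache can be sketched as a small least-recently-used store that reports a cycle cost for each load. This is an illustrative model only: the class `ConfigCache`, the `fetch` callback, and the miss cost of 2000 cycles are assumptions (the text gives only "thousands" for an uncached load); the five-cycle hit cost is the figure quoted above.

```python
from collections import OrderedDict
from typing import Callable, Tuple

CACHE_HIT_CYCLES = 5      # reload cost quoted in the text
CACHE_MISS_CYCLES = 2000  # placeholder for "thousands of cycles" (assumption)


class ConfigCache:
    """Keeps the most recently used configurations for rapid reloading."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._entries: "OrderedDict[str, str]" = OrderedDict()

    def load(self, name: str, fetch: Callable[[str], str]) -> Tuple[str, int]:
        """Return the configuration and the modeled cycle cost of loading it."""
        if name in self._entries:
            # Hit: mark as most recently used and pay the short reload cost.
            self._entries.move_to_end(name)
            return self._entries[name], CACHE_HIT_CYCLES
        # Miss: fetch from slow storage, then evict the least recently used.
        config = fetch(name)
        self._entries[name] = config
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)
        return config, CACHE_MISS_CYCLES
```

In this model, switching repeatedly among a working set of configurations no larger than the cache capacity costs only the short reload time after the first load of each.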
One final concept is the notion of systolic flow execution, which is essentially the ability to flow data to function units from an interconnection network, in addition to the traditional mode of fetching operands from memory to execute in the function units. In some architectures, a flow graph may be automatically mapped to a processing element in the array. Here, the granularity of operation plays a vital role in system performance. These types of architectures, however, rely upon a pre-defined function unit and are typically not customizable.
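The two operand modes described above can be contrasted in a short sketch. The class `FunctionUnit`, its method names, and the `send` helper are hypothetical; the sketch models only where operands come from, not timing or any particular architecture.

```python
import operator
from collections import deque
from typing import Callable, Deque, Dict


class FunctionUnit:
    """A two-operand unit that can take operands from memory or the network."""

    def __init__(self, op: Callable[[int, int], int], memory: Dict[int, int]) -> None:
        self.op = op
        self.memory = memory
        self.inbox: Deque[int] = deque()  # operands arriving over network links

    def execute_from_memory(self, addr_a: int, addr_b: int) -> int:
        # Traditional mode: fetch both operands from memory, then execute.
        return self.op(self.memory[addr_a], self.memory[addr_b])

    def ready(self) -> bool:
        # Flow mode can fire once two operands have arrived.
        return len(self.inbox) >= 2

    def execute_from_network(self) -> int:
        # Flow mode: consume operands in arrival order, with no memory access.
        a = self.inbox.popleft()
        b = self.inbox.popleft()
        return self.op(a, b)


def send(value: int, dest: FunctionUnit) -> None:
    """Model a network link delivering one operand to a function unit."""
    dest.inbox.append(value)
```

For example, an adder can compute a sum from memory and forward it with `send` to a multiplier, which then fires from its inbox once a second operand arrives, so the intermediate result never returns to memory.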
Based on the foregoing, the present inventors have concluded that a need exists for an improved fabric architecture based on a mesh-connected configurable network of runtime re-configurable cells. Present architectural methods and systems lack this important ability. Thus, the invention disclosed herein has been designed to overcome the problems associated with current fabric-based architectures and systems thereof.