1. Field of the Invention
The invention relates to (re)configurable computing systems.
2. Description of the Related Art
Introduction
Villasenor and Mangione-Smith, Configurable Computing, Scientific American, June 1997, pages 66-71, describe the new era of computer design opened by computers that modify their hardware circuits as they operate. Configurable computing architectures combine elements of general-purpose computing and application-specific integrated circuits (ASICs). The general-purpose processor operates on fixed circuits that perform multiple tasks under software control. An ASIC contains circuits specialized to a particular task and often needs little or no software to instruct it. In a configurable computer, software commands can alter field programmable gate array (FPGA) circuits as needed to perform a changing variety of tasks.
The promise of configurable circuits is versatile configuration for optimal performance of very specific tasks. On the one hand, a configurable computer often is more versatile than a special purpose device such as an ASIC, which may not be configurable to perform a wide range of tasks. On the other hand, a configurable computer, or perhaps an array of programmable elements, often can be configured to perform specialized functions faster than a general purpose processor. A configurable computer can be optimally configured for the task at hand, whereas a general purpose processor suited to a wide variety of tasks often may not be optimized for a particular task.
U.S. Pat. Nos. 5,361,373 and 5,600,845, both issued to Gilson, entitled INTEGRATED CIRCUIT COMPUTING DEVICE COMPRISING DYNAMICALLY CONFIGURABLE GATE ARRAY HAVING A MICROPROCESSOR AND RECONFIGURABLE INSTRUCTION EXECUTION MEANS AND METHOD THEREFOR, disclose an integrated circuit computing device comprising a dynamically configurable Field Programmable Gate Array (FPGA). This gate array is configured to implement a RISC processor and a Reconfigurable Instruction Execution Unit.
The Challenge of Reconfigurable Communications Among (Re)Configurable Processing Elements
An important challenge in the development of computer systems in general, and in (re)configurable computing systems in particular, is communication among processing elements (e.g., FPGAs) that comprise the system. The ability to reconfigure processing elements to perform different tasks generally requires the ability to also (re)configure communication among processing elements to meet the needs of the task at hand. The following patents illustrate just a few prior solutions to the problem of reconfiguring communication among reconfigurable processing elements.
U.S. Pat. No. 5,020,059, issued to Gorin et al., entitled RECONFIGURABLE SIGNAL PROCESSOR, discloses an interconnection scheme among processing elements (PEs) of a multiprocessor computing architecture; and means utilizing the unique interconnections for realizing, through PE reconfiguration, both fault tolerance and a wide variety of different overall topologies including binary trees and linear systolic arrays. (See Abstract) The reconfigurability allows many alternative PE network topologies to be grown or embedded in a PE lattice having identified PE or inter-PE connection faults. In one embodiment, 4-port PEs are arrayed in a square 4×4 rectangular lattice which constitutes a basic 16-PE module. In one embodiment, each PE includes a digital signal processor, a memory and a configuration network. Each PE has four physical ports which connect to similar ports of its neighbors. For tree topologies, any of the four neighbors of a given PE may be selected as the parent of the given PE; and any or all of the remaining three neighboring PEs may be selected as the child(ren) PEs. (Column 2, lines 56-64) The functionality of the ports of each PE, which define the neighbor relations, may be controlled by instructions from an exterior source, such as a Host computer. The process of routing among ports within each PE may be software defined. By using a variant of a tree expansion scheme, the processor allows for virtually arbitrary up-sizing of the PE count to build virtually any size of tree network, with each size exhibiting the same degree of fault tolerance and reconfigurability. (Column 3, lines 1-14)
Gorin et al. assert that, importantly, their processor retains a logarithmic communications radius and uses identical and scale-invariant modules to grow. A property related to scale is fast communications between a Host computer and the PE array. (Column 7, lines 22-24) PE configurations assembled as a binary tree, for example, have the advantageous property that if the number of PEs in the array is doubled, the number of layers through which communications must pass increases by only one. This property, known as logarithmic communications radius, is desirable for large-scale PE arrays since it adds the least additional process time for initiating communications between Host and PEs. Scalability is served by devising a single, basic PE port configuration as well as a basic module of board-mounted PEs, to realize any arbitrary number of PEs in an array. (Column 1, line 61-Column 2, line 4)
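By way of illustration only, the logarithmic communications radius described above can be checked with a short sketch. The function name tree_layers is hypothetical and does not come from the Gorin et al. patent; the sketch assumes a complete binary tree filled level by level and simply counts its layers.

```python
def tree_layers(num_pes: int) -> int:
    """Number of layers in a complete binary tree holding num_pes nodes.

    Assumption for illustration: the tree is filled level by level, so the
    layer count is floor(log2(num_pes)) + 1, which equals the bit length
    of num_pes.
    """
    if num_pes < 1:
        raise ValueError("num_pes must be at least 1")
    return num_pes.bit_length()


def layers_added_by_doubling(num_pes: int) -> int:
    """Extra layers needed when the PE count is doubled."""
    return tree_layers(2 * num_pes) - tree_layers(num_pes)
```

Under these assumptions, layers_added_by_doubling returns 1 for every PE count, which is the property the passage attributes to binary tree configurations.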
Gorin et al. also teach a system comprising multiple printed circuit boards each mounted with 16 PEs. Each PE of the board has four ports. Two of the ports in each of the corner PEs in the lattice are available to effect communications external to the board. Further, each PE port communicates with one of the ports in the nearest neighbor PE. FIG. 1, which is from the Gorin et al. patent, shows three PE boards 1, 2 and 3 with the port-to-port PE connections for a tree lattice structure. The PEs are shown not in their fixed lattice structure, but in the actual tree geometry for data flow, which can be created by configuring the PE ports. (Column 10, line 64-Column 11, line 9)
U.S. Pat. No. 5,513,371 issued to Cypher et al., entitled HIERARCHICAL INTERCONNECTION NETWORK ARCHITECTURE FOR PARALLEL PROCESSING, HAVING INTERCONNECTIONS BETWEEN BIT-ADDRESSABLE NODES BASED ON ADDRESS BIT PERMUTATIONS, describes two new classes of interconnection networks referred to as hierarchical shuffle-exchange (HSE) and hierarchical de Bruijn (HdB) networks. The new HSE and HdB networks are highly regular and scalable and are thus very well suited to VLSI implementation. These networks are efficient in supporting the execution of a wide range of algorithms on computers whose processors are interconnected via an HSE or HdB network. (Abstract) FIG. 2, which is from the Cypher et al. patent, depicts an illustrative drawing of a two level HSE computer including 8 processors interconnected via an HSE network. FIG. 3, which is from the Cypher et al. patent, depicts an illustrative drawing of a two level HdB computer including 8 processors interconnected via an HdB network. Each level of an HSE or HdB hierarchy corresponds to a level of packaging (e.g., the chip level, the board level, or the rack level). Their hierarchical nature allows them to be partitioned into a number of identical components (chips, boards, racks, etc.). The design of these components does not depend on the number of processors in the parallel machine, so they can be combined to form arbitrarily large networks. Also, because each level of the hierarchy corresponds to a level of packaging, the widths of the connections at each level of the hierarchy can be matched to the constraints imposed by the corresponding level of packaging. As a result, these networks are efficient in implementing a wide range of algorithms. (Column 6, lines 32-44)
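For orientation, the conventional (non-hierarchical) shuffle-exchange and de Bruijn connections on bit-addressable nodes can be sketched as follows. This is not the hierarchical HSE or HdB construction of Cypher et al.; it shows only the textbook address-bit permutations from which those names derive, and the function names are hypothetical.

```python
def shuffle(node: int, bits: int) -> int:
    """Shuffle link: rotate the node address left by one bit."""
    mask = (1 << bits) - 1
    return ((node << 1) | (node >> (bits - 1))) & mask


def exchange(node: int) -> int:
    """Exchange link: flip the least significant address bit."""
    return node ^ 1


def de_bruijn_successors(node: int, bits: int) -> tuple:
    """de Bruijn links: shift left and append a 0 or a 1."""
    size = 1 << bits
    return ((2 * node) % size, (2 * node + 1) % size)


if __name__ == "__main__":
    # Eight processors, i.e. 3-bit addresses.
    for n in range(8):
        print(n, shuffle(n, 3), exchange(n), de_bruijn_successors(n, 3))
```

With 3-bit addresses, node 5 (binary 101) has shuffle neighbor 3 (binary 011), exchange neighbor 4, and de Bruijn successors 2 and 3.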
U.S. Pat. No. 5,661,662 issued to Butts et al., entitled STRUCTURES AND METHODS FOR ADDING STIMULUS AND RESPONSE FUNCTIONS TO A CIRCUIT DESIGN UNDERGOING EMULATION, discloses a plurality of electronically reconfigurable gate array logic chips interconnected via a reconfigurable interconnect, and electronic representations of large digital networks that are converted to take temporary operating hardware form on the interconnected chips. The reconfigurable interconnect permits the digital network realized on the interconnected chips to be changed at will, making the system well suited for a variety of purposes including simulation, prototyping, execution and computing. FIGS. 4-4A, which are from the Butts et al. patent, are schematic block diagrams of a cross-bar interconnect system disclosed by Butts et al.
U.S. Pat. No. 5,684,980 issued to Casselman, entitled FPGA VIRTUAL COMPUTER FOR EXECUTING A SEQUENCE OF PROGRAM INSTRUCTIONS BY SUCCESSIVELY RECONFIGURING A GROUP OF FPGA IN RESPONSE TO THOSE INSTRUCTIONS, discloses an array of FPGAs whose configurations change successively during performance of successive algorithms or instructions, in the manner of a computer executing successive instructions. In one aspect of the Casselman invention, adjacent FPGAs in the array are connected through external field programmable interconnection devices or cross-bar switches in order to relieve the internal resources of the FPGAs from any external connection tasks. This solved a perceived problem of having to employ 90% of the internal FPGA resources on external interconnection.
U.S. Pat. No. 5,689,661 issued to Hayashi et al., entitled RECONFIGURABLE TORUS NETWORK HAVING SWITCHES BETWEEN ALL ADJACENT PROCESSOR ELEMENTS FOR STATICALLY OR DYNAMICALLY SPLITTING THE NETWORK INTO A PLURALITY OF SUBSYSTEMS, discloses an n-dimensional torus-based parallel computer, n being an integer greater than 1, that is folded n times with the results of the folding embedded in an n-dimensional layer for connection with an interleave connecting unit. Four terminal switches or switch units are placed at folding positions. The switching units are changed so that any two of the four terminals are linked together. This permits the torus network to be split into subtorus networks or subtori. The subtori can be integrated into the original torus network whereby the reconfiguration of the torus network is realized. (Abstract) FIG. 5, which is from the Hayashi et al. patent, illustrates an embodiment of two-dimensional reconfigurable torus networks, which comprises 16×16 processors. (Column 6, lines 15-17)
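The logical effect of splitting a torus into subtori can be illustrated with a short sketch. It models only neighbor relations with wraparound; it does not model the folding or the four-terminal switches of Hayashi et al., and the function name torus_neighbors and its parameters are assumptions made for illustration.

```python
def torus_neighbors(row, col, size=16, sub=None):
    """Return the (up, down, left, right) neighbors of a processor.

    size: processors per side of the full torus (16 for a 16x16 network).
    sub:  optional side length of each subtorus. When given, wraparound
          happens inside the subtorus containing (row, col) rather than
          across the full torus. This is a hypothetical parameter.
    """
    span = size if sub is None else sub
    if size % span != 0:
        raise ValueError("sub must divide size evenly")
    base_row = (row // span) * span
    base_col = (col // span) * span

    def wrap(value, base):
        # Wrap an index back into the range [base, base + span).
        return base + (value - base) % span

    return [
        (wrap(row - 1, base_row), col),
        (wrap(row + 1, base_row), col),
        (row, wrap(col - 1, base_col)),
        (row, wrap(col + 1, base_col)),
    ]
```

In this model, processor (0, 0) of the full 16×16 torus wraps around to row 15 and column 15, whereas after a split into 8×8 subtori it wraps around to row 7 and column 7.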
U.S. Pat. No. 5,852,740 issued to Estes, entitled POLYMORPHIC NETWORK METHOD AND APPARATUS, depicts a modular polymorphic network interconnecting a plurality of electronically reconfigurable devices via a modular, polymorphic interconnect, to permit a fixed physical configuration of operating hardware devices to take on a plurality of logically addressable configurations. The modular polymorphic interconnect additionally permits the logical topology of selected electronically reconfigurable devices to be configured as at least one mixed-radix N-dimensional network. (Abstract) FIG. 6, which is from the Estes patent, shows a sixteen valued, mixed-radix 3-dimensional object name space 1407 disclosed in the Estes patent. (Column 20, lines 48-50) FIG. 7, which is from the Estes patent, illustrates a polymorphic interconnection network module for concurrent multiple element selection disclosed in the Estes patent. (Column 24, lines 17-19)
U.S. Pat. No. 5,956,518 issued to DeHon et al., entitled INTERMEDIATE-GRAIN RECONFIGURABLE PROCESSING DEVICE, discloses a programmable integrated circuit which utilizes a large number of intermediate-grain processing elements which are multibit processing elements arranged in a configurable mesh. (Abstract) Configuration control data defines data paths through the interconnect, which can be address inputs to memories, data inputs to memories and logic units, and instruction inputs to logic units. Thus, the interconnect is configurable to define an interdependent functionality of the functional units. A programmable configuration storage stores the reconfiguration data. (Column 2, lines 22-28)
DeHon et al. disclose a basic functional unit (BFU) that includes a core with a memory block, ALU core and configuration memory. (Column 5, lines 58-60) Several example configurations of the device are disclosed. There is a disclosure of the device operative as a single instruction multiple data (SIMD) system that is reconfigurable on a cycle-by-cycle basis. There is a disclosure of the device configured as a 32-bit wide microprocessor. There is a disclosure of the device configured as a multiple instruction multiple data (MIMD) system. There is a disclosure of the device configured as a very long instruction word (VLIW) system. (Column 5, lines 24-56) There is a disclosure of various convolution configurations. (Columns 16-28)
DeHon et al. disclose a network that joins the BFU cores into a complete array that comprises a three-level interconnect structure, incorporating regular neighbor mesh, longer switchable lines, and long broadcast lines. (Column 8, lines 18-21) In the level-1 network structure, shown in FIG. 8, which is from the DeHon et al. patent, the output of every BFU core is passed to its nearest neighbors in all directions. (Column 8, lines 23-25) In the level-2 network structure, shown in FIG. 9, which is from the DeHon et al. patent, length-4 broadcast lines are provided between rows and columns of cells containing a 5×5 array of BFUs. (Column 8, lines 33-34) In the level-3 network structure, 4 shared network lines span every row and column. Each BFU gets to drive up to 4 inputs onto the level-3 network. In addition, every BFU has access to every level-3 line crossing it. (Column 8, lines 58-60)
U.S. Pat. No. 5,960,191 issued to Sample et al., entitled EMULATION SYSTEM WITH TIME MULTIPLEXED INTERCONNECT, discloses a hardware emulation system which reduces hardware cost by time-multiplexing multiple design signals onto physical logic chip pins and printed circuit board. FIG. 10, which is from the Sample et al. patent, shows a block diagram of a partial crossbar network incorporating time-multiplexing disclosed by Sample et al.
Scaling, Self-Similarity and Fractals
The term fractal was originally derived from the concept of “fractal dimension” by Benoit Mandelbrot, who showed how fractals can occur in many places both in mathematics and in nature. The Latin fractus means broken.
Hans Lauwerier in Fractals, Endlessly Repeated Geometric Figures, Princeton University Press, Princeton, N.J., 1991, describes fractals as follows in the introduction to his book.
“A fractal is a geometric figure in which an identical motif repeats itself on an ever diminishing scale.” (Page xi)
He goes on to state that,
“Fractals are characterized by a kind of built-in self-similarity in which a figure, a motif, keeps repeating itself on an ever-diminishing scale. A good example is a tree with a trunk that separates into two branches, which in turn separate into two smaller side branches, and so on. The final result is a tree fractal with an infinite number of branches; each individual branch, however small, can in turn be regarded as a small trunk that carries an entire tree.” (Page xii)
He asserts that,
“The concept ‘fractal’ has already proved its use in many applied fields. There one often feels the need to extend the concept of similarity of some degree by introducing small changes to the series of similarity transformations, so called disturbances. If we introduce chance disturbances into a mathematically regular tree fractal the result may look like a real tree, coral or sponge.” (Page xiii)
One example of a fractal is the “H-Fractal” illustrated in FIG. 11 (from Lauwerier, page 2, FIG. 1). According to Lauwerier,
“A fractal is a geometrical figure that consists of an identical motif repeating itself on an ever-reducing scale. A good example is the H-fractal . . . Here the capital H is the repeating motif. The H-fractal is built up step by step out of a horizontal line-segment . . . taken to be of unit length. At the first step two line segments are placed perpendicularly at the ends of the original one . . . [A] reduction factor of [1/√2] has been chosen. At the second step, shorter horizontal line-segments are fastened on to the four endpoints in the same way. The same reduction factor makes the lengths of these half a unit. We continue like this for a long time.” (Page 1)
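The construction quoted above can be followed step by step in a short sketch. It assumes that each new segment is centered on the endpoint to which it is attached and that the unit segment is centered at the origin; neither the function names nor the coordinate choice come from Lauwerier.

```python
import math

REDUCTION = 1 / math.sqrt(2)  # reduction factor applied at every step


def h_fractal(steps):
    """Return a list of (step, start_point, end_point) line segments."""
    segments = [(0, (-0.5, 0.0), (0.5, 0.0))]  # unit horizontal segment
    # Each frontier entry is (endpoint, parent_segment_is_horizontal).
    frontier = [((-0.5, 0.0), True), ((0.5, 0.0), True)]
    length = 1.0
    for step in range(1, steps + 1):
        length *= REDUCTION
        half = length / 2
        next_frontier = []
        for (x, y), parent_horizontal in frontier:
            if parent_horizontal:
                a, b = (x, y - half), (x, y + half)  # new vertical segment
            else:
                a, b = (x - half, y), (x + half, y)  # new horizontal segment
            segments.append((step, a, b))
            next_frontier.append((a, not parent_horizontal))
            next_frontier.append((b, not parent_horizontal))
        frontier = next_frontier
    return segments


def segment_length(segment):
    """Euclidean length of one (step, start_point, end_point) entry."""
    _, a, b = segment
    return math.dist(a, b)
```

Under these assumptions, the first step adds two vertical segments of length 1/√2, and the second step adds four horizontal segments of length one half, consistent with the quoted passage.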
There are those who perceive self-similarity as a fundamental principle of nature. Manfred Schroeder in Fractals, Chaos, Power Laws, W. H. Freeman and Company, New York, 1991, at page xii, offers a sweeping statement of the prevalence of self-similarity in nature.
“The unifying theory underlying fractals, chaos and power laws is self-similarity. Self-similarity, or invariance against changes in scale or size, is an attribute of many laws of nature and innumerable phenomena in the world around us. Self-similarity is, in fact, one of the decisive symmetries that shape our universe and our efforts to comprehend it.”
Conclusion
Despite advances in reconfigurable communications among processing elements in reconfigurable computer systems, there continues to be a need for improvements in the interplay between reconfigurable processing elements and reconfigurable communication resources that interconnect such processing elements. There also exists a need to effectively apply the characteristics of fractals, which are ubiquitous in nature, to the design of computer systems. That is, there is a need for an improved computer system which exhibits fractal-like qualities, namely a meaningful degree of self-similarity on reducing scale, like the self-similarity that is manifest in nature. The present invention meets these needs.
A computer system is provided which includes: a first block which includes multiple processing subsystems; a second block which includes multiple processing subsystems; a third block which includes multiple processing subsystems; and a fourth block which includes multiple processing subsystems. A first communication and processing subsystem interconnects subsystems of the first and second blocks. A second communication and processing subsystem interconnects subsystems of the third and fourth blocks. A third communication and processing subsystem interconnects subsystems of the first and fourth blocks. A fourth communication and processing subsystem interconnects subsystems of the second and third blocks. Respective subsystems include respective processing elements and a respective communication and processing unit interconnecting the respective processing elements.
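By way of illustration only, the block-level interconnection recited above can be modeled as a small graph. The identifiers (block1 to block4, cps1 to cps4) are hypothetical labels for the first through fourth blocks and the first through fourth communication and processing subsystems; the sketch captures only which blocks each subsystem joins, not the internal structure of a block.

```python
from collections import deque

# Each communication and processing subsystem joins two blocks.
LINKS = {
    "cps1": ("block1", "block2"),
    "cps2": ("block3", "block4"),
    "cps3": ("block1", "block4"),
    "cps4": ("block2", "block3"),
}


def adjacency(links):
    """Build a block-to-neighboring-blocks map from the link table."""
    adj = {}
    for a, b in links.values():
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def reachable(adj, start):
    """Breadth-first search: every block reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return seen
```

In this model each block has exactly two neighboring blocks and every block is reachable from every other, so the four subsystems join the four blocks in a ring; the first and third blocks, and the second and fourth blocks, are not joined directly.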
In one aspect, a present embodiment of the invention exhibits a fractal-like scaling of processing resources and communication resources. In one embodiment, a system architecture comprising processing element subsystems features a motif in which a ratio of approximately four processing resources to one communication resource repeats itself on a diminishing scale as the view of the system progresses from level three to level two. It will be appreciated from the TABLE below that block 164 (and each of blocks 178, 192 and 194 of FIGS. 17-22) comprises thirty-two PEs that are interconnected via A and B intra-connection lines. The respective thirty-two processing unit PEs of respective blocks 164, 178, 192 and 194 are connected to respective networks of four PEs (i.e., 118-1, 116-9, 188-1, 188-2, 190-1, 190-2). In addition, each respective block comprises four respective communication and processing units for a total of approximately eight communication and processing units per block. Thus, for level three there is a ratio of processing resources to communication resources of approximately 4-to-1. It will be further appreciated from the TABLE below that, as shown in FIG. 13, Level Two processing unit 116 with its four PEs 100-1 to 100-4 is connected to one Level Two Communication and Processing Unit 120. Moreover, every respective one of the thirty-two Level Two Processing Units in the system of a present embodiment has a similar 4-to-1 ratio between the number of processing unit PEs and the number of communications and processing units. Thus, consistent with fractals in nature, the motif of the present embodiment maintains a significant degree of self-similarity with respect to the ratio of processing resources to communication resources in moving from the level three to the level two views of the system.
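The arithmetic behind the repeated 4-to-1 motif can be restated in a brief sketch. The counts are taken from the passage above, with the level three figure of eight communication and processing units per block being described there as approximate; the names used are hypothetical.

```python
from fractions import Fraction


def resource_ratio(processing, communication):
    """Ratio of processing resources to communication resources."""
    return Fraction(processing, communication)


# Level three: thirty-two PEs per block and approximately eight
# communication and processing units per block (approximate per the text).
LEVEL_THREE = resource_ratio(32, 8)

# Level two: four PEs per processing unit and one communication and
# processing unit.
LEVEL_TWO = resource_ratio(4, 1)

if __name__ == "__main__":
    print(LEVEL_THREE, LEVEL_TWO, LEVEL_THREE == LEVEL_TWO)
```

With these counts both levels give a ratio of four, which is the sense in which the motif is said to repeat from level three to level two.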
In another aspect of the invention, it will be appreciated that the hierarchy levels of a present embodiment of the invention overlap. Thus, there is no rigid hierarchy. For instance, processing element 182-1 of FIG. 17 is simultaneously a level one processing element and a level two processing element as part of the level two processing unit 116-9 and a level three communication and processing unit because of its connection with communication and processing unit 180-9. Similarly, for example, processing element 180-9 of FIG. 17 is simultaneously a level one processing element and a level two communication and processing element as part of Level Two Subsystem 114-8 and a level three communication and processing element due to its connection to processing element 182-1. Thus, for example, as with a naturally occurring tree fractal in which a branch can be a trunk and a branch simultaneously, a communication processing element of a present embodiment can simultaneously serve as part of the communication fabric of multiple levels of system hierarchy.
The scaling of processing resources with communication resources so that a ratio of processing resources to communication resources remains approximately constant from one level of the system hierarchy to the next has important ramifications. For instance, there are likely to be fewer deleted neighborhoods in a given style of processing architecture created by configuring the system. Moreover, there is more likely to be a continuous function that can be used to describe virtually all permutations of a processing architecture. As a result, the present system may, in effect, constitute a continuous compute substrate that can amalgamate an arbitrary algorithm with the hardware used to process the algorithm.