The present invention relates generally to an N-port wide bandwidth cross-link register, and more particularly to apparatus for passing data and control signals between processors or other digital interconnection units of a parallel computer system.
Bandwidth in this context refers to the data throughput of a port connected between two digital interconnection units (DIUs). The factors which affect bandwidth are (1) the frequency at which discrete data are transmitted on individual interconnections between DIUs, (2) the protocol of transmission (such as simplex or duplex operation, and provisions for error correction-detection schemes, which improve the reliability of operation in the presence of noise but degrade bandwidth), and (3) most importantly for the present invention, the number of interconnections between DIUs.
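By way of illustration only, the three factors above can be combined in a simple first-order estimate. The following is a minimal sketch, not a formula taken from this disclosure: the function name `effective_bandwidth`, its parameters, and the assumption that payload throughput is the plain product of transfer frequency, number of interconnections, and a protocol efficiency factor are all hypothetical.

```python
def effective_bandwidth(transfer_hz: float, num_lines: int,
                        protocol_efficiency: float = 1.0) -> float:
    """Estimate payload throughput of a port in bits per second.

    Assumed model (illustrative only): each interconnection carries one
    bit per transfer, and `protocol_efficiency` is the fraction of
    transferred bits that are payload rather than error
    correction-detection overhead (1.0 means no overhead).
    """
    if transfer_hz < 0 or num_lines < 0:
        raise ValueError("frequency and line count must be non-negative")
    if not 0.0 <= protocol_efficiency <= 1.0:
        raise ValueError("protocol_efficiency must lie in [0, 1]")
    return transfer_hz * num_lines * protocol_efficiency


# Hypothetical figures: a 10 MHz port with one parity bit per 8 data bits.
narrow = effective_bandwidth(10e6, 9, protocol_efficiency=8 / 9)
wide = effective_bandwidth(10e6, 36, protocol_efficiency=8 / 9)
print(f"narrow port: {narrow / 1e6:.0f} Mbit/s, wide port: {wide / 1e6:.0f} Mbit/s")
```

Under this assumed model, widening the port scales throughput linearly at a fixed transfer frequency, which is why the number of interconnections is singled out as the third factor.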
One of the dilemmas with modern digital processing is that it is often performed with stored-program or Von Neumann computers. These computers may consist of many thousands of individual electronic components, but the method of program execution is not optimal, since instructions are executed one at a time. This bottleneck has long been the bane of computer architects, who have sought ways to make more efficient use of the available hardware. One of the most obvious techniques to improve the efficiency of Von Neumann computers is to operate more than one of them at the same time.
Concepts involving the application of parallel Von Neumann computers have emerged in several forms. One class of architectures features large groups of specialized, identical processing elements. Such architectures are said to possess fine granularity. Examples include vector processors and systolic arrays. These architectures typically have limited flexibility in application. Other forms of parallelism employ a few identical, relatively powerful (often specialized) processors, arbitrated by another processor (usually a general-purpose computer of more modest capability). These architectures are sometimes said to possess coarse granularity. Again, these architectures are often limited to performing specialized, high-throughput processing applications such as digital signal processing. Other processing applications feature networks of identical general-purpose or special-purpose processors interconnected in various topologies. Examples of these topologies include ring-bus, mesh, and hyper-cube architectures. Although more general applications may be pursued by the latter architectures, the overall throughput of the network is sub-optimal. The major reasons for the loss of efficiency are that: (1) each node is only connected to a few other processors, and (2) in some cases, a number of processors can access common buses, but they cannot do so simultaneously. In the latter case, a technique known as carrier-sense multiple access (CSMA) arbitration with collision detection is used to detect attempts by two or more nodes to simultaneously access a common bus. Unfortunately, when a collision occurs, one or more of the nodes must back off and access an auxiliary bus and/or wait for a statistically-determined interval before re-accessing the same bus.
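The throughput penalty of shared-bus arbitration described above can be illustrated with a small slot-based simulation. This is a hedged sketch under stated assumptions, not a model of any particular bus: the function `simulate_shared_bus` is hypothetical, every node is assumed to have exactly one word to send, and the "statistically-determined interval" is modeled as a random wait drawn from a window that doubles after each collision (a common choice, assumed here rather than taken from this disclosure).

```python
import random


def simulate_shared_bus(num_nodes: int, seed: int = 0, max_slots: int = 10_000):
    """Simulate nodes contending for one shared bus in discrete time slots.

    Returns (slots_used, collisions, delivered). All modeling choices
    (one word per node, doubling backoff window) are illustrative
    assumptions.
    """
    rng = random.Random(seed)            # seeded for reproducible runs
    next_attempt = [0] * num_nodes       # slot in which each node tries next
    retries = [0] * num_nodes            # collisions suffered by each node
    pending = set(range(num_nodes))      # nodes that have not yet delivered
    collisions = 0
    slot = 0
    while pending and slot < max_slots:
        contenders = [n for n in pending if next_attempt[n] == slot]
        if len(contenders) == 1:
            # Exactly one node drives the bus: the transfer succeeds.
            pending.remove(contenders[0])
        elif len(contenders) > 1:
            # Two or more nodes drove the bus at once: the slot is wasted
            # and every contender waits a random interval before retrying.
            collisions += 1
            for n in contenders:
                retries[n] += 1
                window = 2 ** min(retries[n], 10)
                next_attempt[n] = slot + 1 + rng.randrange(window)
        slot += 1
    return slot, collisions, num_nodes - len(pending)


slots, collisions, delivered = simulate_shared_bus(num_nodes=8)
print(f"{delivered} words delivered in {slots} slots with {collisions} collisions")
```

In this sketch a single node always finishes in one slot, whereas several contending nodes need more slots than there are words to send, because every collision wastes a slot and adds idle waiting time.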
The following United States patents are of interest. U.S. Pat. Nos.
4,161,790--Winston
4,907,228--Bruckert et al.
4,916,704--Bruckert et al.
The patent to Winston teaches a method of loading a multi-digit binary word to an electronic circuit board. U.S. Pat. No. 4,907,228 to Bruckert et al. teaches a dual processor computer system for executing a series of instructions. U.S. Pat. No. 4,916,704 to Bruckert et al. teaches a fault tolerant computer system having duplicate computer systems that operate simultaneously.