The invention relates to synchronous communication networks, such as for communication between computer processing nodes.
Parallel computer processor systems may contain large numbers of processing nodes. Each processing node may be a processor, a memory element, an input/output server, or some other computer processor or peripheral. The processing nodes in parallel computer processor systems are commonly connected by a network of switch nodes that transfer information between the processing nodes.
In parallel computer processor systems, it is desirable to have a high degree of connectivity. The greater the connectivity, the fewer the number of communication hops required to pass information through the network, potentially reducing the communication latency. Increasing connectivity also increases the bisectional bandwidth (the maximum bandwidth achieved when half of the processing nodes are attempting to send information to the other half of the processing nodes), and improves the ability of the network to perform in some degraded fashion in the presence of faulty network components.
In a point-to-point interconnection network, each network device contains at least one communication port. Each communication port contains, at least in part, logic circuits for transmitting data to or for receiving data from any other compatible communication port via one or more communication links or transmission lines.
In a synchronous network, all of the communication ports are controlled by a common oscillator called the system clock. A synchronous, simplex communication stage contains a source communication port for transmitting a plurality of data signals from the port in synchronism with a first image of the system clock, contains a receiving port for receiving data signals in synchronism with a different image of the system clock, and contains a transmission line for propagating data signals over a fixed distance between the two ports.
Synchronous networks are attractive for reducing communication latency between connected network devices. Asynchronous communication networks use communication protocols which may require multiple machine cycles of processing of input data signals before the signal information can be used reliably, or they may operate at only a fraction of the speed of the communicating network devices.
In synchronous networks, each communication port is driven by a distinct image of the system clock. If information sent from one communication port to another arrives inside the critical setup or hold regions of the register logic of the receiving port, the information may be incorrectly latched. The mechanism for ensuring that data signals never change in the setup and hold regions is referred to as tuning.
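The latching hazard described above can be illustrated with a minimal sketch. It assumes an idealized receiving register whose clock edges occur at a fixed offset plus whole multiples of the clock period, with all times expressed as integer picoseconds. The function name and all parameter names are hypothetical and chosen for illustration only.

```python
def in_forbidden_window(arrival_ps, period_ps, setup_ps, hold_ps, clock_offset_ps=0):
    """Return True if a data transition at arrival_ps falls in the setup or
    hold region of a register clocked at clock_offset_ps + k * period_ps.

    Simplified model (assumption): ideal clock edges, no jitter, integer times.
    """
    # Time elapsed since the most recent receiving clock edge.
    phase = (arrival_ps - clock_offset_ps) % period_ps
    # Hold region: the interval immediately after a clock edge.
    in_hold = phase < hold_ps
    # Setup region: the interval immediately before the next clock edge.
    in_setup = phase > period_ps - setup_ps
    return in_hold or in_setup
```

In this model, tuning amounts to choosing a clock offset or a signal delay such that the function returns False for every data transition seen at the receiving port.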
In networks that contain exclusively short links less than one clock cycle in electrical distance, tuning may be achieved by providing a mechanism for ensuring that each network device operates with approximately the same image of the system clock. However, in constructing high frequency, highly connected networks, links might not always be less than one clock cycle in electrical distance. Hence, operating each network device from the same image of the system clock does not guarantee that signals will not change during the setup or hold times at the receiving port.
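The effect of link length on arrival timing can be sketched as follows. This is an illustrative calculation only, assuming that both ports run from the same image of the system clock, that a signal is launched exactly on a clock edge, and that the propagation delay of each link is known in integer picoseconds. The function name and the example values are hypothetical.

```python
def arrival_phase(link_delay_ps, period_ps):
    """Split a link's propagation delay into whole clock cycles in flight and
    the residual phase at which the signal arrives after a receiving clock edge.

    Assumption: sender and receiver share the same clock image and the signal
    is launched exactly on a clock edge.
    """
    whole_cycles, residual_ps = divmod(link_delay_ps, period_ps)
    return whole_cycles, residual_ps


# Hypothetical 2500 ps clock period with links of differing electrical length.
period_ps = 2500
for delay_ps in (400, 2900, 4980):
    cycles, phase_ps = arrival_phase(delay_ps, period_ps)
    print(f"delay={delay_ps} ps -> {cycles} whole cycle(s), phase={phase_ps} ps")
```

For a link longer than one clock cycle, the residual phase is set by the link length rather than by the clock image, so it may fall anywhere in the cycle, including the setup or hold region.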
In one approach to tuning a high frequency, highly connected synchronous network, the wires comprising the simplex links can be cut to lengths which ensure proper signal timing. However, any change in system clock frequency or system clock phase may necessitate further adjustments. Moreover, cutting wires to length is labor intensive, error prone, and inflexible.
In another approach to tuning a synchronous network, a clocking signal is added to the data, effectively treating the data as asynchronous. This method forces the receiving port to perform a multiple cycle synchronization operation on each incoming signal, thereby increasing communication latency.
U.S. Pat. No. 4,700,347 describes yet another approach to tuning a synchronous communication network. In this approach, each input data signal is delayed in the receiving port by a plurality of monotonically increasing delay circuits. The output of each delay circuit is latched in a testing flip-flop clocked by the remote image of the system clock in the receiving port. The latched signal values in pairs of successive testing flip-flops are then compared to detect discrepancies in the latched input signal. A discrepancy indicates that the associated pair of delay circuits delay a transition in the input data signal to opposite sides of the setup and hold regions in the testing flip-flops. The delay circuits associated with these "disagreeing" flip-flops therefore produce delayed input signals which change value near the setup and hold regions of the register logic of the receiving port, and should be avoided. Instead, the input data signal is obtained from a flip-flop associated with a delay circuit which delays the input signal approximately one-half cycle before or after the delay circuits associated with the "disagreeing" flip-flops. This approach can be used continuously or at regular intervals to maintain proper synchronization.
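The selection logic described above can be approximated in software as a rough sketch. This is a simplified model and not the circuit of U.S. Pat. No. 4,700,347. It assumes that the delay circuits are uniformly spaced by a hypothetical `tap_step_ps`, and that `latched` holds the value captured by each testing flip-flop in order of increasing delay. All function and parameter names are illustrative assumptions.

```python
def find_disagreement(latched):
    """Return index i of the first adjacent pair (i, i + 1) of testing
    flip-flops that latched different values, or None if all pairs agree."""
    for i in range(len(latched) - 1):
        if latched[i] != latched[i + 1]:
            return i
    return None


def select_tap(latched, tap_step_ps, period_ps):
    """Pick a delay tap about one-half cycle away from the disagreeing pair.

    Assumption: taps are uniformly spaced by tap_step_ps. Returns None when
    no disagreement is observed or no tap lies one-half cycle away.
    """
    i = find_disagreement(latched)
    if i is None:
        return None
    # Number of taps spanning approximately one-half clock cycle.
    half_cycle_taps = period_ps // (2 * tap_step_ps)
    after = i + 1 + half_cycle_taps   # one-half cycle after the later tap
    before = i - half_cycle_taps      # one-half cycle before the earlier tap
    if after < len(latched):
        return after
    if before >= 0:
        return before
    return None
```

For example, with twelve taps spaced 100 ps apart, a 1000 ps clock period, and a disagreement between taps 2 and 3, the sketch selects tap 8, which delays the input about one-half cycle beyond the disagreeing pair.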
To separately determine the optimal delay circuits for each of a plurality of input signals, the approach described in U.S. Pat. No. 4,700,347 requires a plurality of separate delay circuits dedicated to each input of each receiving port. Furthermore, because all signal delay adjustment is performed by the receiver, the delay selection mechanism must either (a) be implemented entirely in hardware, or (b) be controlled by software or program logic contained entirely within the receiving device. Consequently, the transmitting device cannot communicate with the receiving device unless the tuning circuits and the tuning process in the receiving device work properly.