High-speed multi-processor systems depend on high-speed communication between individual processor cells within an array of cells. In a typical design, each processor cell comprises a central processing unit and its associated local memory, and communicates with four of its neighbors over high-speed (e.g., 40 Mbyte/sec) data channels.
Processor cells in close electrical proximity (those communicating with cells located on the same printed-circuit board or on the same backplane) communicate in what is referred to as near-neighbor mode: data is generated in the transmitting cell, propagated over printed-circuit-board traces, accepted by the receiving cell, and utilized by the receiving cell immediately after its reception. This simple communication method is possible because the sum of clock skew (i.e., misalignment of clock edges arriving at different cells), data transmission delay, and setup time is less than the period of the system clock. In short, nothing special need be done to the data because it arrives soon enough to be used immediately.
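The near-neighbor timing condition can be illustrated with a minimal sketch. The function name and all numeric values below are assumptions chosen for illustration; in particular, the 25 ns clock period assumes a 40 MHz clock, which the text does not state.

```python
def arrives_in_time(clock_skew_ns, transmission_delay_ns, setup_time_ns, clock_period_ns):
    """Return True if data can be used immediately after reception.

    Encodes the condition stated in the text: the sum of clock skew,
    data transmission delay, and setup time must be less than the
    period of the system clock.
    """
    total_ns = clock_skew_ns + transmission_delay_ns + setup_time_ns
    return total_ns < clock_period_ns


# Hypothetical figures: 3 ns skew, 10 ns trace delay, 5 ns setup time,
# and an assumed 25 ns clock period.
print(arrives_in_time(3.0, 10.0, 5.0, 25.0))
```

With these assumed figures the total is 18 ns, which is inside the 25 ns period, so the receiving cell could use the data without further treatment.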
When cells must communicate across backplane or cabinet boundaries, data is typically transmitted differentially over cables. This introduces propagation delays caused by the differential transceivers used to drive the cable and by the speed-of-light delay of the cable itself. These delays create two design problems. First, the added transceiver and cable propagation delays make the transmitted data arrive at the receiver too late to be used immediately. Second, different systems may use different combinations of backplanes, cables, and cabinets. Neither the system software nor the system hardware is aware of the physical configuration of the system, and so neither can compensate for these additional delays.
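As a rough sketch of the first problem, the added delay of a cabled link can be estimated and compared with the clock period. Everything here is an assumption for illustration: the function names, the 6 ns transceiver delay, the 25 ns clock period, and the 5 ns per meter cable figure (a commonly quoted ballpark for signal propagation in cable, not a value from the text).

```python
def cable_link_delay_ns(cable_length_m, transceiver_delay_ns, cable_ns_per_m=5.0):
    """Estimate the one-way delay of a cabled link in nanoseconds.

    Assumes one transceiver at the driving end and one at the receiving
    end, plus a propagation delay proportional to cable length.
    """
    return 2 * transceiver_delay_ns + cable_length_m * cable_ns_per_m


def arrives_too_late(link_delay_ns, clock_skew_ns, setup_time_ns, clock_period_ns):
    """Return True if the data cannot be used immediately after reception."""
    return clock_skew_ns + link_delay_ns + setup_time_ns >= clock_period_ns


# Hypothetical configurations: the delay changes with cable length,
# which the cells themselves have no way of knowing.
for length_m in (2.0, 10.0):
    delay = cable_link_delay_ns(length_m, transceiver_delay_ns=6.0)
    print(length_m, delay, arrives_too_late(delay, 3.0, 5.0, 25.0))
```

Under these assumed figures even a short cable exceeds the clock period, and the amount by which it does so varies from one installation to the next, which is the second problem described above.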
Much of the recent work in data synchronization has been done in token-ring systems. These are usually fully asynchronous systems (the clocks in different units can differ in both frequency and phase) that use bit-serial communication. Because the operating frequency of transmitter and receiver can be different, these systems must constantly re-synchronize the bit streams they are exchanging. This, of course, comes at the expense of the maximum communication bandwidth of the system.
One such prior art system uses what is known as an ATT T1 PCM transceiver, which is an interface supporting bit-serial digital communication over T1 phone lines. It uses an elastic buffer (essentially a First-In-First-Out (FIFO) buffer) whose depth grows and shrinks to accommodate the differing timing relationship of the transmitting and receiving systems. In this transceiver, framing sync (synchronizing the start of long packets of information) is accomplished in firmware by an internal microprogrammed microprocessor. Bit sync (synchronizing individual bits in a bit stream once framing sync has been established) is accomplished by hardware in the receiver. This system is flexible and easily changed, but it consumes a good deal of silicon area, is more elaborate than simple systems require (such as the phase-asynchronous system of the present invention), and requires that some of the ring data bandwidth be taken up by the transmission of re-synchronizing START bits.
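The elastic-buffer idea can be sketched in software as a bounded FIFO written at the transmitter's rate and read at the receiver's rate. This is only an illustration of the general concept, not the transceiver's actual design; the class name, the `max_depth` parameter, and the overflow and underrun handling are all assumptions.

```python
from collections import deque


class ElasticBuffer:
    """A bounded FIFO whose depth grows and shrinks with timing differences."""

    def __init__(self, max_depth):
        self._fifo = deque()
        self._max_depth = max_depth  # hypothetical capacity limit

    def write(self, bit):
        """Called on the transmitter's clock; depth grows by one."""
        if len(self._fifo) >= self._max_depth:
            raise OverflowError("transmitter ran too far ahead of receiver")
        self._fifo.append(bit)

    def read(self):
        """Called on the receiver's clock; depth shrinks by one.

        Returns None on underrun (nothing buffered yet).
        """
        if not self._fifo:
            return None
        return self._fifo.popleft()

    @property
    def depth(self):
        return len(self._fifo)


# The transmitter briefly runs ahead, then the receiver catches up.
buf = ElasticBuffer(max_depth=4)
for bit in (1, 0, 1):
    buf.write(bit)
print(buf.depth)   # three bits waiting
print(buf.read())  # oldest bit comes out first
print(buf.depth)   # two bits waiting
```

A fixed capacity means that a sustained frequency difference eventually overflows or drains the buffer, which is one way to see why such systems must re-synchronize periodically.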
Another prior art system is the IBM Token-Ring adapter, which is similar in function to the ATT T1 PCM transceiver. In the IBM system, however, the synchronization occurs in the transmitter. A microprogrammed microprocessor implements an elastic buffer in firmware. This system shares the advantages and disadvantages of the ATT T1 PCM transceiver.
A third system is an interface used in a system known as the Cambridge Fast Ring; the interface comprises an Emitter-Coupled Logic (ECL) repeater chip and a CMOS control-logic chip. Synchronization is implemented in hardware on the CMOS chip, and is part of the transmitter. Because this approach uses dedicated hardware rather than the microprocessor used in the above examples, it is particularly suited for use in systems using crowded microprocessor chips. However, it shares with the other prior art systems a need to receive a START bit periodically to resynchronize. In addition, it is a bit-serial approach, as opposed to the parallel approach of the present invention.