In conventional networking applications and other digital systems, high frequency (>1 GHz), high bandwidth data signals are common. A conventional technique for moving data on-chip is to drive full-swing signals onto the wires. To reduce propagation delay due to parasitic resistance and capacitance on the wires, repeaters are added at frequent intervals. Such frequent repeaters can create power consumption and signal integrity concerns when, for example, a thousand signals drive 10 mm wires simultaneously. A conventional solution is to use low voltage swing signals on the wires to reduce the power consumption. However, the conventional solution relies on a very low skew clock for the sense amplifiers in the repeaters to read data from the low voltage swing signals.
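The effect of repeaters on wire delay noted above can be illustrated with a rough first-order estimate. The following is a minimal sketch, assuming a simple Elmore-style model (delay of a distributed RC wire approximated as 0.5·R·C) that ignores driver resistance and receiver load. The function names, the per-millimeter resistance and capacitance, and the repeater delay are hypothetical values chosen only for illustration and do not come from the description.

```python
def unrepeated_delay_ps(r_ohm_per_mm, c_ff_per_mm, length_mm):
    """Elmore-style delay estimate (0.5 * R * C) of a distributed RC wire, in ps."""
    r_total = r_ohm_per_mm * length_mm   # total wire resistance in ohms
    c_total = c_ff_per_mm * length_mm    # total wire capacitance in fF
    # ohm * fF = femtoseconds; divide by 1000 to convert to picoseconds
    return 0.5 * r_total * c_total / 1000.0


def repeated_delay_ps(r_ohm_per_mm, c_ff_per_mm, length_mm,
                      segments, repeater_delay_ps):
    """Delay of the same wire split into equal segments with a repeater per segment."""
    segment_mm = length_mm / segments
    per_segment = unrepeated_delay_ps(r_ohm_per_mm, c_ff_per_mm, segment_mm)
    return segments * (per_segment + repeater_delay_ps)


if __name__ == "__main__":
    # Assumed (illustrative) wire parameters: 100 ohm/mm, 200 fF/mm, 10 mm wire
    print(unrepeated_delay_ps(100.0, 200.0, 10.0))
    # Same wire with 10 repeaters, each assumed to add 20 ps
    print(repeated_delay_ps(100.0, 200.0, 10.0, 10, 20.0))
```

Under these assumed values, the unrepeated wire delay grows with the square of the length, while the repeated wire delay grows roughly linearly with the number of segments. Each repeater also adds switching power, which is the concern raised above.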
Referring to FIG. 1, a diagram is shown illustrating a crossbar (Xbar) switch 10 with a low skew clock CLK. The crossbar switch 10 has a number of switches (or multiplexers) 12 that direct data from any input port 14 to any output port 16 according to addresses provided by a system arbitrator. In the crossbar switch 10, the input data moves horizontally while the output data moves vertically. Each of the switches 12 in the crossbar switch 10 receives data from an input port 14 and, when selected, propagates the data to an output port 16. All of the switch operations, including those of the input ports and output ports, are synchronized by the low skew clock CLK.
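The routing behavior of the crossbar switch 10 can be sketched as a small behavioral model. This is a minimal sketch only: the class name `Crossbar` and the methods `connect` and `clock_cycle` are hypothetical, the arbitrator is reduced to explicit `connect` calls, and the model does not represent wires, pre-charging, or clock skew.

```python
class Crossbar:
    """Behavioral model: each output port selects at most one input port."""

    def __init__(self, num_inputs, num_outputs):
        self.num_inputs = num_inputs
        self.num_outputs = num_outputs
        self.select = {}  # maps output port index -> selected input port index

    def connect(self, input_port, output_port):
        """Record an address from the arbitrator (hypothetical interface)."""
        if not 0 <= input_port < self.num_inputs:
            raise ValueError("input port out of range")
        if not 0 <= output_port < self.num_outputs:
            raise ValueError("output port out of range")
        self.select[output_port] = input_port

    def clock_cycle(self, input_data):
        """Return the data seen at each output port for one clock cycle.

        Unselected output ports return None.
        """
        if len(input_data) != self.num_inputs:
            raise ValueError("one data item is required per input port")
        return [
            input_data[self.select[out]] if out in self.select else None
            for out in range(self.num_outputs)
        ]


if __name__ == "__main__":
    xbar = Crossbar(num_inputs=3, num_outputs=3)
    xbar.connect(input_port=0, output_port=2)
    xbar.connect(input_port=2, output_port=0)
    print(xbar.clock_cycle(["a", "b", "c"]))
```

In this sketch a single input may be selected by more than one output; whether the switch 10 permits that is not stated in the description and is an assumption of the model.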
The conventional way to implement the low skew clock CLK is to use a balanced clock tree 18. Each tap on the clock tree 18 has the same delay and output loading to produce the low skew clock CLK. The main reason for the clock tree 18 is to reduce the clock skew and synchronize operation of the circuit. For example, a falling edge of the low skew clock CLK can start pre-charging of the horizontal wires while a rising edge of the clock launches data propagation, and vice versa for the vertical wires.
There are drawbacks to using the balanced clock tree 18. One drawback is the duty cycle of the clock, which can be less than ideal (50%-50%). A less than ideal duty cycle reduces the time allowed for either data propagation or pre-charging of the wires. For example, for a 1 GHz clock with a 40-60 duty cycle (i.e., 40% HIGH and 60% LOW), the HIGH phase of the clock allows 400 ps to drive the horizontal wires while the LOW phase allows 600 ps to drive the vertical wires. When the horizontal wires have the same length as the vertical wires, the slack time to drive the horizontal wires can be less than the slack time to drive the vertical wires, and performance can be reduced.
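The duty cycle arithmetic in the example above can be checked with a short calculation. This is a minimal sketch; the function names and the 350 ps wire drive time used for the slack calculation are hypothetical and serve only to show how unequal phases produce unequal slack for wires of equal length.

```python
def phase_times_ps(clock_hz, high_fraction):
    """Return (high_ps, low_ps) for a clock of the given frequency and duty cycle."""
    period_ps = 1e12 / clock_hz           # one clock period in picoseconds
    high_ps = period_ps * high_fraction   # time the clock spends HIGH
    low_ps = period_ps - high_ps          # remaining time the clock spends LOW
    return high_ps, low_ps


def slack_ps(phase_ps, wire_drive_ps):
    """Slack is the phase time left over after the wire has been driven."""
    return phase_ps - wire_drive_ps


if __name__ == "__main__":
    high_ps, low_ps = phase_times_ps(1e9, 0.40)  # 1 GHz clock, 40% HIGH
    drive_ps = 350.0  # assumed drive time, identical for both wire directions
    print("horizontal slack:", slack_ps(high_ps, drive_ps))
    print("vertical slack:", slack_ps(low_ps, drive_ps))
```

With equal wire lengths, the shorter HIGH phase leaves less slack for the horizontal wires, so the shorter phase limits the achievable clock frequency.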
Another drawback of using the balanced clock tree 18 is the clock tree power consumption. To reduce the clock skew, many buffers/repeaters are placed along the wires to reduce the transition time of the clock waveforms as well as the fanout. The power consumption from the clock tree 18 alone can contribute a significant portion of the total power consumption of the switch 10. It is not uncommon for the power consumption from the clock tree 18 alone to account for more than 30% of the total power consumption.
It would be desirable to have asynchronous low-swing differential repeaters that may be inserted along a wire to enable finer tuning of transistor size and wire length. A clockless on-chip global interconnect design would be desirable to further reduce power consumption, improve signal integrity, and eliminate design dependency on clock duty cycle.