The present invention relates to techniques for improving the performance of computer systems.
Data signals in modern integrated circuits typically flow through large bundles of wires or buses, called “datapaths,” which can be 32 bits wide, 64 bits wide, or even wider. Control signals are often bundled into these wide buses as well. Note that each bit on the bus can be a logical “1” or a logical “0,” and different voltages can represent each of these two logical values. For example, the “0” value is typically represented by a ground voltage, and the “1” value by the power supply voltage. As computer systems continue to increase in complexity, these data bundles must often be routed over long distances across the computer system. These long routes can be on chips, on boards, or a combination of both. Because the routes are so long, they are often broken up into stages, with on-chip circuits periodically restoring, or repeating, these signals.
Transmission of these signals is often more efficient when sending one logical value than when sending the opposing logical value. For example, consider chips driving on-board traces that are resistively pulled high; if the bits are all “1”s, the driving chips do not need to overpower the pull-up resistors. Consequently, sending “1” values is more efficient than sending “0” values. Another example is the canonical domino circuit design, which consumes less power when sending a “0” than when sending a “1,” because “0” values cause no changes in the circuit voltages. Therefore, if a wide datapath uses such domino circuits, transmitting “0” signals across the datapath reduces the power consumption of the circuit.
By controlling the switching of the datapath signals, a given circuit's preferences can be exploited to reduce power consumption. For example, a circuit that prefers logical “0”s can quickly compute whether the datapath contains more “0”s than “1”s (or vice versa). If the majority of bits are “0”s, the signals are transmitted unaltered. Alternatively, if the majority of bits are “1”s, the logical inverse of the bits is transmitted, along with a special side signal that informs the receiver that all the bits were inverted.
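The majority-based inversion described above can be sketched in software as follows. This is only an illustrative model, not a circuit implementation; the function names `encode_prefer_zeros` and `decode` are hypothetical, and the handling of an exact tie (sent unaltered) is an assumption, since the description does not address that case.

```python
def encode_prefer_zeros(word: int, width: int) -> tuple[int, int]:
    """Return (payload, invert) for a bus that is cheaper when sending 0s.

    Assumption: on an exact tie the word is sent unaltered.
    """
    mask = (1 << width) - 1
    word &= mask
    ones = bin(word).count("1")      # software stand-in for the majority detector
    if ones > width - ones:          # more 1s than 0s: send the logical inverse
        return (~word) & mask, 1     # side signal asserted
    return word, 0                   # majority 0s: send unaltered


def decode(payload: int, invert: int, width: int) -> int:
    """Recover the original word from the payload and the side signal."""
    mask = (1 << width) - 1
    return (~payload) & mask if invert else payload & mask


payload, invert = encode_prefer_zeros(0b11110111, 8)
# Seven of eight bits are 1, so the inverse is sent with the side signal set.
assert (payload, invert) == (0b00001000, 1)
assert decode(payload, invert, 8) == 0b11110111
```

In this model, at most half of the transmitted data bits are “1”s, at the cost of one extra side-signal wire.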
Such a scheme adds considerable complexity to the datapath because the scheme requires a majority detector, a signal inverter, and the side signal. Unfortunately, the majority detector consumes a significant amount of power, which reduces the total power savings of the scheme. In addition, the delay through a majority detector impacts the throughput of the datapath, because a potential majority must first be detected before data can be sent. This delay, coupled with the delay of the extra inversion stage, can significantly impact the performance of the datapath.
In other circuits, the actual transmitted values matter less than whether or not they change. Sending the same data bit repeatedly, either “1” or “0,” costs little energy, but sending a pattern that repeatedly flips between a “1” and a “0” consumes a great deal of power. The power consumption of the datapath can be reduced by minimizing the number of times that the circuits switch. Minimizing the switching requires the circuit to first compare the current logical state of each bit in the datapath with its prior value. If the majority of bits did not change from the previous cycle to the current cycle, then the bits are sent without alteration. On the other hand, if the majority of bits did change, then the datapath sends the logical inverse of the data bits and asserts the special side signal to indicate that the bits were inverted. In this scheme, the overhead includes per-bit comparison circuits (XORs), a majority detector, a signal inverter, and a side signal. As with the prior scheme, the power consumption of the majority detector reduces the potential benefits of this scheme. In addition, the XORs, signal inverter, and majority detector introduce extra delay to the datapath.
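The transition-minimizing scheme described above can be modeled in a few lines. The sketch below is illustrative only; the names `encode_min_transitions` and `encode_stream` are hypothetical. It assumes that each new word is compared against the value last driven onto the wires, that the wires start at all zeros, and that an exact tie is sent unaltered; the description fixes none of these details.

```python
def encode_min_transitions(word: int, prev_sent: int, width: int) -> tuple[int, int]:
    """Return (payload, invert) so that at most half the data wires toggle.

    Assumption: prev_sent is the value last driven onto the wires.
    """
    mask = (1 << width) - 1
    word &= mask
    changed = (word ^ prev_sent) & mask   # per-bit XOR comparison
    flips = bin(changed).count("1")       # software stand-in for the majority detector
    if flips > width - flips:             # majority of bits would change
        return (~word) & mask, 1          # send the inverse, assert the side signal
    return word, 0                        # majority unchanged: send unaltered


def encode_stream(words: list[int], width: int) -> list[tuple[int, int]]:
    """Encode a sequence of words, assuming the wires start at all zeros."""
    prev_sent = 0
    out = []
    for word in words:
        payload, invert = encode_min_transitions(word, prev_sent, width)
        out.append((payload, invert))
        prev_sent = payload               # the wires now hold the payload
    return out


# A worst-case alternating pattern: the data wires never toggle,
# and only the side signal switches.
assert encode_stream([0x00, 0xFF, 0x00, 0xFF], 8) == [(0, 0), (0, 1), (0, 0), (0, 1)]
```

The receiver recovers each word by inverting the payload whenever the side signal is asserted.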
An element common to both of these schemes is the majority detector. Majority detectors implemented in digital circuits are complex, slow, and power-hungry. Counting the majority of eight bits using simple logic gates, for example, requires detecting all possible groupings of five bits, of which there are 56. Majority detectors built from analog differential circuits are far smaller but cannot easily scale to many bits. Analog majority detectors also suffer from design complexity, delay, and power issues.
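To give a sense of the cost of the digital approach, the following sketch models an eight-bit majority detector as an OR over every five-bit AND grouping. It is a software illustration under the assumption of a flat two-level AND-OR structure, not a description of any particular gate-level design; the name `majority_of_eight` is hypothetical.

```python
from itertools import combinations
from math import comb


def majority_of_eight(bits: list[int]) -> bool:
    """True when at least five of the eight bits are 1.

    Modeled as an OR of five-input AND terms, one per grouping of five bits.
    """
    return any(
        all(bits[i] for i in group)            # one five-input AND term
        for group in combinations(range(8), 5)
    )


# Number of five-bit groupings a flat implementation must detect.
assert comb(8, 5) == 56

assert majority_of_eight([1, 1, 1, 1, 1, 0, 0, 0]) is True
assert majority_of_eight([1, 1, 1, 1, 0, 0, 0, 0]) is False
```

The number of groupings grows combinatorially with the bus width, which is consistent with the complexity concern raised above.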