1. Field of the Invention
This invention relates in general to the field of microelectronics, and more particularly to an apparatus and method for performing data bus inversion within a present day microprocessor.
2. Description of the Related Art
Many present day computer-based systems rely heavily on high-speed parallel buses to transfer address, data, control, and input/output information. The well-known source synchronous quad-pumped data bus that is employed in x86-compatible microprocessors is just one example of a 64-bit data bus that is divided into four data signal subgroups, each consisting of 16 bits. Each of the 16-bit data signal subgroups is routed over a separate path, typically via motherboard traces, and additionally includes source strobe signals and bus inversion signals that are unique to that data signal subgroup. The source strobe signals are strobed by a sending element on the bus to indicate validity of the data, or information, on the data signal subgroup. And the bus inversion signals are asserted to indicate that the information itself is being transmitted in complementary form. That is, when a sending element provides the inverted states of the data that is being transmitted over the data signal subgroup, the bus inversion signals for that data signal subgroup are asserted to indicate that inverted data is being transmitted rather than true data. When the states of a particular data signal subgroup are complemented for transmission over the data bus, this is known as a data bus inversion technique.
Data bus inversion is becoming increasingly prevalent in present day bused system designs as a result of increased emphasis on reducing the power required for bus transactions and a continuing need to minimize bus noise. As one skilled in the art will appreciate, both power and noise are minimized when the number of bits that change state on the bus, or signal group, is minimized.
Consequently, designers have provided elements within many present day integrated circuits that compare the current states of a given group of bus signals with the states which are to be transmitted during a following bus cycle. If the number of signals that will change state during the next bus cycle is greater than, say, half of the total number of signals in the given group, then, rather than transmitting the true states of the given group during the next bus cycle, the bitwise complement of the true states is transmitted, and a corresponding data bus inversion signal is asserted to indicate that the inverted states of the data are being transmitted rather than the true states. Thus, fewer state transitions occur over the bus from cycle to cycle, resulting in a savings in power and markedly reduced bus noise.
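The inversion decision described above can be sketched in a few lines of Python. This is a minimal behavioral model, not the circuit of any particular device: the function name `dbi_encode`, the 16-bit width, and the strict "more than half" threshold are assumptions chosen for illustration.

```python
WIDTH = 16                 # assumed signal-group width
MASK = (1 << WIDTH) - 1

def dbi_encode(bus_now, data_next):
    """Return (bus_value, dbi) for the next bus cycle.

    bus_now   -- states currently driven on the signal group
    data_next -- true states to be transmitted next
    """
    # Number of lines that would toggle if the true states were sent.
    changes = bin((bus_now ^ data_next) & MASK).count("1")
    if changes > WIDTH // 2:
        # Send the bitwise complement and assert the inversion signal.
        return (~data_next) & MASK, 1
    return data_next & MASK, 0
```

With this model, a transfer that would toggle nine of sixteen lines is sent inverted, so only seven data lines toggle.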
Determining which bits of a given signal group will change is relatively straightforward. The current, or last, data states are bitwise compared with the states to be transmitted by performing a bitwise exclusive-OR operation. The result is a number of exclusive-OR result bits that are asserted for those bits on the bus that will change during the next bus cycle. While determining which of the bits will change is simple from a circuit design perspective, the operation of counting the number of bits that will change is not so simple.
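As a small illustration of the exclusive-OR comparison, the following sketch computes the exclusive-OR result for one 16-bit signal group. The function name `changed_bits` and the sample values are hypothetical.

```python
def changed_bits(current, nxt, width=16):
    """Exclusive-OR result: a 1 marks each line that will change state."""
    mask = (1 << width) - 1
    return (current ^ nxt) & mask

# Hypothetical current and next states of a 16-bit signal group.
diff = changed_bits(0xAC35, 0xA336)
positions = [i for i in range(16) if (diff >> i) & 1]
```

Here `diff` marks the lines that will toggle; deciding whether to invert still requires counting how many of those bits are set.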
In most systems, a series of full adders is employed to count the bits which will change, that is, the number of asserted bits on the exclusive-OR result bus. At a first stage, a number of 1-bit adders are employed to add the number of asserted bits in a subgroup of bits of the exclusive-OR result bus. As one skilled in the art will appreciate, 1-bit adders accept three inputs: a first input, a second input, and a carry input. A 1-bit sum output and a carry-out bit are generated. Thus, the 1-bit adder generates the sum of its three inputs as a 2-bit binary output.
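The behavior of a 1-bit full adder used as a three-input bit counter can be sketched as follows. The function name `full_adder` is illustrative, and the gate-level expressions are the standard textbook forms rather than anything specified here.

```python
def full_adder(a, b, cin):
    """1-bit full adder: returns (carry_out, sum) for three 1-bit inputs."""
    s = a ^ b ^ cin                       # sum bit
    cout = (a & b) | (cin & (a ^ b))      # carry-out bit
    return cout, s

def count3(a, b, cin):
    """Interpret the 2-bit output {carry_out, sum} as a count from 0 to 3."""
    cout, s = full_adder(a, b, cin)
    return (cout << 1) | s
```

Read together, the carry-out and sum bits give the number of asserted inputs, which is why a full adder serves as the first counting stage.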
The 2-bit outputs from two adjacent 1-bit adders are next provided as inputs, along with an additional bit from the exclusive-OR result bus, to a 2-bit adder, which generates a 3-bit sum on its output. The 3-bit outputs from two adjacent 2-bit adders are then routed as inputs to a 3-bit adder along with another uncounted bit from the exclusive-OR result bus, which in turn generates a 4-bit sum on its output. The stages of full addition continue with increasing size of the adders in each subsequent stage, until all, or a majority, of the bits on the exclusive-OR result bus have been counted for changed state. The output of a final adder stage indicates the number of bits that will change during the next cycle, of those bits which have been counted.
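The staged adder arrangement described above can be modeled behaviorally for a 16-bit exclusive-OR result. This is a sketch under assumptions: the helper `add_n`, the function `count_changes_tree`, and the assignment of particular bits to particular adder inputs are hypothetical. Only the stage structure (1-bit, then 2-bit, then 3-bit adders, each taking one extra bit as a carry input) follows the text.

```python
def add_n(x, y, cin, n):
    """Model of an n-bit adder with carry-in; result fits in n+1 bits."""
    assert 0 <= x < (1 << n) and 0 <= y < (1 << n) and cin in (0, 1)
    return x + y + cin

def count_changes_tree(bits):
    """Count asserted bits using three adder stages.

    bits -- list of 16 exclusive-OR result bits (0 or 1).
    Only bits 0..14 are counted; bit 15 has no adder input left.
    """
    # Stage 1: four 1-bit adders, three bits each (bits 0..11).
    s1 = [add_n(bits[3 * i], bits[3 * i + 1], bits[3 * i + 2], 1)
          for i in range(4)]
    # Stage 2: two 2-bit adders, each adds two 2-bit sums plus one more bit.
    s2 = [add_n(s1[0], s1[1], bits[12], 2),
          add_n(s1[2], s1[3], bits[13], 2)]
    # Stage 3: one 3-bit adder adds both 3-bit sums plus bit 14.
    return add_n(s2[0], s2[1], bits[14], 3)
```

With this bit assignment the tree accounts for 12 + 2 + 1 = 15 bits, so the final 4-bit sum covers all but one bit of the group.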
As one skilled in the art will appreciate, the implementation of a full adder requires that an exclusive-OR of the adder's inputs be performed, and performing an exclusive-OR operation requires that all of the inputs be inverted to provide the complementary states used by the operation.
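To make the dependence on complementary inputs concrete, the sketch below writes the full adder's sum bit in AND-OR (sum-of-products) form. The function name `sum_bit_sop` is hypothetical, and the inverters are modeled simply as `1 - x`. Every product term other than `a & b & c` uses at least one complemented input.

```python
def sum_bit_sop(a, b, c):
    """Full-adder sum bit (3-input exclusive-OR) in sum-of-products form."""
    # Complemented inputs: each one costs an inverter in hardware.
    a_n, b_n, c_n = 1 - a, 1 - b, 1 - c
    return ((a & b_n & c_n) |
            (a_n & b & c_n) |
            (a_n & b_n & c) |
            (a & b & c))
```

The result matches `a ^ b ^ c` for all eight input combinations, but the expression cannot be formed from the true inputs alone.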
The present inventor has observed that the generation of complementary states for the inputs to a full-adder does not create a problem when static logic design techniques are employed. However, more and more integrated circuit designs, and particularly those associated with high-speed bused systems, are utilizing dynamic logic design techniques, where many circuits therein utilize so-called domino logic.
Dynamic logic designs are different from static logic because they utilize a gated clock to evaluate combinational logic circuits. The clock is employed to synchronize transitions in sequential logic circuits, as in a pipeline microprocessor where the design is decomposed into many different pipeline stages, and the clock is used to synchronize the transfer of data from one stage to the next, like stations in an assembly line.
In most dynamic logic circuits, the output is driven high or low during a given half-cycle of the clock, and the circuit is allowed to transition to the opposite state, as a function of its inputs, during the other half of the clock cycle. Thus, the clock signal becomes an integral gating mechanism in all dynamic circuits. It is not within the scope of the present application to provide a tutorial on dynamic logic design techniques; however, it is sufficient to note that following a dynamic stage that is gated by the clock with one or more static stages, for purposes of evaluating additional input data, is known as “domino” logic. This is because when the clock transitions to allow the inputs of the dynamic stage to evaluate, the states of the subsequent static stages transition like a row of dominoes.
Consequently, any additional gate delays that are required to evaluate a given set of inputs add latency to the evaluation. The present inventor has noted that when domino logic elements are employed, the additional gate delays that are required to generate the complements of the input states for any of the full adders in a data bus inversion mechanism as described above are unacceptable. The present inventor has thus sensed a need in the art to provide a data bus inversion mechanism that can more easily be implemented using domino logic design techniques than that which is presently provided.
The present inventor has also observed that the use of 1-bit adders as a first stage to count bits in an exclusive-OR result bus accounts for bits on the bus in groups of three: a first input, a second input, and a carry input. So, for a 16-bit signal group, a designer is forced either to implement an additional 1-bit adder to account for the last bit in the group of 16, or to simply ignore one of the bits during the evaluation for data bus inversion. It is more likely than not that the last bit is simply ignored, and thus, the power and noise attributes on the bus are sacrificed.
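The cost of ignoring one bit can be seen in a short sketch. The function name `should_invert`, the choice of bit 15 as the ignored bit, and the "more than half" threshold are assumptions made for illustration.

```python
def should_invert(xor_result, counted_bits, width=16):
    """Inversion decision when only the low `counted_bits` bits are counted."""
    mask = (1 << counted_bits) - 1
    return bin(xor_result & mask).count("1") > width // 2

# Hypothetical case: nine lines will change, one of which is bit 15.
xor_result = 0x80FF
full = should_invert(xor_result, 16)     # all 16 bits counted
partial = should_invert(xor_result, 15)  # bit 15 ignored
```

In this case counting all sixteen bits selects inversion, leaving seven data-line transitions, whereas counting only fifteen bits sees eight changes, does not invert, and lets all nine lines toggle.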
Accordingly, the present inventor has also noted a desire in the art to evaluate all of the bits within a signal group for state transition in order to minimize the power consumed and noise produced from one cycle to the next over a high-speed bus.
Therefore, it is an object of the present invention to provide a bus state sense mechanism that tests all of the bits within a given signal group for state changes. In addition, it is an object of the present invention to provide a data bus inversion technique that reduces the latency incurred to determine whether or not a data bus inversion is to be performed, and particularly when using domino logic elements.