The present invention is directed, in general, to data processors and, more specifically, to a circuit that determines whether or not a number on a data bus is a power of two.
The demand for high performance computers and communication devices requires that state-of-the-art digital signal processors (DSPs) and general purpose microprocessors, such as x86 based microprocessors, execute instructions in the minimum amount of time. A number of different approaches have been taken to decrease instruction execution time, thereby increasing processor throughput. One way to increase processor throughput is to use a pipeline architecture in which the processor is divided into separate processing stages that form the pipeline. Instructions are broken down into elemental steps that are executed in different stages in an assembly line fashion.
Pipelining refers to the simultaneous processing of multiple instructions in the pipeline. For example, if a processor executes each instruction in five stages and each stage requires a single clock cycle to perform its function, then five separate instructions can be processed simultaneously in the pipeline, with the processing of one instruction completed during each clock cycle. Hence, the instruction throughput of an N-stage pipelined architecture is, in theory, N times greater than the throughput of a non-pipelined architecture that completes only one instruction every N clock cycles. However, the speed improvements provided by pipelined and superpipelined architectures are ultimately limited by the speed at which the individual stages in the pipeline execute. It is therefore important to minimize the time required to execute each part of an instruction.
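The throughput comparison above can be illustrated with a short calculation. The following is a minimal sketch, assuming an idealized pipeline with one clock cycle per stage and no stalls; the function names are hypothetical and chosen only for this illustration.

```python
def cycles_non_pipelined(num_instructions, num_stages):
    # Each instruction occupies the whole processor for num_stages cycles.
    return num_instructions * num_stages


def cycles_pipelined(num_instructions, num_stages):
    # The first instruction takes num_stages cycles to pass through the
    # pipeline; after that, one instruction completes on every cycle
    # (idealized: no stalls or hazards are modeled).
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


def speedup(num_instructions, num_stages):
    # Ratio of non-pipelined to pipelined cycle counts.
    return (cycles_non_pipelined(num_instructions, num_stages)
            / cycles_pipelined(num_instructions, num_stages))
```

For a long instruction stream the ratio approaches, but never reaches, the stage count N, which is why the N-fold improvement is described as theoretical.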
Mathematical operations often incur substantial time delays in calculating a value. Counting the number of Logic 1 bits on a data bus is a common operation encountered in computer instruction sets (e.g., ST20C2 Core Instruction Set Reference Manual, SGS-Thomson Microelectronics, November 1997) and as a component function in various digital blocks, such as memory interface units (e.g., N. J. Richardson, Private Communication). The function can serve a number of different purposes, including determining the number of valid bits set in some control logic and performing a simple error detection operation. The input to such a function is an n-bit wide bus in which an arbitrary number of bits are set to a Logic 1 value and the other bits are set to a Logic 0 value. The output of this function is a binary number of approximately log2(n) bits (strictly, ceil(log2(n+1)) bits, since the count can range from 0 to n) equal to the number of ones on the input bus.
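The counting function described above can be modeled behaviourally as follows. This is a minimal software sketch, not a description of any circuit; the names count_ones and count_output_width are hypothetical. The width helper relies on the fact that representing every count from 0 to n requires ceil(log2(n+1)) bits.

```python
def count_ones(bus_value, width):
    """Return the number of Logic 1 bits on a bus of the given width."""
    if not 0 <= bus_value < (1 << width):
        raise ValueError("bus_value does not fit on the bus")
    return sum((bus_value >> i) & 1 for i in range(width))


def count_output_width(width):
    """Bits needed to represent any count from 0 to width inclusive."""
    # For a positive integer n, n.bit_length() equals ceil(log2(n + 1)).
    return width.bit_length()
```

For an 8-bit bus the count can be as large as 8, so four output bits are needed, one more than log2(8) = 3.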
The problem of counting the number of ones on a bus is a simplified analog of the compression tree in a multiplier. When the bits to be added are written as a vertical column, it can be seen that they represent a single column of a multiplier. Designing large multipliers is a well-known problem in digital design (See D. Goldberg, Appendix A: Computer Arithmetic, in Computer Architecture: A Quantitative Approach, by J. L. Hennessy and D. A. Patterson, 2nd Edition, Morgan Kaufmann Publishers Inc., San Francisco, Calif., 1996. See also I. Koren, Computer Arithmetic Algorithms, Prentice Hall, Englewood Cliffs, N.J., 1993).
The procedure for completing the multiplication operation involves two steps. In the first step, the partial product terms are compressed to two terms. This can be done using a number of different compression schemes, including Booth encoding and various trees of full adders, 4:2 carry-save adders (CSA42s), 5:3 carry-save adders (CSA53s), 7:3 carry-save adders (CSA73s), and the like. In the second step, with only two partial products remaining, the final result of the multiplication operation is calculated using a carry-propagate adder (CPA). Again, there is a large body of literature on the optimum design of adders, including carry-select adders, carry look-ahead adders, and the like.
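The two-step procedure can be sketched in software. The example below is an illustrative assumption only: it uses a simple chain of full adders (3:2 carry-save steps) operating on whole words rather than any of the specific compressor trees named above, and the function names are hypothetical. Python's built-in integer addition stands in for the carry-propagate adder.

```python
def csa32_words(a, b, c):
    """Bitwise 3:2 carry-save step on whole words.

    Returns (sum_word, carry_word) such that a + b + c == sum_word + carry_word.
    """
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (a & c) | (b & c)) << 1  # carries have weight 2
    return sum_word, carry_word


def compress_to_two(terms):
    """Step 1: reduce a list of non-negative terms to two terms."""
    terms = list(terms)
    while len(terms) > 2:
        a, b, c = terms.pop(), terms.pop(), terms.pop()
        terms.extend(csa32_words(a, b, c))
    while len(terms) < 2:
        terms.append(0)
    return terms[0], terms[1]


def multiply(x, y, width=8):
    """Multiply two unsigned width-bit numbers via compression plus one CPA."""
    # One partial product per multiplier bit (no Booth encoding here).
    partial_products = [(x << i) if (y >> i) & 1 else 0 for i in range(width)]
    sum_term, carry_term = compress_to_two(partial_products)
    # Step 2: a single carry-propagate addition yields the final product.
    return sum_term + carry_term
```

No carries propagate across bit positions during the compression step; that delay is paid only once, in the final addition.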
Because counting the number of Logic 1 bits on a data bus is such a common operation in computer instruction sets, it is important to minimize the execution time of such an operation. However, as the bus grows wider, more stages of adders are required to perform the count and more propagation delays are encountered.
A related mathematical operation is the detection of numbers equal to a power of 2 on a data bus. In binary notation, a number that is a power of 2 contains one and only one Logic 1 bit. All other bits are Logic 0. Therefore, on an 8-bit bus, a power of 2 would appear as a single Logic 1 bit and seven Logic 0 bits. For example, on an 8-bit bus, 8 = 2^3 = 00001000. Similarly, on an 8-bit bus, 128 = 2^7 = 10000000. A circuit that counts the number of Logic 1 bits on an address bus or data bus can also be used to detect powers of two on the bus. Powers of two represent the special case where the count of Logic 1 bits on the bus equals one.
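The relationship between bit counting and power-of-two detection can be expressed directly. The following is a minimal behavioural sketch; the function name is hypothetical, and the default bus width of 8 is chosen only to match the examples above.

```python
def is_power_of_two_by_count(bus_value, width=8):
    """Return True if exactly one bit on the bus is a Logic 1."""
    ones = sum((bus_value >> i) & 1 for i in range(width))
    return ones == 1


# All values on an 8-bit bus that are detected as powers of two.
powers_on_8_bit_bus = [v for v in range(256) if is_power_of_two_by_count(v)]
```

Note that an all-zero bus has a count of zero and is therefore not reported as a power of two.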
Therefore, there is a need in the art for data processors that minimize the execution time of common mathematical operations. In particular, there is a need for a circuit capable of rapidly determining the number of Logic 1 bits on a bus in a microprocessor, memory interface, or other data processing device. More particularly, there is a need for a Logic 1 bit counting circuit that minimizes the number of stages required to count Logic 1 bits on a data bus. Moreover, there is a need for a circuit capable of rapidly determining that there is one and only one Logic 1 bit on a bus in a microprocessor, memory interface, or other data processing device in order to detect values that are equal to a power of two.
The present disclosure uses the following abbreviations and definitions to designate adder cells:
1. HA: Half adder. A half adder adds two input bits and provides the result as a two-bit output, generally called sum (S) and carry (C). Carry has a weight of 2 and sum has a weight of 1.
2. CSA32: Full adder. A full adder counts three input bits and provides the result (i.e., the number of Logic 1 bits) as a two-bit output. The outputs are generally called the sum and carry, with the carry having a weight of 2 and the sum having a weight of 1.
3. CSA42: 4:2 carry-save adder. A 4:2 carry-save adder is a 4-to-2 (4:2) compressor circuit that adds five input bits (four regular bits and a carry-in (CIN) bit) and produces three output bits (a carry bit, a sum bit, and a carry-out (COUT) bit) for the result. The COUT bit has a weight of 2, the carry bit has a weight of 2, and the sum bit has a weight of 1.
4. CSA53: 5:3 carry-save adder. A 5:3 carry-save adder is a 5-to-3 compressor circuit that adds five input bits, three of which have bit weights of 1 and two of which have bit weights of 2. The three output bits have bit weights of 4, 2, and 1.
5. CSA73: 7:3 carry-save adder. A 7:3 carry-save adder is a 7-to-3 compressor circuit that counts seven input bits, each having a bit weight of 1. The three output bits have bit weights of 4, 2, and 1.
6. CPA: Carry-propagate adder. An adder circuit that gives the binary result of adding two binary numbers.
7. CSA43: 4:3 carry-save adder. A 4:3 carry-save adder is a 4-to-3 compressor circuit that adds four input bits and provides three outputs (S2, S1, and S0) having bit weights of 4, 2, and 1, respectively. This compressor is not efficient for general purpose multiplication, but is one of a family of compressors, introduced in the present application (along with the CSA63 and CSA84), shown to have advantages when used to count the number of Logic 1 bits on a bus.
8. CSA63: 6:3 carry-save adder. A 6:3 carry-save adder is a 6-to-3 compressor circuit that adds six equally weighted input bits and produces three output bits with weights of 4, 2, and 1, respectively.
9. CSA84: 8:4 carry-save adder. An 8:4 carry-save adder is an 8-to-4 compressor circuit that adds eight equally weighted input bits. The output bits have weights of 8, 4, 2, and 1, respectively.
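The input and output weights in the definitions above can be checked with simple behavioural models. The sketch below is illustrative only: all function names are hypothetical, the CSA42 is built from two full adders (a common construction that is assumed here, not taken from this document), and the remaining compressors are modeled purely by their arithmetic rather than by any gate-level structure.

```python
def bits_of(value, n):
    """Return the n low-order bits of value, least significant first."""
    return [(value >> i) & 1 for i in range(n)]


def half_adder(a, b):
    """HA: returns (carry, sum) with 2*carry + sum == a + b."""
    return a & b, a ^ b


def csa32(a, b, c):
    """CSA32 (full adder): returns (carry, sum), 2*carry + sum == a + b + c."""
    return (a & b) | (a & c) | (b & c), a ^ b ^ c


def csa42(a, b, c, d, cin):
    """CSA42 from two full adders (assumed construction).

    Returns (cout, carry, sum) with 2*cout + 2*carry + sum == a+b+c+d+cin.
    """
    cout, partial = csa32(a, b, c)
    carry, total = csa32(partial, d, cin)
    return cout, carry, total


def csa53(weight1_bits, weight2_bits):
    """CSA53: three weight-1 and two weight-2 inputs, outputs (s4, s2, s1)."""
    value = sum(weight1_bits) + 2 * sum(weight2_bits)
    return (value >> 2) & 1, (value >> 1) & 1, value & 1


def equal_weight_compressor(bits, num_outputs):
    """Arithmetic model of CSA43, CSA63, CSA73 (3 outputs), CSA84 (4 outputs).

    Returns the count of Logic 1 inputs, most significant output bit first.
    """
    count = sum(bits)
    return tuple((count >> k) & 1 for k in reversed(range(num_outputs)))
```

For instance, equal_weight_compressor applied to eight Logic 1 inputs with four outputs models the CSA84 and yields the bit pattern for the count 8.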
To address the above-discussed deficiencies of the prior art, it is a primary object of the present invention to provide a circuit for determining if an N-bit number is equal to a power of two. According to an advantageous embodiment of the present invention, the circuit comprises: 1) a first stage of detection gates, each of the first stage detection gates capable of receiving a first data bit and a second data bit from the N-bit number and generating a first output bit and a second output bit, wherein the first and second output bits are 01 if the first and second data bits are different and are one of 00 and 11 if the first and second data bits are the same; and 2) a second stage of detection gates coupled to the outputs of the first stage of detection gates, each of the second stage detection gates receiving three of the first stage output bits and generating a first output bit and a second output bit, wherein the first and second output bits of the second stage detection gates are 01 if only one of the three first stage output bits is equal to Logic 1 and are one of 00 and 11 otherwise.
According to one embodiment of the present invention, each of the detection gates in the first stage of detection gates comprises a first multiplexer and a second multiplexer.
According to another embodiment of the present invention, the first multiplexer has a 0 input channel coupled to the first data bit, a 1 input channel coupled to a Logic 1 signal, and a channel select input coupled to the second data bit.
According to still another embodiment of the present invention, the second multiplexer has a 0 input channel coupled to a Logic 0 signal, a 1 input channel coupled to the first data bit, and a channel select input coupled to the second data bit.
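The behaviour of a first stage detection gate built from the two multiplexers described above can be sketched as follows. This is a behavioural model under stated assumptions: the helper mux2 and the function name are hypothetical, and the ordering of the returned pair (second multiplexer output first, first multiplexer output second) is chosen here so that the pair reads 01 for differing inputs; the text does not state which multiplexer drives which output bit.

```python
def mux2(in0, in1, select):
    """Two-input multiplexer: passes in1 when select is 1, otherwise in0."""
    return in1 if select else in0


def first_stage_gate(first_bit, second_bit):
    # First multiplexer: 0 channel = first data bit, 1 channel = Logic 1,
    # select = second data bit. This is equivalent to a logical OR.
    at_least_one = mux2(first_bit, 1, second_bit)
    # Second multiplexer: 0 channel = Logic 0, 1 channel = first data bit,
    # select = second data bit. This is equivalent to a logical AND.
    both_set = mux2(0, first_bit, second_bit)
    # Assumed output ordering: (AND bit, OR bit).
    return both_set, at_least_one
```

Under this ordering, the pair is (0, 0) when neither data bit is set, (0, 1) when exactly one is set, and (1, 1) when both are set, which matches the 00, 01, and 11 encodings described for the first stage.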
According to yet another embodiment of the present invention, each of the detection gates in the second stage of detection gates comprises a first multiplexer and a second multiplexer.
According to a further embodiment of the present invention, the first multiplexer has a 0 input channel coupled to a first output bit of the first stage, a 1 input channel coupled to a Logic 1 signal, and a channel select input coupled to a second output bit of the first stage.
According to a still further embodiment of the present invention, the second multiplexer has a 0 input channel coupled to a third output bit of the first stage, a 1 input channel coupled to the first output bit of the first stage, and a channel select input coupled to the second output bit of the first stage.
According to a yet further embodiment of the present invention, each of the detection gates in the second stage of detection gates further comprises a third multiplexer and a fourth multiplexer.
In another embodiment of the present invention, the third multiplexer has a 0 input channel coupled to an output of the first multiplexer, a 1 input channel coupled to a Logic 1 signal, and a channel select input coupled to a fourth output bit of the first stage.
In still another embodiment of the present invention, the fourth multiplexer has a 0 input channel coupled to an output of the second multiplexer, a 1 input channel coupled to the output of the first multiplexer, and a channel select input coupled to the fourth output bit of the first stage.
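The four-multiplexer second stage gate can be modeled in the same behavioural style. The sketch below rests on a wiring assumption that the text does not spell out: the "first" and "second" first-stage output bits are taken to be the OR outputs of two first stage gates, and the "third" and "fourth" are taken to be their AND outputs. All function names are hypothetical. Under that assumption, two stages suffice to detect a power of two on a 4-bit value.

```python
def mux2(in0, in1, select):
    """Two-input multiplexer: passes in1 when select is 1, otherwise in0."""
    return in1 if select else in0


def first_stage_gate(first_bit, second_bit):
    """Returns (AND bit, OR bit) for a pair of data bits (assumed ordering)."""
    return mux2(0, first_bit, second_bit), mux2(first_bit, 1, second_bit)


def second_stage_gate(x1, x2, x3, x4):
    """x1..x4 are the first through fourth first-stage output bits."""
    m1 = mux2(x1, 1, x2)    # first multiplexer
    m2 = mux2(x3, x1, x2)   # second multiplexer
    m3 = mux2(m1, 1, x4)    # third multiplexer: at least one Logic 1 seen
    m4 = mux2(m2, m1, x4)   # fourth multiplexer: two or more Logic 1s seen
    return m4, m3


def is_power_of_two_4bit(d3, d2, d1, d0):
    and_a, or_a = first_stage_gate(d1, d0)
    and_b, or_b = first_stage_gate(d3, d2)
    # Assumed wiring: OR outputs drive x1 and x2, AND outputs drive x3 and x4.
    result = second_stage_gate(or_a, or_b, and_a, and_b)
    # The pair reads 01 only when exactly one data bit is a Logic 1.
    return result == (0, 1)
```

An exhaustive check over all sixteen 4-bit inputs confirms that, with this wiring, the second stage output is 00 for no Logic 1 bits, 01 for exactly one, and 11 for two or more.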
The foregoing has outlined rather broadly the features and technical advantages of the present invention so that those skilled in the art may better understand the detailed description of the invention that follows. Additional features and advantages of the invention will be described hereinafter that form the subject of the claims of the invention. Those skilled in the art should appreciate that they may readily use the conception and the specific embodiment disclosed as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. Those skilled in the art should also realize that such equivalent constructions do not depart from the spirit and scope of the invention in its broadest form.
Before undertaking the DETAILED DESCRIPTION OF THE INVENTION below, it may be advantageous to set forth definitions of certain words and phrases used throughout this patent document: the terms “include” and “comprise,” as well as derivatives thereof, mean inclusion without limitation; the term “or” is inclusive, meaning and/or; the phrases “associated with” and “associated therewith,” as well as derivatives thereof, may mean to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, or the like; and the term “controller” means any device, system or part thereof that controls at least one operation; such a device may be implemented in hardware, firmware or software, or some combination of at least two of the same. It should be noted that the functionality associated with any particular controller may be centralized or distributed, whether locally or remotely. Definitions for certain words and phrases are provided throughout this patent document; those of ordinary skill in the art should understand that in many, if not most, instances such definitions apply to prior, as well as future, uses of such defined words and phrases.