The present invention is directed, in general, to data processors and, more specifically, to a circuit that counts the number of Logic 1 bits on a bus in a data processor.
The demand for high performance computers and communication devices requires that state-of-the-art digital signal processors (DSPs) and general purpose microprocessors, such as x86-based microprocessors, execute instructions in the minimum amount of time. A number of different approaches have been taken to decrease instruction execution time, thereby increasing processor throughput. One way to increase processor throughput is to use a pipeline architecture in which the processor is divided into separate processing stages that form the pipeline. Instructions are broken down into elemental steps that are executed in different stages in an assembly line fashion.
Pipelining refers to the simultaneous processing of multiple instructions in the pipeline. For example, if a processor executes each instruction in five stages and each stage requires a single clock cycle to perform its function, then five separate instructions can be processed simultaneously in the pipeline, with the processing of one instruction completed during each clock cycle. Hence, the instruction throughput of an N stage pipelined architecture is, in theory, N times greater than the throughput of a non-pipelined architecture that completes only one instruction every N clock cycles. However, the speed improvements provided by pipelined and superpipelined architectures are ultimately limited by the speed at which the individual stages in the pipeline execute. It is therefore important to minimize the time required to execute each part of an instruction.
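The idealized throughput argument above can be checked with a few lines of arithmetic. The following is a minimal sketch that assumes one clock cycle per stage and no stalls or hazards; the function names are illustrative and do not come from this disclosure.

```python
def cycles_non_pipelined(num_instructions: int, num_stages: int) -> int:
    """Each instruction occupies the whole processor for num_stages cycles."""
    return num_instructions * num_stages


def cycles_pipelined(num_instructions: int, num_stages: int) -> int:
    """The first instruction takes num_stages cycles to pass through the
    pipeline; after that, one instruction completes on every cycle."""
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)


def speedup(num_instructions: int, num_stages: int) -> float:
    """Ratio of non-pipelined to pipelined cycle counts."""
    return cycles_non_pipelined(num_instructions, num_stages) / cycles_pipelined(
        num_instructions, num_stages
    )


if __name__ == "__main__":
    # The speedup stays below the stage count and approaches it for long runs.
    for k in (1, 5, 100, 10_000):
        print(k, round(speedup(k, 5), 3))
```

Under these assumptions the speedup approaches, but never reaches, the stage count N as the instruction stream grows, which is why the per-stage delay becomes the limiting factor.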
Mathematical operations often incur substantial time delays in calculating a value. Counting the number of Logic 1 bits on a data bus or in a data register is a common operation encountered in computer instruction sets (e.g., ST20C2 Core Instruction Set Reference Manual, SGS-Thomson Microelectronics, November 1997) and as a component function in various digital blocks, such as memory interface units (e.g., N. J. Richardson, Private Communication). The function can serve a number of different purposes, including determining the number of valid bits set in some control logic and performing a simple error detection operation. The input to such a function is an n-bit wide bus (or the output of an n-bit data register) in which an arbitrary number of bits are set to a Logic 1 value and the other bits are set to a Logic 0 value. The output of this function is a binary number equal to the number of ones on the input bus; because the count can range from 0 to n, the output is log2(n)+1 bits wide when n is a power of two.
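A behavioral reference model of the counting function described above may be useful for checking any hardware implementation against. The sketch below is an illustration only; the names `count_ones` and `output_width` are assumptions and do not appear in the cited instruction set manual.

```python
def count_ones(bus_value: int, width: int) -> int:
    """Return the number of Logic 1 bits among the low `width` bits."""
    if width <= 0:
        raise ValueError("width must be positive")
    mask = (1 << width) - 1
    return bin(bus_value & mask).count("1")


def output_width(width: int) -> int:
    """Bits needed to represent any count from 0 to `width` inclusive."""
    return width.bit_length()


if __name__ == "__main__":
    print(count_ones(0b1011, 4))   # three bits are set
    print(output_width(16))        # a count of 16 must be representable
```

Because a count on a 16-bit bus can be as large as 16, the result in that case needs five output bits rather than four.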
The problem of counting the number of ones on a bus is a simplified analog to the compression tree in a multiplier. Writing the bits to be added in a vertical column, it is observed that they represent a single column of a multiplier. Designing large multipliers is a well-known problem in digital design (See D. Goldberg, Appendix A: Computer Arithmetic in Computer Architecture: A Quantitative Approach, by J. L. Hennessy and D. A. Patterson, Second Edition, Morgan Kaufmann Publishers Inc., San Francisco, Calif., 1996. See also I. Koren, Computer Arithmetic Algorithms, Prentice Hall, Englewood Cliffs, N.J., 1993).
The procedure for completing the multiplication operation involves two steps. In the first step, the partial product terms are compressed to two terms. This can be done using a number of different compression schemes, including Booth encoding and various trees of full adders, 4:2 carry-save adders (CSA42s), 5:3 carry-save adders (CSA53s), 7:3 carry-save adders (CSA73s), and the like. In the second step, with two partial products remaining, the final result of the multiplication operation is calculated using a carry-propagate adder (CPA). Again, there is a large literature on the optimum design of adders, including carry-select adders, carry look-ahead adders, and the like.
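The two-step procedure can be illustrated with a small software model. The sketch below is a simplification under stated assumptions: it uses only rows of full adders (3:2 compression), applies them in sequence rather than as a tree, omits Booth encoding, and lets Python's `+` stand in for the carry-propagate adder. All function names are hypothetical.

```python
from typing import List, Tuple


def carry_save_3to2(a: int, b: int, c: int) -> Tuple[int, int]:
    """Compress three operands into a sum word and a carry word using a row
    of independent full adders; no carry ripples between bit positions."""
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (a & c) | (b & c)) << 1
    return sum_word, carry_word


def compress_to_two(terms: List[int]) -> Tuple[int, int]:
    """Step 1: apply 3:2 compression until only two terms remain."""
    terms = list(terms)
    while len(terms) > 2:
        s, c = carry_save_3to2(terms[0], terms[1], terms[2])
        terms = terms[3:] + [s, c]
    while len(terms) < 2:
        terms.append(0)
    return terms[0], terms[1]


def multiply(x: int, y: int, width: int = 8) -> int:
    """Unsigned multiply of x by the low `width` bits of y."""
    partial_products = [(x << i) if (y >> i) & 1 else 0 for i in range(width)]
    s, c = compress_to_two(partial_products)
    return s + c  # Step 2: this addition stands in for the CPA


if __name__ == "__main__":
    print(multiply(13, 11))
```

Each compression step preserves the arithmetic total of the terms, so the only carry propagation occurs in the single final addition.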
Because counting the number of Logic 1 bits on a data bus is such a common operation in computer instruction sets, it is important to minimize the execution time of such an operation. However, as the bus grows larger, more stages of adders are required to perform the count and more propagation delays are encountered.
Therefore, there is a need in the art for data processors that minimize the execution time of common mathematical operations. In particular, there is a need for a circuit capable of rapidly determining the number of Logic 1 bits on a bus in a microprocessor, memory interface, or other data processing device. More particularly, there is a need for a Logic 1 bit counting circuit that minimizes the number of stages required to count Logic 1 bits on a data bus.
The present disclosure uses the following abbreviations and definitions to designate adder cells:
1. HA: Half adder. A half adder adds two input bits and provides the result as a two-bit output, generally called sum (S) and carry (C). Carry has a weight of 2 and sum has a weight of 1.
2. CSA32: Full adder. A full adder counts three input bits and provides the result (i.e., the number of Logic 1 bits) as a two-bit output. The outputs are generally called the sum and carry, with the carry having a weight of 2 and the sum a weight of 1.
3. CSA42: 4:2 carry-save adder. A 4:2 carry-save adder is a 4-to-2 (4:2) compressor circuit that adds five input bits (four regular bits and a carry-in (CIN) bit) and produces three output bits (a carry bit, a sum bit, and a carry-out (COUT) bit) for the result. The COUT bit has a weight of 2, the carry bit has a weight of 2, and the sum bit has a weight of 1.
4. CSA53: 5:3 carry-save adder. A 5:3 carry-save adder is a 5-to-3 compressor circuit that adds five input bits, three of which have bit weights of 1 and two of which have bit weights of 2. The three output bits have bit weights of 4, 2 and 1.
5. CSA73: 7:3 carry-save adder. A 7:3 carry-save adder is a 7-to-3 compressor circuit that counts seven input bits, each having a bit weight of 1. The three output bits have bit weights of 4, 2, and 1.
6. CPA: Carry-propagate adder. An adder circuit that gives the binary result of adding two binary numbers.
7. CSA43: 4:3 carry-save adder. A 4:3 carry-save adder is a 4-to-3 compressor circuit that adds four input bits and provides three outputs (S2, S1, and S0) having bit weights of 4, 2 and 1, respectively. This compressor is not efficient for general purpose multiplication, but is one of a family of compressors, introduced in the present application (along with the CSA63 and CSA84), shown to have advantages when used to count the number of Logic 1 bits on a bus.
8. CSA63: 6:3 carry-save adder. A 6:3 carry-save adder is a 6-to-3 compressor circuit that adds six equally weighted input bits and produces three output bits with weights of 4, 2, and 1, respectively.
9. CSA84: 8:4 carry-save adder. An 8:4 carry-save adder is an 8-to-4 compressor circuit that adds eight equally weighted input bits. The output bits have weights of 8, 4, 2 and 1, respectively.
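The adder cells defined above can be modeled behaviorally to confirm that each one preserves the weighted total of its inputs. The sketch below is illustrative only: the function names are assumptions, the CSA42 is modeled as two chained full adders (a common construction, not necessarily the gate-level structure intended in this disclosure), and the remaining cells are modeled simply by binary-encoding the weighted input total.

```python
from itertools import product


def ha(a, b):
    """Half adder: returns (carry, sum) with weights (2, 1)."""
    return a & b, a ^ b


def csa32(a, b, c):
    """Full adder: returns (carry, sum) with weights (2, 1)."""
    return (a & b) | (a & c) | (b & c), a ^ b ^ c


def csa42(a, b, c, d, cin):
    """4:2 compressor modeled as two chained full adders (assumed structure).
    Returns (cout, carry, sum) with weights (2, 2, 1)."""
    cout, s1 = csa32(a, b, c)
    carry, s = csa32(s1, d, cin)
    return cout, carry, s


def _encode(total, n_out):
    """Binary-encode a total, most significant bit first."""
    return tuple((total >> i) & 1 for i in reversed(range(n_out)))


def csa43(a, b, c, d):
    """4:3 cell: returns (S2, S1, S0) with weights (4, 2, 1)."""
    return _encode(a + b + c + d, 3)


def csa53(a, b, c, d2, e2):
    """5:3 cell: a, b, c have weight 1; d2 and e2 have weight 2."""
    return _encode(a + b + c + 2 * (d2 + e2), 3)


def csa63(a, b, c, d, e, f):
    """6:3 cell: six weight-1 inputs, outputs weighted (4, 2, 1)."""
    return _encode(a + b + c + d + e + f, 3)


def csa73(a, b, c, d, e, f, g):
    """7:3 cell: seven weight-1 inputs, outputs weighted (4, 2, 1)."""
    return _encode(a + b + c + d + e + f + g, 3)


def csa84(a, b, c, d, e, f, g, h):
    """8:4 cell: eight weight-1 inputs, outputs weighted (8, 4, 2, 1)."""
    return _encode(a + b + c + d + e + f + g + h, 4)


def verify_all():
    """Exhaustively check that every cell preserves its weighted input total."""
    for a, b in product((0, 1), repeat=2):
        c, s = ha(a, b)
        assert 2 * c + s == a + b
    for bits in product((0, 1), repeat=3):
        c, s = csa32(*bits)
        assert 2 * c + s == sum(bits)
    for bits in product((0, 1), repeat=5):
        cout, carry, s = csa42(*bits)
        assert 2 * cout + 2 * carry + s == sum(bits)
        a, b, c, d2, e2 = bits
        s2, s1, s0 = csa53(*bits)
        assert 4 * s2 + 2 * s1 + s0 == a + b + c + 2 * d2 + 2 * e2
    counters = (
        (4, csa43, (4, 2, 1)),
        (6, csa63, (4, 2, 1)),
        (7, csa73, (4, 2, 1)),
        (8, csa84, (8, 4, 2, 1)),
    )
    for n_in, cell, weights in counters:
        for bits in product((0, 1), repeat=n_in):
            out = cell(*bits)
            assert sum(w * o for w, o in zip(weights, out)) == sum(bits)
    return True


if __name__ == "__main__":
    print(verify_all())
```

In the chained full adder model of the CSA42, COUT depends only on the first three inputs and not on CIN, so a row of such cells does not ripple a carry from one cell to the next.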
To address the above-discussed deficiencies of the prior art, it is a primary object of the present invention to provide a circuit for determining the number of Logic 1 bits in a group of N data bits. According to an advantageous embodiment, the circuit for determining the number of Logic 1 bits comprises: 1) an input stage of 4:3 carry-save adders, each of the 4:3 carry-save adders receiving four of the N data bits on four input lines and generating three sum bits (S2, S1, S0) equal to a total number of Logic 1 bits on the four input lines, wherein the three sum bits have bit weights of S2=4, S1=2 and S0=1, respectively; 2) a first intermediate stage of 4:2 carry-save adders, each of the first intermediate stage 4:2 carry-save adders having four input lines for receiving selected ones of the S2 sum bits, the S1 sum bits, and the S0 sum bits and generating therefrom a carry-out (COUT) bit, a carry (C) bit and a sum (S) bit; and 3) a carry-propagate adder having a first input channel and a second input channel coupled to the first intermediate stage 4:2 carry-save adders and capable of generating a binary result equal to a total number of Logic 1 bits in the group of N data bits.
According to one embodiment of the present invention, N equals 16 and the input stage comprises four 4:3 carry-save adders.
According to another embodiment of the present invention, the intermediate stage comprises three 4:2 carry-save adders.
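The N equals 16 arrangement, with four 4:3 carry-save adders feeding three 4:2 carry-save adders and a carry-propagate adder, can be simulated as follows. This is a sketch under stated assumptions: the summary says only that the 4:2 adders receive "selected ones" of the sum bits, so the wiring here (one 4:2 adder per bit weight, with each COUT feeding the CIN of the next higher weight) is one plausible choice and not necessarily the disclosed one. The cell models and function names are hypothetical, and Python's `+` stands in for the carry-propagate adder.

```python
def full_adder(a, b, c):
    """Returns (carry, sum) with weights (2, 1)."""
    return (a & b) | (a & c) | (b & c), a ^ b ^ c


def csa43(a, b, c, d):
    """4:3 cell: returns (S2, S1, S0), the binary count of four input bits."""
    total = a + b + c + d
    return (total >> 2) & 1, (total >> 1) & 1, total & 1


def csa42(a, b, c, d, cin):
    """4:2 cell modeled as two chained full adders (assumed structure).
    Returns (cout, carry, sum) with weights (2, 2, 1)."""
    cout, s1 = full_adder(a, b, c)
    carry, s = full_adder(s1, d, cin)
    return cout, carry, s


def count_ones_16(value):
    """Count the Logic 1 bits in the low 16 bits of `value`."""
    bits = [(value >> i) & 1 for i in range(16)]

    # Input stage: four CSA43 cells, each counting four input bits.
    groups = [csa43(*bits[i:i + 4]) for i in range(0, 16, 4)]
    s2_bits = [g[0] for g in groups]  # weight 4
    s1_bits = [g[1] for g in groups]  # weight 2
    s0_bits = [g[2] for g in groups]  # weight 1

    # First intermediate stage: one CSA42 per bit weight (assumed wiring).
    # Each COUT has twice the weight of its cell's inputs, so it can feed
    # the CIN of the cell handling the next higher weight.
    cout0, carry0, sum0 = csa42(*s0_bits, 0)
    cout1, carry1, sum1 = csa42(*s1_bits, cout0)
    cout2, carry2, sum2 = csa42(*s2_bits, cout1)

    # At most two bits remain at each weight; form the two CPA operands.
    operand_a = sum0 | (sum1 << 1) | (sum2 << 2) | (cout2 << 3)
    operand_b = (carry0 << 1) | (carry1 << 2) | (carry2 << 3)

    # The final addition stands in for the carry-propagate adder.
    return operand_a + operand_b


if __name__ == "__main__":
    print(count_ones_16(0b1010_1100_0000_1111))
```

With this wiring, the three 4:2 adders leave no more than two bits at any bit weight, which is the condition for handing the result to a two-input carry-propagate adder.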
According to still another embodiment of the present invention, N equals 32 and the input stage comprises eight 4:3 carry-save adders.
According to yet another embodiment of the present invention, the circuit for determining the number of Logic 1 bits further comprises a second intermediate stage of 4:2 carry-save adders, each of the second intermediate stage 4:2 carry-save adders having four input lines for receiving selected ones of the COUT bits, the C-bits, and the S-bits from the first intermediate stage 4:2 carry-save adders.
According to a further embodiment of the present invention, the first and second input channels of the carry-propagate adder are coupled to outputs of the second intermediate stage 4:2 carry-save adders.
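The N equals 32 arrangement, in which two intermediate stages of 4:2 carry-save adders sit between the eight 4:3 carry-save adders and the carry-propagate adder, can be sketched in the same way. The wiring below is an assumption made for illustration: bits of equal weight are grouped in fours, each COUT feeds the CIN of the matching cell at the next higher weight, and any bits or COUTs that find no cell pass through to the next stage. The helper `csa42_stage` and the other names are hypothetical, and Python's `+` stands in for the carry-propagate adder.

```python
def full_adder(a, b, c):
    """Returns (carry, sum) with weights (2, 1)."""
    return (a & b) | (a & c) | (b & c), a ^ b ^ c


def csa43(a, b, c, d):
    """4:3 cell: returns (S2, S1, S0), the binary count of four input bits."""
    total = a + b + c + d
    return (total >> 2) & 1, (total >> 1) & 1, total & 1


def csa42(a, b, c, d, cin):
    """4:2 cell modeled as two chained full adders (assumed structure).
    Returns (cout, carry, sum) with weights (2, 2, 1)."""
    cout, s1 = full_adder(a, b, c)
    carry, s = full_adder(s1, d, cin)
    return cout, carry, s


def csa42_stage(columns):
    """Apply one stage of CSA42 cells; columns[k] holds bits of weight 2**k."""
    out = [[] for _ in range(len(columns) + 1)]
    couts = []  # COUT bits produced by the next lower weight column
    for k, col in enumerate(columns):
        n_groups = len(col) // 4
        next_couts = []
        for i in range(n_groups):
            cin = couts[i] if i < len(couts) else 0
            cout, carry, s = csa42(*col[4 * i:4 * i + 4], cin)
            out[k].append(s)           # sum keeps the column's weight
            out[k + 1].append(carry)   # carry has twice the weight
            next_couts.append(cout)    # cout also has twice the weight
        out[k].extend(col[4 * n_groups:])  # leftover bits pass through
        out[k].extend(couts[n_groups:])    # unused COUTs become plain bits
        couts = next_couts
    out[len(columns)].extend(couts)
    return out


def count_ones_32(value):
    """Count the Logic 1 bits in the low 32 bits of `value`."""
    bits = [(value >> i) & 1 for i in range(32)]

    # Input stage: eight CSA43 cells.
    groups = [csa43(*bits[i:i + 4]) for i in range(0, 32, 4)]
    columns = [
        [g[2] for g in groups],  # S0 bits, weight 1
        [g[1] for g in groups],  # S1 bits, weight 2
        [g[0] for g in groups],  # S2 bits, weight 4
    ]

    columns = csa42_stage(columns)  # first intermediate stage
    columns = csa42_stage(columns)  # second intermediate stage
    assert all(len(col) <= 2 for col in columns)

    # Form the two CPA operands from the (at most) two bits per weight.
    operand_a = sum(col[0] << k for k, col in enumerate(columns) if len(col) >= 1)
    operand_b = sum(col[1] << k for k, col in enumerate(columns) if len(col) == 2)
    return operand_a + operand_b  # stands in for the carry-propagate adder


if __name__ == "__main__":
    print(count_ones_32(0xF0F0_1234))
```

In this particular sketch the first intermediate stage uses six 4:2 cells and the second uses three; those counts follow from the assumed wiring and are not taken from the disclosure.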
The foregoing has outlined rather broadly the features and technical advantages of the present invention so that those skilled in the art may better understand the detailed description of the invention that follows. Additional features and advantages of the invention will be described hereinafter that form the subject of the claims of the invention. Those skilled in the art should appreciate that they may readily use the conception and the specific embodiment disclosed as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. Those skilled in the art should also realize that such equivalent constructions do not depart from the spirit and scope of the invention in its broadest form.
Before undertaking the DETAILED DESCRIPTION OF THE INVENTION below, it may be advantageous to set forth definitions of certain words and phrases used throughout this patent document: the terms “include” and “comprise,” as well as derivatives thereof, mean inclusion without limitation; the term “or” is inclusive, meaning and/or; the phrases “associated with” and “associated therewith,” as well as derivatives thereof, may mean to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, or the like; and the term “controller” means any device, system or part thereof that controls at least one operation, where such a device may be implemented in hardware, firmware or software, or some combination of at least two of the same. It should be noted that the functionality associated with any particular controller may be centralized or distributed, whether locally or remotely. Definitions for certain words and phrases are provided throughout this patent document, and those of ordinary skill in the art should understand that in many, if not most, instances such definitions apply to prior as well as future uses of such defined words and phrases.