This invention relates in general to computer systems, and specifically to an arrangement for carry look-ahead in a bi-endian adder.
Computer systems may employ a multi-level hierarchy of memory, with relatively fast, expensive but limited-capacity memory at the highest level of the hierarchy and proceeding to relatively slower, lower cost but higher-capacity memory at the lowest level of the hierarchy. The hierarchy may include a small fast memory called a cache, either physically integrated within a processor or mounted physically close to the processor for speed. The computer system may employ separate instruction caches and data caches. In addition, the computer system may use multiple levels of caches. The use of a cache is generally transparent to a computer program at the instruction level and can thus be added to a computer architecture without changing the instruction set or requiring modification to existing programs.
Computer processors typically include cache for storing data. When executing an instruction that requires access to memory (e.g., read from or write to memory), a processor typically accesses cache in an attempt to satisfy the instruction. It is desirable to have the cache implemented in a manner that allows the processor to access the cache (i.e., read from or write to the cache) quickly, so that the processor can execute instructions quickly. Caches have been configured in both on-chip and off-chip arrangements. On-processor-chip caches have less latency, since they are closer to the processor, but since on-chip area is expensive, such caches are typically smaller than off-chip caches. Off-processor-chip caches have longer latencies since they are remotely located from the processor, but such caches are typically larger than on-chip caches.
Cache systems typically perform addition as part of their memory operations. Binary addition is similar to decimal addition. The most basic form of binary addition entails starting with the least significant digit, adding the two digits, and moving a carry, if any, into the addition of the next significant digit. For example, consider the addition of the bit stream 010 and the bit stream 111. The addition of the least significant bits is 0+1, with a sum of 1 and a carry out of 0. The addition of the next significant bits is 1+1 plus a carry in of 0, with a sum of 0 and a carry out of 1. The addition of the most significant bits is 0+1 plus a carry in of 1, with a sum of 0 and a carry out of 1. Thus, the addition yields 001 plus a carry out of 1, or 1001.
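By way of illustration only, the sequential addition described above can be sketched in software as follows. The function name ripple_add and the use of most-significant-bit-first strings are assumptions made for this sketch and are not part of the described hardware.

```python
def ripple_add(a_bits, b_bits):
    """Add two equal-length bit strings (written MSB first), one position at a time."""
    assert len(a_bits) == len(b_bits)
    carry = 0
    sum_bits = []
    # Walk from the least significant bit (rightmost) to the most significant.
    for a_ch, b_ch in zip(reversed(a_bits), reversed(b_bits)):
        total = int(a_ch) + int(b_ch) + carry
        sum_bits.append(str(total & 1))  # sum bit for this position
        carry = total >> 1               # carry moved into the next position
    # Prepend the final carry out, as in the 010 + 111 example.
    return str(carry) + "".join(reversed(sum_bits))


print(ripple_add("010", "111"))  # 1001
```

Each loop iteration depends on the carry produced by the previous one, which is the dependency chain that makes this approach slow for wide operands.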
The sequential addition described above works well for small bit streams, e.g. 3 bits, but becomes inefficient for large bit streams, e.g. 64 bits, because each bit position must wait for the carry from the position below it. Thus, the prior art uses carry select addition, which is similar to sequential addition, but breaks the bit streams into smaller blocks and performs two calculations for each block, the first assuming that the carry in is a zero and the second assuming that the carry in is a one. For example, consider the bit stream 100101 added to the bit stream 110001, which would yield 1010110 using sequential addition. With carry select addition these streams would be split into blocks 100 and 101, and 110 and 001, respectively. The block additions are 101+001 and 100+110. Now 100+110 would be calculated in two ways, the first assuming a carry in of 0 and the second assuming a carry in of 1. Thus, 100+110+0=1010, and 100+110+1=1011. The addition of 101+001 yields 110 with a carry out of 0, thus the carry in of 0 calculation for the 100+110 addition should be used. The two results are then concatenated to form (1010)(110)=1010110. Note that the additions of the two segments can be performed in parallel. Further note that a 2 to 1 multiplexer (MUX) is typically used to select between the carry 0 and carry 1 calculations.
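A minimal software sketch of carry select addition, under the assumption of 3-bit blocks as in the example above, might look as follows. The names add_block and carry_select_add are hypothetical. In hardware the two candidate sums of every block are formed in parallel; the loop here only models the selection.

```python
def add_block(a, b, carry_in, width):
    """Return (sum, carry_out) for one width-bit block."""
    total = a + b + carry_in
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, width=6, block=3):
    """Add two width-bit integers block by block using carry select."""
    mask = (1 << block) - 1
    result = 0
    carry = 0
    for shift in range(0, width, block):
        a_blk = (a >> shift) & mask
        b_blk = (b >> shift) & mask
        # Both candidates are computed without knowing the incoming carry.
        sum0, cout0 = add_block(a_blk, b_blk, 0, block)
        sum1, cout1 = add_block(a_blk, b_blk, 1, block)
        # Models the 2 to 1 MUX: the actual carry in picks one candidate.
        chosen, carry = (sum1, cout1) if carry else (sum0, cout0)
        result |= chosen << shift
    return result | (carry << width)


print(bin(carry_select_add(0b100101, 0b110001)))  # 0b1010110
```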
The addition of two binary bits has four possible scenarios for the two operands, i.e. 00, 01, 10, and 11. The behavior of the operands is characterized in terms of the carry out of the operands. The behavior can be characterized as propagate, kill, or generate (PKG). Thus the four possible combinations are encoded in a one-of-three encoding, or PKG encoding, and the carry out of two operand bits can be represented by PKG encoding. If the two operand bits tend to propagate a carry, i.e. the carry out equals the carry in, then these operands are considered P operands. The P operands are 01 and 10. Thus, if a carry is 1, then (0+1)+1 equals 0 with a carry of 1. Similarly, if a carry is 0, then (0+1)+0 equals 1 with a carry of 0. The operands of (1+0) behave similarly. If the two operand bits tend to kill a carry, i.e. the carry out always equals 0, then these operands are considered K operands. The K operands are 00. Thus, if a carry is 1, then (0+0)+1 equals 1 with a carry of 0. Similarly, if a carry is 0, then (0+0)+0 equals 0 with a carry of 0. If the two operand bits tend to generate a carry, i.e. the carry out always equals 1, then these operands are considered G operands. The G operands are 11. Thus, if a carry is 1, then (1+1)+1 equals 1 with a carry of 1. Similarly, if a carry is 0, then (1+1)+0 equals 0 with a carry of 1. Accordingly, the carry out of two operand bits can be represented by the three signals P, K and G.
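The PKG characterization above can be illustrated with the following sketch. The function names pkg and carry_out are assumptions chosen for the illustration.

```python
def pkg(a_bit, b_bit):
    """Classify a pair of operand bits as propagate, kill, or generate."""
    if a_bit and b_bit:
        return "G"  # 11: carry out is always 1
    if a_bit or b_bit:
        return "P"  # 01 or 10: carry out equals carry in
    return "K"      # 00: carry out is always 0


def carry_out(code, carry_in):
    """Resolve the carry out from the PKG code and the carry in."""
    return {"P": carry_in, "K": 0, "G": 1}[code]


for a in (0, 1):
    for b in (0, 1):
        code = pkg(a, b)
        print(a, b, code, carry_out(code, 0), carry_out(code, 1))
```

The PKG code depends only on the operand bits, so it can be formed before any carry is known.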
Most caches use binary addition as part of their semaphore operations. Semaphore operations are atomic read/modify/write operations, meaning that the operations cannot be snooped or otherwise interrupted by normal interrupt operations. For example, a fetch-add instruction is an instruction that software uses to perform an atomic memory read plus an ALU operation plus a memory write. This causes a value to be read from memory, a constant to be added to the value, and then the new value to be written back to memory. Software uses semaphores as communication mechanisms. Semaphores are basically flags indicating who has the right to modify particular resources in a system. The atomic nature prevents multiple processes from simultaneously holding rights to a single resource.
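The read/modify/write behavior of a fetch-add can be modeled in software as sketched below. The Memory class, its methods, and the use of a lock to stand in for hardware atomicity are all assumptions of this sketch; returning the value read before the addition is likewise an assumption, since the text above does not state what the instruction returns.

```python
import threading


class Memory:
    """Toy memory model whose fetch_add is atomic with respect to other threads."""

    def __init__(self):
        self._cells = {}
        self._lock = threading.Lock()  # stands in for hardware atomicity

    def read(self, addr):
        with self._lock:
            return self._cells.get(addr, 0)

    def fetch_add(self, addr, increment):
        # Read, add, and write back as one indivisible step.
        with self._lock:
            old = self._cells.get(addr, 0)
            self._cells[addr] = old + increment
            return old  # assumption: the pre-add value is returned


def worker(mem, addr, count):
    for _ in range(count):
        mem.fetch_add(addr, 1)


mem = Memory()
threads = [threading.Thread(target=worker, args=(mem, 0x10, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(mem.read(0x10))  # 4000
```

Without the lock, two threads could read the same old value and one increment would be lost, which is the failure the atomic operation prevents.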
These prior art cache mechanisms work well for most caches. However, they do not operate properly for bi-endian caches, i.e. caches that operate on data in big endian format, little endian format, or both. For example, the number 2 may be represented as the four bit number 0010 in little endian (LE) format, or 0100 in big endian (BE) format.
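As a byte-level illustration of why the format matters to an adder, the following sketch stores the same value in both formats and locates the least significant byte. The helper names store and lsb_byte_index are hypothetical, and the sketch works at byte granularity rather than the four bit granularity of the example above.

```python
def store(value, size, byteorder):
    """Return the bytes of value as they would sit in memory, lowest address first."""
    return value.to_bytes(size, byteorder)


def lsb_byte_index(size, byteorder):
    """Index (memory offset) of the least significant byte of a size-byte value."""
    return 0 if byteorder == "little" else size - 1


for order in ("little", "big"):
    image = store(2, 4, order)
    print(order, image.hex(), "LSB at offset", lsb_byte_index(4, order))
```

An adder that always starts carrying from offset 0 would produce the wrong result for big endian data, because the carry chain must begin at the least significant byte wherever it is stored.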
These and other objects, features and technical advantages are achieved by a system and method which is able to perform carry look-ahead calculations for a bi-endian adder in a cache memory system.
The inventive adder operates for big endian or little endian formatted data. The adder can add one of +/−1, 4, 8, or 16 to a loaded value from memory. The add operation can be a 4 or 8 byte add operation. The resulting value can then be stored back into memory. The adder may operate in conjunction with an atomic fetch-add instruction.
The inventive adder comprises eight byte adder cells and carry look-ahead (CLA) logic. The adder cells determine which of themselves is the least significant bit (LSB) byte adder cell. The LSB cell then adds one of the increment values, +/−1, 4, 8, or 16, to a loaded value from memory. The other cells add 0x00 or 0xFF, depending upon the sign of the increment value, to a loaded value from memory. Each adder cell performs two adds, one for a carry in of 0, and the other for a carry in of 1. Both results are sent to a MUX. The CLA logic determines each of the carries, and provides a selection control signal to the MUX of each of the different cells.
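A behavioral sketch of the byte cell arrangement is given below for illustration. The function name bi_endian_add and its parameters are assumptions. For simplicity the sketch feeds the low byte of the 2's complement increment directly to the LSB cell and resolves the carries one cell after another, whereas the CLA logic described above resolves them in parallel.

```python
def bi_endian_add(data, increment, big_endian):
    """Add increment to a 4 or 8 byte value given in memory order."""
    n = len(data)
    assert n in (4, 8)
    assert increment in (1, 4, 8, 16, -1, -4, -8, -16)
    # Visit the byte cells from least to most significant.
    order = range(n - 1, -1, -1) if big_endian else range(n)
    fill = 0xFF if increment < 0 else 0x00  # addend for the non-LSB cells
    out = bytearray(n)
    carry = 0
    for rank, idx in enumerate(order):
        addend = (increment & 0xFF) if rank == 0 else fill
        # Each cell forms both candidate sums.
        sum0 = data[idx] + addend      # assumes carry in of 0
        sum1 = data[idx] + addend + 1  # assumes carry in of 1
        chosen = sum1 if carry else sum0  # models the per-cell MUX
        out[idx] = chosen & 0xFF
        carry = chosen >> 8
    return bytes(out)


print(bi_endian_add(bytes([0x00, 0x00, 0x00, 0xFF]), 1, big_endian=True).hex())
```

The only difference between the two formats in this model is the order in which the cells are ranked, which mirrors the cells determining which of themselves is the LSB cell.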
The CLA operates according to predetermined equations that take into account the endianness of the operation and the size (4 byte or 8 byte) of the operation. The CLA logic determines whether an addition has resulted in the generation of a carry in a prior block, and whether the carry is propagated by subsequent byte blocks (if present) into the current block. Also, the negative increment values are achieved by using the 1's complement and then adding a 1 through the CLA logic.
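The following sketch illustrates, under stated assumptions, how byte-level generate and propagate signals can determine the carry into each byte block, and how a negative increment can be formed from the 1's complement plus a carry in of 1. The function names and the little endian byte ordering are assumptions of the sketch; the actual predetermined equations are not reproduced here.

```python
def block_signals(a_byte, b_byte):
    """Byte-level (generate, propagate) for one block."""
    total = a_byte + b_byte
    return total > 0xFF, total == 0xFF


def lookahead_carries(a_bytes, b_bytes, carry_in):
    """Carry into each byte block; inputs are ordered least significant first."""
    signals = [block_signals(a, b) for a, b in zip(a_bytes, b_bytes)]
    carries = []
    for i in range(len(signals)):
        # The initial carry reaches block i if every lower block propagates.
        c = bool(carry_in) and all(p for _, p in signals[:i])
        # A carry generated in a prior block j reaches block i if every
        # block between j and i propagates it.
        for j in range(i):
            g_j = signals[j][0]
            c = c or (g_j and all(p for _, p in signals[j + 1:i]))
        carries.append(int(c))
    return carries


def subtract_via_ones_complement(value, magnitude, n_bytes=8):
    """Compute value - magnitude as value + ~magnitude + 1 (carry in of 1)."""
    mask = (1 << (8 * n_bytes)) - 1
    inverted = ~magnitude & mask  # 1's complement of the positive increment
    a = list(value.to_bytes(n_bytes, "little"))
    b = list(inverted.to_bytes(n_bytes, "little"))
    carries = lookahead_carries(a, b, carry_in=1)  # the added 1 enters here
    out = bytes((x + y + c) & 0xFF for x, y, c in zip(a, b, carries))
    return int.from_bytes(out, "little")


print(subtract_via_ones_complement(100, 4))  # 96
```

Each carry is expressed only in terms of the block signals and the initial carry in, so no block has to wait for the sum of the block below it.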
It is a technical advantage of the invention to have an adder that operates with both big endian format data and little endian format data.
It is another technical advantage of the invention to be able to add one of the increment values +/−1, 4, 8, or 16 to a loaded value from memory.
It is a further technical advantage of the invention to use logic to form the negative increment values from the 1's complement of the positive values, and to then add one via the CLA logic to form the 2's complement or the negative value.
The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention. It should be appreciated by those skilled in the art that the conception and specific embodiment disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. It should also be realized by those skilled in the art that such equivalent constructions do not depart from the spirit and scope of the invention as set forth in the appended claims.