CARRY-SAVE ADDERS
Gerrit A. Blaauw describes carry-save adders (CSAs) in section 2-12 of "Digital System Implementation" (Prentice-Hall, 1976). Blaauw indicates that the CSA was mentioned by Babbage in 1837, by von Neumann in 1947, and used in 1950 in M.I.T.'s Whirlwind computer. J. L. Hennessy and D. A. Patterson discuss carry-save adders on pages A-42 and A-43 of "Computer Architecture, A Quantitative Approach" (Morgan Kaufmann, 1990).
In "A Suggestion for a Fast Multiplier" (IEEE Transactions on Electronic Computers EC-13:14-17, 1964), C. S. Wallace indicates that "an expedient now quite commonly used" is to add three numbers using a CSA. If a set of more than three numbers is to be added, three of the set are first added using the CSA and the carry and sum are captured. The captured carry and sum are routed back to two of the three inputs, and another number from the set is applied to the third input. (Whenever the carry-outs generated by a CSA are subsequently added in another adder, an implicit one-bit left shift of the carry bits is implemented via the wiring between the adders.) The process is repeated until all of the numbers in the set have been added. Finally, the sum and carry are added in a "conventional" carry-propagate adder (CPA). In "Computer Arithmetic: Principles, Architecture, and Design" (John Wiley & Sons, 1979, pp. 98-100), K. Hwang describes this same technique in greater detail.
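The iterative technique described above can be illustrated with a minimal software sketch. The function names carry_save_add and add_many are hypothetical and are chosen only for illustration. Unbounded non-negative Python integers stand in for hardware words, and the left shift models the wiring between adders.

```python
def carry_save_add(a, b, c):
    """3:2 compression: return (sum_bits, carry_bits) such that
    a + b + c == sum_bits + carry_bits. No carry ripples between bit positions."""
    sum_bits = a ^ b ^ c                          # per-bit sum, carries ignored
    carry_bits = ((a & b) | (a & c) | (b & c)) << 1  # per-bit majority, shifted left one bit
    return sum_bits, carry_bits


def add_many(values):
    """Add three or more non-negative integers by reusing one CSA,
    then perform a single carry-propagate addition at the end."""
    values = list(values)
    if len(values) < 3:
        raise ValueError("this sketch expects at least three operands")
    # The first three operands go through the CSA.
    sum_bits, carry_bits = carry_save_add(values[0], values[1], values[2])
    # The captured sum and carry are fed back; each remaining operand uses the third input.
    for v in values[3:]:
        sum_bits, carry_bits = carry_save_add(sum_bits, carry_bits, v)
    # The only carry propagation: the "conventional" CPA step.
    return sum_bits + carry_bits
```

In this sketch only the final line performs an addition in which carries ripple across bit positions; every earlier step is purely bitwise.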
Wallace extended the use of CSAs from adding three inputs to adding an arbitrary number of values simultaneously, while having only a single carry-propagate path. One application of the Wallace-tree (as it came to be known) is high-performance hardware multipliers. Generally, a Wallace-tree consists of successive levels of CSAs, each level reducing the number of values being added in a ratio of 3:2, since each CSA takes three inputs and produces two outputs. At the bottom of the tree a CPA is used to add the last carry/sum pair.
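The level-by-level reduction can be sketched in software as follows. This is an illustrative model only, not a description of any particular hardware arrangement; the names csa and wallace_sum are hypothetical, and operands that do not fill a complete group of three are simply assumed to pass through to the next level unchanged.

```python
def csa(a, b, c):
    """3:2 compression: a + b + c == sum_bits + carry_bits."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def wallace_sum(values):
    """Reduce the operands with levels of CSAs until two remain,
    then add the last pair once. Returns (total, number_of_csa_levels)."""
    operands = list(values)
    levels = 0
    while len(operands) > 2:
        next_level = []
        full = len(operands) - len(operands) % 3   # operands that form complete groups of three
        for i in range(0, full, 3):
            next_level.extend(csa(*operands[i:i + 3]))  # three in, two out
        next_level.extend(operands[full:])          # leftovers pass through (assumption)
        operands = next_level
        levels += 1
    return sum(operands), levels                    # final step models the single CPA
```

Under these assumptions, nine operands are reduced through counts of 6, 4, 3, and 2, that is, four CSA levels before the CPA.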
X86 EFFECTIVE AND INTERMEDIATE ADDRESSES
U.S. Pat. No. 4,442,484 ('484) MICROPROCESSOR MEMORY MANAGEMENT AND PROTECTION MECHANISM, to Childs et al., issued Apr. 10, 1984, described the segmentation architecture subset of what is now known as the industry standard X86 Architecture, and is hereby incorporated by reference.
U.S. Pat. No. 4,972,338 ('338) MEMORY MANAGEMENT FOR MICROPROCESSOR SYSTEM, to Crawford et al., issued Nov. 20, 1990, described the addition of paging to the X86 Architecture, and is hereby incorporated by reference.
U.S. Pat. No. 5,204,953 ('953) ONE CLOCK ADDRESS PIPELINING IN SEGMENTATION UNIT, to Dixit, issued Apr. 20, 1993, discloses pipelined single-clock address generation for segment limit checking in the X86 architecture.
In the X86 Architecture as taught in the foregoing cited patents, the Effective Address (EA) is calculated prior to the calculation of the relocation address (the end result of the segmentation process). The relocation address is also known more generally as the Intermediate Address (IA), because it is the address used as an input to the page translation process when paging is enabled. The relocation address is also known (especially in the Intel literature) as the Linear Address (LA).
The EA is an intermediate result that in the foregoing cited patents is taught as being calculated in a step prior to the calculation of the IA. The EA is used in tests of whether the segment limit has been exceeded. The EA may also be stored for potential use in future address calculations.
When the IA is calculated subsequent to the EA, however, performance is lost relative to what is possible if the IA is calculated without the EA as an intermediate result. Specifically, generating the EA requires a carry propagation operation. Generating the IA from the EA requires a subsequent carry propagation operation. If the IA were calculated directly using the techniques taught by Wallace, only a single carry propagation would be required.
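The difference between the two approaches can be sketched as follows for 32-bit addresses. This is an illustrative model under stated assumptions, not the method of any cited patent: the function names and the operand names seg_base, base, index, and disp are hypothetical, the index is assumed to be already scaled, and segment limit checking is omitted.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit words


def csa(a, b, c):
    """3:2 compression on 32-bit words: (a + b + c) mod 2**32 == (s + carry) mod 2**32."""
    return (a ^ b ^ c) & MASK32, (((a & b) | (a & c) | (b & c)) << 1) & MASK32


def ia_via_ea(seg_base, base, index, disp):
    """Two-step path: EA first, then IA. Two carry propagations in sequence."""
    ea = (base + index + disp) & MASK32   # first carry propagation
    ia = (seg_base + ea) & MASK32         # second carry propagation
    return ea, ia


def ia_direct(seg_base, base, index, disp):
    """Direct path: two CSA levels, then one carry propagation."""
    s1, c1 = csa(base, index, disp)
    s2, c2 = csa(s1, c1, seg_base)
    return (s2 + c2) & MASK32             # the only carry propagation
```

Both paths yield the same IA in this model; the direct path trades the first carry propagation for two carry-free CSA levels.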
X86 ADDRESS SIZE AND ADDITION
In the present X86 Architecture, memory can be addressed using either 16-bit or 32-bit addresses. When 16-bit addresses are used, the Effective Address components are limited to having only 16 bits. However, the resulting Intermediate Address may exceed 16 bits, due to the carry out of the lower 16 bits. The specific address size used is determined by size specification bits in segment descriptors, instruction prefixes, and various defaults, as specified by the X86 Architecture. For example, programs that execute in real mode or virtual-8086 mode have 16-bit addresses by default.
Limiting the X86 address size to 16 bits implies modulo 65536 (2 to the power 16) addition for the Effective Address calculation, and modulo addition implies subtracting the modulus from a trial result whenever that result is at least equal to the modulus. Consequently, a carry out of bit 15 of the Effective Address addition represents the need to subtract a carry into bit 16 of the Intermediate Address addition.
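One possible reading of this relationship is sketched below; it is an interpretation offered for illustration, not a statement of the claimed implementation. The function names are hypothetical, the Effective Address components are assumed to be 16-bit values, and the Intermediate Address is modeled as a 32-bit word.

```python
MOD16 = 1 << 16        # modulus of 16-bit Effective Address arithmetic
MASK32 = 0xFFFFFFFF    # model a 32-bit Intermediate Address


def ia_two_step(seg_base, components):
    """Reference path: wrap the EA modulo 65536, then add the segment base."""
    ea = sum(components) % MOD16
    return ea, (seg_base + ea) & MASK32


def ia_with_carry_correction(seg_base, components):
    """Add everything at once, then subtract whatever the EA addition
    carried out of bit 15 (that is, into bit 16)."""
    raw = sum(components)                 # unwrapped sum of the 16-bit components
    carry_out_of_bit15 = raw >> 16        # 0, 1, or 2 for three 16-bit components
    return (seg_base + raw - (carry_out_of_bit15 << 16)) & MASK32
```

For example, with a segment base of 0x000F0000 and components 0xFFFF and 0x0002, the EA wraps to 0x0001 and both functions give an IA of 0x000F0001; without the correction the one-shot sum would be 0x00100001.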
It is common practice to organize carry-propagate adders so as to expose intermediate carry terms, such that the delay from such carries to the sums is substantially less than the delay from the other addend inputs. This is the case in the present invention specifically with regard to carries into bit 16, which is chosen because of its equivalence to the modulus of Effective Address arithmetic.