Modern processors often address memory systems using effective addresses formed by adding an offset to a base address (i.e., base-plus-offset addressing). Base-plus-offset addressing is typically used, for example, by load instructions and relative branch instructions. Adding two binary numbers (e.g., addresses) involves adding the bits in each pair of corresponding bit positions. Each bit-pair addition produces a carry signal, which is an input to the bit-pair addition at the next-higher bit position. Because a carry may have to propagate from the lowest-order position all the way to the highest-order position, this carry propagation is relatively slow and ultimately determines the total time required to add the two numbers.
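A minimal behavioral sketch of bit-by-bit (ripple-carry) addition makes this serial dependency concrete. The function name `ripple_carry_add` and its choice to also return the carry into each position are illustrative assumptions. Real adders are built from gates and usually add lookahead logic, but the worst-case carry chain from the lowest-order to the highest-order position is what this models.

```python
def ripple_carry_add(a: int, b: int, width: int) -> tuple[int, list[int]]:
    """Add two unsigned width-bit numbers one bit position at a time.

    Returns the sum modulo 2**width and the carry into each bit position,
    lowest-order position first.
    """
    carry = 0            # no carry into the lowest-order position
    total = 0
    carries_in = []
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        carries_in.append(carry)
        total |= (ai ^ bi ^ carry) << i           # sum bit for position i
        # The carry into position i + 1 depends on the carry into position i,
        # so no position can finish before every lower position has.
        carry = (ai & bi) | (carry & (ai ^ bi))
    return total, carries_in


# 0111 + 0001: the carry generated at bit 0 ripples through bits 1, 2, and 3.
print(ripple_carry_add(0b0111, 0b0001, 4))   # (8, [0, 1, 1, 1])
```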
Sum-addressed memories (SAMs) store data that is accessed according to the sum of two input numbers (e.g., addresses). Many SAMs do not actually add the two input numbers, thereby avoiding carry propagation and its associated latency. Instead, these SAMs detect when the sum of the two input numbers A and B equals a third number K, i.e., when (A+B=K). An (A+B=K) equality test can generally be performed in less time than the addition of A and B, because when the sum does equal K, the carry into each bit position is determined by the next-lower-order bits of A, B, and K alone. Every bit position can therefore be checked in parallel, without waiting for carries to propagate.
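The equality test can be sketched at the bit level as a behavioral model. This sketch assumes a commonly used carry-free formulation and is not necessarily the exact logic of any particular SAM. The name `sum_equals` is hypothetical. If A+B really equals K, the carry into bit i is fixed by A[i-1], B[i-1], and K[i-1] alone, so all positions are compared at once and no carry chain is needed.

```python
def sum_equals(a: int, b: int, k: int, width: int) -> bool:
    """Return True iff (a + b) mod 2**width == k, without propagating carries.

    Assumes a, b, and k are non-negative and fit in `width` bits.
    """
    mask = (1 << width) - 1
    # Carry that must enter each position for its sum bit to equal k's bit.
    required = (a ^ b ^ k) & mask
    # Carry each position would send upward if its sum bit equals k's bit:
    #   a[i] == b[i] -> the carry out is a[i] (generate or kill);
    #   a[i] != b[i] -> the carry in must have been NOT k[i], and it passes through.
    # Shifting left by one aligns these with the next-higher position;
    # position 0 receives a carry-in of 0.
    produced = (((a & b) | ((a ^ b) & ~k)) << 1) & mask
    # Each bit of this comparison is independent, so hardware can test all
    # positions in parallel and AND the per-bit results.
    return required == produced


print(sum_equals(3, 5, 8, 4))   # True:  3 + 5 == 8
print(sum_equals(3, 5, 7, 4))   # False
```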
A typical SAM cache memory includes a sum decoder that generates signals to select a particular word line within one or more data arrays. FIG. 1 is a diagram of a conventional sum decoder 100 that generates signals to select a particular word line within one of two data arrays (e.g., an even data array and an odd data array) of a SAM cache according to the arithmetic sum of a 14-bit base address A[13:0] and a 14-bit offset address B[13:0]. The conventional sum decoder 100 uses the above-described (A+B=K) equality test to avoid the time-consuming carry propagation of conventional addition circuits.
The conventional sum decoder 100 includes four 2-bit sum predecoders 102A-102D, a low order carry generator 104, and even/odd select logic 106. Each 2-bit sum predecoder 102 actually receives 3 base address bits and 3 offset address bits, and produces 8 predecoded signals for final decode. The sum decoder 100 of FIG. 1 thus produces four groups of 8 predecoded signals, which may be provided, for example, to a final decode block. Logic (e.g., NAND logic) within the final decode block may use these four groups of predecoded signals to produce signals that select (i.e., activate) one of 512 word lines in both the even data array and the odd data array.
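The internal logic of predecoders 102 and of the final decode block is not described here. The behavioral sketch below shows one way a 2-bit sum predecoder with 3-bit inputs could produce 8 predecoded signals. It assumes a carry-free (A+B=K) test is applied to each bit pair, and that the extra input bit is the next-lower-order bit, which is needed to determine the carry into the field. The functions `predecode_2bit` and `sum_decode`, the 8-bit sum width, and the signal ordering are all hypothetical. The sketch is also simplified: it has no low-order carry generator and no even/odd split.

```python
def predecode_2bit(a3: int, b3: int) -> list[int]:
    """Hypothetical 2-bit sum predecoder producing 8 predecoded signals.

    a3 and b3 are 3-bit slices of A and B: bits 2 and 1 are the field being
    decoded; bit 0 is the next-lower-order bit. Output j is 1 when candidate
    pattern j of K (bits 2, 1, 0) passes the carry-free equality test for
    the field's two bits.
    """
    signals = []
    for k3 in range(8):
        # Carry each field bit needs so that its sum bit equals K's bit.
        required = (a3 ^ b3 ^ k3) & 0b110
        # Carry each lower bit would send up if its sum bit equals K's bit.
        produced = (((a3 & b3) | ((a3 ^ b3) & ~k3)) << 1) & 0b110
        signals.append(int(required == produced))
    return signals


def sum_decode(a: int, b: int, groups: int = 4) -> list[int]:
    """Hypothetical final decode: word lines k with k == (a + b) mod 2**(2*groups)."""
    width = 2 * groups
    # Append a 0 below bit 0 of A and B so the lowest group also has a
    # next-lower bit; with A and B both 0 there, the carry into bit 0 is 0.
    a_ext, b_ext = a << 1, b << 1
    predecoded = [predecode_2bit((a_ext >> (2 * g)) & 0b111,
                                 (b_ext >> (2 * g)) & 0b111)
                  for g in range(groups)]
    selected = []
    for k in range(1 << width):
        k_ext = k << 1
        # AND one predecoded signal from each group (NAND logic in hardware).
        if all(predecoded[g][(k_ext >> (2 * g)) & 0b111] for g in range(groups)):
            selected.append(k)
    return selected


print(sum_decode(200, 100))   # [44], since 300 mod 256 == 44
```

In this model each predecoder asserts exactly two of its eight outputs, one for each value of the overlapping lower-order K bit. Adjacent groups share that bit, which lets the final AND select a single word line.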
The low order carry generator 104 receives the 5 lowest-order base and offset address bits (A[4:0] and B[4:0]) and uses them to generate a carry signal. The even/odd select logic 106 receives base address bit A[5], offset address bit B[5], and the carry signal, and uses them to generate an even/odd select signal. The even/odd select signal may be used to select between the data outputs of the even data array and the odd data array.
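The following is a behavioral sketch of these two blocks. It assumes the even/odd select signal is bit 5 of the sum A+B, i.e., the XOR of A[5], B[5], and the carry out of bits 4:0. The function names and the select polarity are hypothetical. Python's `+` stands in for whatever fast carry logic the carry generator actually uses, so the sketch models only what the blocks compute, not how they compute it.

```python
def low_order_carry(a: int, b: int) -> int:
    """Carry out of the 5 lowest-order bit positions, A[4:0] + B[4:0]."""
    return ((a & 0x1F) + (b & 0x1F)) >> 5


def even_odd_select(a: int, b: int) -> int:
    """Assumed select: sum bit 5, i.e. A[5] XOR B[5] XOR the carry into bit 5.

    Returns 1 to choose the odd data array and 0 for the even one
    (an assumed polarity).
    """
    return ((a >> 5) ^ (b >> 5) ^ low_order_carry(a, b)) & 1


# 0x001F + 0x0001 = 0x0020: the low-order carry flips bit 5 of the sum.
print(even_odd_select(0x001F, 0x0001))   # 1
```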
A problem arises in that the circuitry of the conventional sum decoder 100 is relatively complex and typically includes multiple sequential stages (i.e., cascaded levels) of exclusive-OR (XOR) logic functions. Typical implementations of XOR logic functions tend to be relatively slow compared to other types of logic functions (e.g., AND, OR, NAND, NOR, and the like). To increase circuit speed, XOR logic functions may be implemented using dynamic logic or relatively complex static logic (e.g., combinations of simpler, faster logic gates). However, such speed increases usually come at the cost of increased electrical power dissipation.
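The sketch below illustrates why XOR tends to be slower than simpler gates. It builds a two-input XOR from four two-input NAND gates, a well-known static construction, and notes its logic depth. The gate-level model and function names are illustrative only. Real XOR cells are often built differently (e.g., with pass-transistor or transmission-gate designs), and actual delay depends on the circuit family.

```python
def nand(x: int, y: int) -> int:
    """Two-input NAND on single bits (0 or 1)."""
    return 1 - (x & y)


def xor_from_nand(a: int, b: int) -> int:
    """Two-input XOR built from four NAND2 gates, three gate levels deep."""
    t = nand(a, b)                      # level 1
    u, v = nand(a, t), nand(b, t)       # level 2
    return nand(u, v)                   # level 3


print([xor_from_nand(a, b) for a in (0, 1) for b in (0, 1)])   # [0, 1, 1, 0]
```

Under this construction, a three-input XOR such as A ^ B ^ K, built from two of these gates in cascade, is six NAND levels deep. A single NAND is one level deep.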