The present invention relates generally to the field of digital electronics and in particular to a circuit and method for subdividing a CAMRAM bank via a virtual ground.
Microprocessors perform computational tasks in a wide variety of applications, including embedded applications such as portable electronic devices. The ever-increasing feature set and enhanced functionality of such devices require ever more computationally powerful processors to provide additional functionality via software. Another trend in portable electronic devices is an ever-shrinking form factor. A major impact of this trend is the decreasing size of the batteries used to power the processor and other electronics in the device, making power efficiency an increasingly important design consideration. Hence, processor improvements that increase execution speed and reduce power consumption are desirable for portable electronic device processors in particular, as well as for processors in general.
Most modern processors exploit the spatial and temporal locality of typical programs by storing recently executed instructions and recently accessed data in one or more cache memories for ready access by an instruction execution pipeline. A cache is a high-speed, usually on-chip, memory structure comprising a Content Addressable Memory (CAM) and a corresponding Random Access Memory (RAM); the combination is known as a CAMRAM. The instructions or data reside in a cache “line” stored in the RAM. To determine whether a particular datum resides in the RAM, a portion of its address is applied to the CAM.
A CAM is a particular memory structure wherein an applied compare input (referred to herein as the key) is simultaneously compared to data stored in each CAM entry (referred to herein as a key field), and the output of the CAM is an indication of which, if any, key field matches the key. In a cache, the key and key fields are portions of (virtual or physical) addresses, and if a match occurs (i.e., the access “hits” in the cache), the location of the match indexes the RAM, and the corresponding cache line is accessed.
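The lookup behavior described above can be sketched as a small behavioral model. This is a minimal illustration under stated assumptions, not a circuit description: the class name `CamRam`, its methods, and the integer encoding of keys are hypothetical, and the loop merely stands in for comparisons that the hardware performs simultaneously across all entries.

```python
from typing import List, Optional


class CamRam:
    """Hypothetical behavioral model of a CAMRAM: CAM key fields paired with RAM lines."""

    def __init__(self, key_fields: List[int], ram_lines: List[str]) -> None:
        if len(key_fields) != len(ram_lines):
            raise ValueError("each CAM entry needs a corresponding RAM line")
        self.key_fields = key_fields
        self.ram_lines = ram_lines

    def match_vector(self, key: int) -> List[bool]:
        # The hardware compares the key to every key field at once;
        # this model evaluates the same comparisons one after another.
        return [field == key for field in self.key_fields]

    def lookup(self, key: int) -> Optional[str]:
        matches = self.match_vector(key)
        if True not in matches:
            return None  # miss: no key field matches the key
        # Hit: the location of the match indexes the RAM.
        return self.ram_lines[matches.index(True)]
```

For example, a model built with key fields `[0x12, 0x34, 0x56]` and lines `["line A", "line B", "line C"]` returns `"line B"` for key `0x34` and `None` for a key that is not stored.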
The CAMRAM circuit may also be employed in a Translation Lookaside Buffer (TLB) for fast address translation. In this application, an applied virtual address is the key, previously translated virtual addresses are stored as key fields in the CAM, and associated RAM locations store the corresponding physical addresses. CAMRAMs may also be deployed in other applications, such as a memory board that queues write requests. In this case, the address of a read request may serve as the key, which is compared against the queued write addresses. A hit indicates queued write data that is more recent than the data stored in the memory; the queued data must be used to service the read request to ensure coherency. In general, CAMRAMs are useful in a variety of applications.
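The write-queue application can be illustrated with a short sketch. The function name `service_read` and the data layout (a list of `(address, data)` pairs, oldest first, plus a dictionary standing in for memory) are hypothetical. The rule that the newest of several queued writes to the same address wins is also an assumption, since the text does not say how duplicate queued addresses are handled.

```python
from typing import Dict, List, Tuple


def service_read(address: int,
                 write_queue: List[Tuple[int, int]],
                 memory: Dict[int, int]) -> int:
    """Return data for a read, preferring queued (more recent) write data.

    write_queue holds (address, data) pairs, oldest first. Assumption: if
    several queued writes target the same address, the newest one wins.
    """
    # The read address acts as the key; queued write addresses act as key fields.
    for queued_address, data in reversed(write_queue):
        if queued_address == address:
            return data  # hit: queued data is newer than the memory copy
    return memory[address]  # miss: memory holds the current value
```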
FIG. 1 depicts a functional block diagram of a portion of one entry of a CAM structure, indicated generally by the numeral 100. The CAM entry j includes a match line 102 that spans all bit positions of the jth key field 110. The match line 102 is pulled high when a PRECHARGE signal turns on a pass transistor 104 connecting the match line 102 to power. At each bit position of the jth CAM entry, a discharge circuit 105 may selectively discharge the match line 102. As depicted in FIG. 1, the discharge circuit 105 includes a switching circuit 106, such as a pass transistor, interposed between the match line 102 and circuit ground. The gate of the discharge transistor 106 is driven by an XOR gate 108, which computes the logical XOR of a key bit 112 and the corresponding key field bit 110. At each bit position i, if the key bit 112 and the key field bit 110 match, the output of the XOR gate 108 is low and the transistor 106 does not conduct charge from the match line 102 to ground. If the key bit 112 and the key field bit 110 mismatch, the output of the XOR gate 108 is high, turning on the transistor 106 and pulling the match line 102 low.
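The precharge and discharge scheme of a single match line can be modeled bit by bit as a logic-level sketch. The function name `match_line_level` and the `width` parameter (the number of key bits) are hypothetical, and the model ignores timing and analog behavior entirely.

```python
def match_line_level(key: int, key_field: int, width: int) -> int:
    """Return the sensed level (1 = high, 0 = low) of one CAM match line.

    Logic-level sketch only; width is an assumed number of key bits.
    """
    match_line = 1  # PRECHARGE turns on transistor 104, pulling the line high
    for i in range(width):
        key_bit = (key >> i) & 1
        field_bit = (key_field >> i) & 1
        xor_out = key_bit ^ field_bit  # XOR gate 108 drives the gate of transistor 106
        if xor_out:
            match_line = 0  # transistor 106 conducts: match line pulled to ground
    return match_line
```

A single mismatching bit position is enough to pull the line low, which is why the loop never sets the line back to 1 once it has been discharged.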
In this manner, if any bit of the key 112 mismatches the corresponding bit of the key field 110, the match line 102 is pulled low. Conversely, no path to ground is established, and the match line 102 remains high, only if every bit of the key 112 matches the corresponding bit of the key field 110. A sense circuit 114 detects the level of the jth match line 102 at a time determined by the worst-case match line 102 discharge time. If each key field 110 is unique, which is the case in normal cache and TLB operation, then at most one key field 110 can match the key 112, and hence at most one match line 102 within the CAM will remain high. To verify that this is the case, the output of each match line sense circuit 114 is provided to a collision detection circuit 116, which detects multiple matches and generates an error if they occur.
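Sensing all match lines and checking for collisions can be sketched as follows. The names `sense_and_check` and `CamCollisionError` are hypothetical, and raising an exception is only one possible stand-in for the error generated by the collision detection circuit 116.

```python
from typing import List, Optional


class CamCollisionError(Exception):
    """Hypothetical error raised when more than one match line remains high."""


def sense_and_check(key: int, key_fields: List[int], width: int) -> Optional[int]:
    """Return the index of the single matching entry, or None on a miss."""
    mask = (1 << width) - 1
    # A match line stays high only if no bit position mismatches.
    high_lines = [j for j, field in enumerate(key_fields)
                  if ((key ^ field) & mask) == 0]
    if len(high_lines) > 1:
        raise CamCollisionError(f"multiple matches at entries {high_lines}")
    return high_lines[0] if high_lines else None
```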
A high-performance processor may include large cache memories, for example, having 512 entries or more. Comparing a key 112 to all 512 entries presents several problems. Capacitive loading due to large fan-out, such as in distributing the key bits 112 to all CAM entries 100, reduces the speed of operation. Additionally, precharging all 512 match lines 102 and discharging at least 511 of them on each access consumes excessive power. To address these concerns, the CAMRAM of a large cache may be divided into banks, as shown in FIG. 2 (depicting four banks, although any number of banks may be implemented in any given application).
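The benefit of banking can be quantified with simple arithmetic. The sketch below assumes the entries are divided evenly across banks, that only the selected bank is activated on an access, and that at most one entry matches because the key fields are unique; the function name is hypothetical.

```python
from typing import Tuple


def match_line_activity(total_entries: int, num_banks: int) -> Tuple[int, int]:
    """Return (match lines precharged, minimum match lines discharged) per access.

    Assumes equal-sized banks, one active bank per access, and at most one hit.
    """
    if total_entries % num_banks:
        raise ValueError("assumes entries divide evenly among banks")
    active = total_entries // num_banks
    min_discharged = active - 1  # every active line except at most one hit
    return active, min_discharged
```

Under these assumptions, an unbanked 512-entry CAM gives `(512, 511)`, while the four-bank arrangement of FIG. 2 gives `(128, 127)`, a fourfold reduction in match line activity per access.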
A CAMRAM 120 comprises a plurality of CAM banks 122 and a corresponding plurality of RAM banks 124. In the case of a cache, the banks may be selected by decoding predetermined address bits. Each CAM bank comprises a set of CAM driver circuits 126 that buffer and distribute signals to the CAM entries 100 within the CAM bank 122. The CAM driver circuits 126 may include “overhead” circuits such as clock drivers, write drivers and control signals for the key field memory cells 110, sense amps and buffers for reading the key field memory cells 110, and the like (not shown). One component of the CAM driver circuits 126, depicted in FIG. 2, is the set of key drivers 127, which distribute the key bits 112 to the CAM entries 100 within each CAM bank 122. In this example, the key drivers 127 comprise AND gates that gate the key bits 112 with a CAM clock signal.
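Bank selection and key-driver gating can be sketched at the logic level. The function names, the `bank_bit_lsb` parameter (the position of the bank-select field within the address), and the requirement that the number of banks be a power of two are all hypothetical. The sketch also assumes that only the selected bank's CAM clock is active, so the key drivers of unselected banks hold their outputs low; the text above states only that the key bits are ANDed with a CAM clock signal.

```python
from typing import List


def bank_select(address: int, bank_bit_lsb: int, num_banks: int) -> int:
    """Decode a bank index from predetermined address bits (field position assumed)."""
    if num_banks <= 0 or num_banks & (num_banks - 1):
        raise ValueError("sketch assumes a power-of-two number of banks")
    return (address >> bank_bit_lsb) & (num_banks - 1)


def drive_key_bits(key_bits: List[int], cam_clock: int) -> List[int]:
    """Key drivers 127: AND each key bit with the CAM clock signal."""
    return [bit & cam_clock for bit in key_bits]


def bank_key_inputs(address: int, key_bits: List[int], num_banks: int,
                    bank_bit_lsb: int) -> List[List[int]]:
    """Key bits seen by each bank; only the selected bank is clocked (assumed)."""
    selected = bank_select(address, bank_bit_lsb, num_banks)
    return [drive_key_bits(key_bits, 1 if bank == selected else 0)
            for bank in range(num_banks)]
```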
A CAM bank 122 may include, for example, 64 CAM entries 100. In general, higher performance and lower power consumption may be achieved by reducing the number of CAM entries 100 per CAM bank 122. However, for a given cache size, this requires a larger number of CAM banks 122, each replicating the CAM driver circuits 126, which wastes silicon area. Thus, a means for functionally subdividing a CAM bank 122 to activate fewer CAM entries 100 at a time, without replicating the CAM driver circuits 126, would be advantageous.
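The tradeoff between bank size and driver replication can be illustrated for a fixed-capacity CAM. This sketch assumes one set of CAM driver circuits 126 per CAM bank 122 and one active bank per access; the function name and the dictionary keys are hypothetical.

```python
from typing import Dict


def bank_tradeoff(total_entries: int, entries_per_bank: int) -> Dict[str, int]:
    """Illustrative bank-size tradeoff for a fixed total number of CAM entries.

    Assumes equal-sized banks, one driver set per bank, one active bank per access.
    """
    if total_entries % entries_per_bank:
        raise ValueError("assumes banks of equal size")
    banks = total_entries // entries_per_bank
    return {
        "banks": banks,
        "driver_sets": banks,  # CAM driver circuits are replicated in every bank
        "match_lines_per_access": entries_per_bank,
    }
```

For a 512-entry CAM, 64-entry banks give 8 banks and 8 driver sets with 64 match lines active per access; halving the bank size to 32 entries halves the active match lines but doubles the driver sets to 16.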